Getting Started

What is YAGO?
What is so special about YAGO?
How does YAGO compare to other knowledge bases?
What are the logical constraints of YAGO?
What is the data model of YAGO?
How can I access YAGO?

What is YAGO?

YAGO is a knowledge base, i.e., a database with knowledge about the real world. YAGO contains both entities (such as movies, people, cities, and countries) and relations between these entities (who played in which movie, which city is located in which country, etc.). All in all, YAGO contains more than 39 million entities and 167 million facts.

YAGO arranges its entities into classes: Elvis Presley belongs to the class of people, Paris belongs to the class of cities, and so on. These classes are arranged in a taxonomy: The class of cities is a subclass of the class of populated places, this class is a subclass of geographical locations, etc.
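The following minimal Python sketch illustrates how class membership propagates up such a taxonomy. The class and entity names are hypothetical stand-ins, not YAGO's actual identifiers. An entity that is directly a city is also found to be a populated place and a geographical location, because the code follows the subclass links transitively.

```python
# A minimal sketch of taxonomy lookup. The class and entity names below are
# hypothetical placeholders, not YAGO's real identifiers.
SUBCLASS_OF = {
    "City": ["PopulatedPlace"],
    "PopulatedPlace": ["GeographicalLocation"],
    "GeographicalLocation": [],
    "Person": [],
}

# Classes that each example entity belongs to directly.
TYPE_OF = {
    "Paris": ["City"],
    "Elvis_Presley": ["Person"],
}


def superclasses(cls: str) -> set[str]:
    """Return cls together with all of its transitive superclasses."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def classes_of(entity: str) -> set[str]:
    """Return every class an entity belongs to, directly or via the taxonomy."""
    result: set[str] = set()
    for cls in TYPE_OF.get(entity, []):
        result |= superclasses(cls)
    return result


print(sorted(classes_of("Paris")))
# ['City', 'GeographicalLocation', 'PopulatedPlace']
```

The code uses a stack instead of recursion, and a `seen` set guards against visiting a class twice. This matters once a class can have several superclasses.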

YAGO also defines which relations can hold between which entities. For example, birthPlace is a relation that can hold between a person and a place. The definition of these relations is called the schema.

What is so special about YAGO?

YAGO comes with a manually defined schema, which imposes logical constraints on the data. For example, people can be married only to people, and they can have at most one birth date. These constraints keep the data logically consistent and ensure its quality. YAGO can thus be considered a logically consistent subset of the much larger (but not consistent) Wikidata knowledge base.

How does YAGO compare to other knowledge bases?

YAGO positions itself as a large general knowledge base for facts about instances, with a taxonomy, manually defined properties, and logical constraints. Its key property is that it is a centrally controlled data source, which allows it to establish certain guarantees for the quality of its data.

- YAGO differs from DBpedia in that YAGO has a predefined schema, predefined and non-redundant relations, and logical constraints. The manually curated part of DBpedia has all of these, too, but contains only 5 million instances, whereas YAGO contains 39 million.
- YAGO differs from Schema.org by having data about instances, and by being available (starting from YAGO 4.6) under a Creative Commons Attribution license, which allows commercial usage.
- YAGO differs from ConceptNet by being about instances, and not about common sense knowledge.
- YAGO differs from BabelNet by being available (starting from YAGO 4.6) under a liberal Creative Commons Attribution license, which allows commercial usage.
- YAGO differs from Freebase by being an actively maintained project.
- YAGO differs from Wikidata by having human-readable identifiers, a clean top-level taxonomy, and enforced logical constraints.

For a more detailed discussion, see our scientific paper:

Fabian M. Suchanek, Mehwish Alam, Thomas Bonald, Lihu Chen, Pierre-Henri Paris, Jules Soria:
YAGO 4.5: A Large and Clean Knowledge Base with a Rich Taxonomy
Resource paper at the Conference on Research and Development in Information Retrieval (SIGIR), 2024

What are the logical constraints of YAGO?

Logical constraints are conditions that the data must fulfill. For example, a logical constraint can say that no entity can be both a person and a place at the same time. These constraints serve to root out errors in the data and to establish the logical coherence of the knowledge base. They also allow for deductions: if someone asks whether Elvis is a place, we can answer “no”, because we know he is a person. While this may sound trivial, such reasoning is not possible without the constraint that people and places are disjoint. YAGO currently has the following logical constraints:

- Disjointness: the classes of places, people, and medical entities are pairwise disjoint, i.e., no entity can belong to two of them.
- Functionality: several relations (such as birthPlace) can have at most one object per subject.
- Domain and range: for every relation, we define the class to which the subject must belong (the domain) and the class to which the object must belong (the range).
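The following minimal Python sketch shows how the three kinds of constraints could be checked on a handful of facts. It also shows how disjointness supports the Elvis deduction described above. This is not YAGO's actual checking code. The class names, relation names, and constraint tables are hypothetical, and the taxonomy is simplified to a single parent per class.

```python
# A minimal sketch of checking YAGO-style logical constraints on toy data.
# All class names, relations, and tables here are hypothetical, and the
# taxonomy is simplified to one parent per class.
from collections import defaultdict
from typing import Optional

SUBCLASS_OF: dict[str, Optional[str]] = {"City": "Place", "Place": None, "Person": None}
DISJOINT = [{"Place", "Person"}]                     # no entity in two of these
FUNCTIONAL = {"birthPlace"}                          # at most one object per subject
DOMAIN_RANGE = {"birthPlace": ("Person", "Place"),   # (domain, range)
                "spouse": ("Person", "Person")}


def classes(direct: set[str]) -> set[str]:
    """Add all superclasses to a set of directly asserted classes."""
    result: set[str] = set()
    for cls in direct:
        current: Optional[str] = cls
        while current is not None:
            result.add(current)
            current = SUBCLASS_OF.get(current)
    return result


def violations(types: dict[str, set[str]],
               facts: list[tuple[str, str, str]]) -> list[str]:
    """Return a description of every constraint violation found."""
    errors: list[str] = []
    closed = {e: classes(ts) for e, ts in types.items()}
    for entity, cls in closed.items():                      # disjointness
        for group in DISJOINT:
            if len(cls & group) > 1:
                errors.append(f"disjointness: {entity} in {sorted(cls & group)}")
    objects: dict[tuple[str, str], set[str]] = defaultdict(set)
    for s, p, o in facts:                                   # functionality
        if p in FUNCTIONAL:
            objects[(s, p)].add(o)
    for (s, p), objs in objects.items():
        if len(objs) > 1:
            errors.append(f"functionality: {s} has {len(objs)} values for {p}")
    for s, p, o in facts:                                   # domain and range
        if p in DOMAIN_RANGE:
            domain, range_ = DOMAIN_RANGE[p]
            if domain not in closed.get(s, set()):
                errors.append(f"domain: {s} is not a {domain} ({p})")
            if range_ not in closed.get(o, set()):
                errors.append(f"range: {o} is not a {range_} ({p})")
    return errors


def can_be(entity: str, cls: str, types: dict[str, set[str]]) -> bool:
    """Return False if disjointness rules out that entity belongs to cls."""
    known = classes(types.get(entity, set()))
    wanted = classes({cls})
    for group in DISJOINT:
        # The entity sits under one member of the group, cls under another.
        if (known & group) and (wanted & group) and not (known & wanted & group):
            return False
    return True


entity_types = {"Elvis": {"Person"}, "Tupelo": {"City"}, "Memphis": {"City"}}
facts = [
    ("Elvis", "birthPlace", "Tupelo"),
    ("Elvis", "birthPlace", "Memphis"),   # breaks functionality
    ("Tupelo", "spouse", "Elvis"),        # breaks the domain of spouse
]
print(violations(entity_types, facts))
print(can_be("Elvis", "Place", entity_types))  # False
```

Note that `can_be` answers “no” only when a constraint actually rules the class out. For an entity with no known classes, it returns True, because nothing can be deduced.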

What is the data model of YAGO?

YAGO is stored in RDF, the standard Resource Description Framework. This means that YAGO is a set of facts, each of which consists of a subject, a predicate (also called “relation” or “property”), and an object, as in <Elvis> <birthPlace> <Tupelo>.
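As a minimal sketch of this data model, facts can be held as (subject, predicate, object) tuples. They can then be queried with patterns in which None acts as a wildcard, loosely mirroring a SPARQL triple pattern. The short identifiers are hypothetical stand-ins for full RDF identifiers.

```python
# A minimal sketch of RDF-style triples as Python tuples. The identifiers
# are shortened, hypothetical stand-ins for YAGO's full identifiers.
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

facts: set[Triple] = {
    ("Elvis", "birthPlace", "Tupelo"),
    ("Elvis", "type", "Person"),
    ("Tupelo", "type", "City"),
}


def match(s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Where was Elvis born?
print(match(s="Elvis", p="birthPlace"))  # [('Elvis', 'birthPlace', 'Tupelo')]
```

A real triple store indexes the triples instead of scanning them all. The answers to a single pattern are the same either way.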

We use different vocabularies for the components of such a fact. For the predicates, for example, we use the relations defined by schema.org. In RDF, predicates are identified by IRIs. We therefore write these predicates with the prefix schema:, which abbreviates the schema.org namespace (as in schema:birthPlace). This allows us to reuse a standard vocabulary instead of reinventing the wheel.
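The following sketch shows what such a prefix stands for: a prefixed name is shorthand for a full IRI, obtained by concatenating the namespace and the local name. The namespace IRIs in the table are assumptions for illustration. The authoritative ones are the prefix declarations in the YAGO files themselves.

```python
# A minimal sketch of prefix expansion and compaction. The namespace IRIs
# are assumptions for illustration; check the prefixes declared in the
# YAGO files you actually use.
PREFIXES = {
    "schema": "http://schema.org/",
    "yago": "http://yago-knowledge.org/resource/",
}


def expand(curie: str) -> str:
    """Turn a prefixed name like 'schema:birthPlace' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(iri: str) -> str:
    """Abbreviate a full IRI with a known prefix, or wrap it in <...>."""
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"


print(expand("schema:birthPlace"))  # http://schema.org/birthPlace
```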

For “facts about facts” (such as time stamps for facts or other types of annotations), we use the RDF* format.
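In RDF*, a whole triple can appear, quoted, as the subject of another triple. The sketch below models a quoted triple as a nested Python tuple and serializes it in Turtle-star notation (<< s p o >>). The identifiers, the choice of schema:startDate as the annotation property, and the date are illustrative assumptions, not copied from the YAGO data.

```python
# A minimal sketch of "facts about facts" in the RDF* (RDF-star) style:
# the subject of the annotation is itself a triple. The identifiers and the
# use of schema:startDate are illustrative assumptions.
from typing import Union

Term = Union[str, tuple]  # a plain term, or a nested (s, p, o) triple

fact = ("yago:Elvis_Presley", "schema:spouse", "yago:Priscilla_Presley")
annotation = (fact, "schema:startDate", '"1967-05-01"^^xsd:date')


def to_turtle_star(term: Term) -> str:
    """Serialize a term, quoting nested triples as << s p o >>."""
    if isinstance(term, tuple):
        return "<< " + " ".join(to_turtle_star(t) for t in term) + " >>"
    return term


def statement(triple: tuple) -> str:
    """Serialize a top-level triple as one Turtle-star statement."""
    return " ".join(to_turtle_star(t) for t in triple) + " ."


print(statement(annotation))
# << yago:Elvis_Presley schema:spouse yago:Priscilla_Presley >> schema:startDate "1967-05-01"^^xsd:date .
```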

How can I access YAGO?

There are several ways to access YAGO:

- You can browse the knowledge base yourself in our Web Interface.
- You can run SPARQL queries in our SPARQL endpoint.
- You can send queries to our SPARQL endpoint programmatically.
- You can download the data and load it into an RDF triple store (e.g., Blazegraph or Apache Jena). This is the preferred method if you plan to launch a large number of queries!
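For programmatic access, a SPARQL endpoint can be queried over plain HTTP following the SPARQL 1.1 Protocol. The query goes into the `query` parameter of a GET request, and JSON results are requested through the Accept header. The sketch below uses only the Python standard library. The endpoint URL is an assumption, so check the YAGO website for the current address. The example query also assumes that birthPlace is published under the schema.org namespace.

```python
# A minimal sketch of querying a SPARQL endpoint over HTTP with only the
# standard library. ENDPOINT is an assumed URL; replace it with the
# endpoint address given on the YAGO website.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://yago-knowledge.org/sparql/query"  # assumption

QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?person ?place WHERE {
  ?person schema:birthPlace ?place .
} LIMIT 5
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, endpoint: str = ENDPOINT) -> list[dict[str, str]]:
    """Send the query and return one {variable: value} dict per result row."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as resp:
        data = json.load(resp)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]


# Usage (requires network access):
#   for row in run_query(QUERY):
#       print(row["person"], row["place"])
```

A GET request keeps the example short. For very long queries, the SPARQL 1.1 Protocol also allows sending the query in the body of a POST request.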