YAGO 4.5 is the latest version of the YAGO knowledge base. It is based on Wikidata — the largest public general-purpose knowledge base. YAGO refines the data as follows:
- All entity identifiers and property identifiers are human-readable.
- The top-level classes come from schema.org, a standard repertoire of classes and properties maintained by Google and others. The lower-level classes are a careful selection from the Wikidata taxonomy.
- The properties come from schema.org.
- YAGO 4.5 contains semantic constraints in the form of SHACL. These constraints keep the data clean, and allow for logical reasoning on YAGO.
YAGO is thus a simplified, cleaned, and “reasonable” version of Wikidata. It contains 49 million entities and 109 million facts.
If you use YAGO 4.5 for scientific purposes, please cite our paper:
Fabian M. Suchanek, Mehwish Alam, Thomas Bonald, Pierre-Henri Paris, Jules Soria:
Integrating the Wikidata Taxonomy into YAGO
arXiv 2308.11884, 2023
How to use YAGO
YAGO is an RDFS knowledge base. It is a collection of facts, each of which consists of a subject, a predicate, and an object — as in
yago:Elvis_Presley rdf:type schema:Person.
The facts come from Wikidata, and the predicates have been mapped manually to the predicates of schema.org. Facts whose predicates could not be mapped were omitted. All predicates, all classes, and most entities have human-readable names. YAGO entities are mapped to Wikidata with owl:sameAs.
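To make the subject-predicate-object structure concrete, here is a minimal Python sketch that stores facts as tuples, expands prefixed names into full IRIs, and looks up the objects of a given subject and predicate. It is an illustration only, not part of the YAGO code base: the `yago:` namespace IRI is an assumption, the helper names `expand` and `objects` are hypothetical, and the Wikidata identifier is a placeholder rather than a real ID.

```python
# Prefix table. The rdf, owl and schema namespaces are the standard ones;
# the yago and wd entries are assumptions made for this illustration.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "schema": "http://schema.org/",
    "yago": "http://yago-knowledge.org/resource/",  # assumed namespace
    "wd": "http://www.wikidata.org/entity/",
}

# Each fact is a (subject, predicate, object) tuple of prefixed names.
FACTS = [
    ("yago:Elvis_Presley", "rdf:type", "schema:Person"),
    # Placeholder identifier, NOT the real Wikidata ID of the entity.
    ("yago:Elvis_Presley", "owl:sameAs", "wd:Q_PLACEHOLDER"),
]


def expand(term):
    """Turn a prefixed name such as 'schema:Person' into a full IRI."""
    prefix, _, local = term.partition(":")
    return PREFIXES[prefix] + local


def objects(facts, subject, predicate):
    """Return all objects o such that (subject, predicate, o) is a fact."""
    return [o for s, p, o in facts if s == subject and p == predicate]


if __name__ == "__main__":
    print(expand("schema:Person"))
    print(objects(FACTS, "yago:Elvis_Presley", "owl:sameAs"))
```

For real workloads, an RDF library or a triple store would replace the list of tuples; the sketch only shows the shape of the data.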
YAGO comes with SHACL constraints that specify the disjointness of certain classes, as well as the domains, ranges, and cardinalities of relations. Please find a detailed description of the upper taxonomy as well as our design document here.
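The kind of check these constraints enable can be sketched in a few lines of Python. The example below is not a SHACL engine and does not read the YAGO schema: the disjointness pair, the cardinality limit, and the function name `violations` are hypothetical stand-ins chosen for illustration, and the sketch ignores subclass reasoning, domains, and ranges.

```python
from collections import defaultdict

# Hypothetical constraints for illustration; the real ones live in the
# YAGO schema file and are expressed in SHACL.
DISJOINT = [("schema:Person", "schema:Place")]
MAX_COUNT = {"schema:birthDate": 1}


def violations(facts):
    """Return human-readable messages for every violated constraint."""
    types = defaultdict(set)    # subject -> set of classes
    values = defaultdict(set)   # (subject, predicate) -> set of objects
    for s, p, o in facts:
        if p == "rdf:type":
            types[s].add(o)
        else:
            values[(s, p)].add(o)

    found = []
    # Disjointness: no entity may be an instance of both classes.
    for s, classes in sorted(types.items()):
        for a, b in DISJOINT:
            if a in classes and b in classes:
                found.append(f"{s}: {a} and {b} are disjoint")
    # Cardinality: a property may have at most MAX_COUNT distinct values.
    for (s, p), objs in sorted(values.items()):
        limit = MAX_COUNT.get(p)
        if limit is not None and len(objs) > limit:
            found.append(f"{s}: {p} has {len(objs)} values, at most {limit} allowed")
    return found


if __name__ == "__main__":
    bad = [
        ("yago:Example", "rdf:type", "schema:Person"),
        ("yago:Example", "rdf:type", "schema:Place"),
        ("yago:Example", "schema:birthDate", '"1900-01-01"'),
        ("yago:Example", "schema:birthDate", '"1901-01-01"'),
    ]
    for message in violations(bad):
        print(message)
```

A full validator would also follow the class taxonomy, so that an instance of a subclass of one class is checked against classes disjoint with its ancestors.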
The YAGO 4.5 knowledge base consists of the following set of Turtle files:
- Schema: The upper taxonomy, constraints, and property definitions in SHACL.
- Taxonomy: The full taxonomy of classes.
- Facts: All facts about entities that have an English Wikipedia page.
- Facts beyond Wikipedia: All facts about entities that do not have an English Wikipedia page.
- Meta: The fact annotations (“facts about facts”) in RDF*.
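A "fact about a fact" can be pictured as a triple whose subject is itself a triple. The Python sketch below models such an annotation as a nested tuple and renders it in the `<< ... >>` notation used by RDF*. The entities, the annotation predicate, the date, and the function name `to_rdf_star` are hypothetical and chosen only to show the shape; they are not taken from the YAGO files.

```python
def to_rdf_star(fact, predicate, value):
    """Render an annotated fact as one line in RDF* quoted-triple notation."""
    s, p, o = fact
    return f"<< {s} {p} {o} >> {predicate} {value} ."


# Hypothetical annotated fact: the inner triple is the fact itself,
# the outer predicate and value say something about that fact.
META = (
    ("yago:Example_Person", "schema:memberOf", "yago:Example_Band"),
    "schema:startDate",
    '"1970-01-01"^^xsd:date',
)

if __name__ == "__main__":
    print(to_rdf_star(*META))
```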
YAGO 4.5 can be downloaded here
If you are only interested in the data of YAGO, there is no need to use the present code repository. You can download the data from the link above.
The source code of YAGO is a Python project that ingests facts from Wikidata and transforms them into YAGO. If you run the code yourself, you can add other sources or modify the generation of the knowledge base. The YAGO 4.5 source code is available on GitHub. It is licensed under the Creative Commons Attribution License.
YAGO can only be so large because it is based on other sources. We would like to thank