YAGO 4.5

YAGO 4.5 is the latest version of the YAGO knowledge base. It is based on Wikidata — the largest public general-purpose knowledge base. YAGO refines the data as follows:

  1. All entity identifiers and property identifiers are human-readable.
  2. The top-level classes come from schema.org, a standard repertoire of classes and properties maintained by Google and others. The lower-level classes are a careful selection from the Wikidata taxonomy.
  3. The properties come from schema.org.
  4. YAGO 4.5 contains semantic constraints in the form of SHACL. These constraints keep the data clean, and allow for logical reasoning on YAGO.
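
To give a feel for what such constraints do, here is a minimal sketch in plain Python. It is not SHACL and not the YAGO schema: the `PropertyConstraint` class, the `find_violations` function, and the sample rules and facts are all hypothetical, chosen only to illustrate how a domain restriction and a maximum cardinality can flag suspicious facts.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyConstraint:
    """Toy stand-in for a SHACL property shape (hypothetical, not the YAGO schema)."""
    domain: str                      # class the subject must belong to
    max_count: Optional[int] = None  # max values per subject; None means unbounded


def find_violations(facts, types, constraints):
    """Check (subject, property, object) facts against the given constraints."""
    violations = []
    counts = Counter()
    for subject, prop, _ in facts:
        rule = constraints.get(prop)
        if rule is None:
            violations.append(f"{prop}: unknown property")
            continue
        # Domain check: the subject must be an instance of the required class.
        if rule.domain not in types.get(subject, set()):
            violations.append(f"{subject} {prop}: subject is not a {rule.domain}")
        counts[(subject, prop)] += 1
    # Cardinality check: no subject may exceed the allowed number of values.
    for (subject, prop), n in counts.items():
        limit = constraints[prop].max_count
        if limit is not None and n > limit:
            violations.append(f"{subject} {prop}: {n} values, at most {limit} allowed")
    return violations


# Assumed example data, for illustration only.
constraints = {"schema:birthDate": PropertyConstraint("schema:Person", max_count=1)}
types = {"yago:Elvis_Presley": {"schema:Person"}, "yago:Paris": {"schema:Place"}}
facts = [
    ("yago:Elvis_Presley", "schema:birthDate", "1935-01-08"),
    ("yago:Elvis_Presley", "schema:birthDate", "1935-01-09"),  # second birth date
    ("yago:Paris", "schema:birthDate", "0250-01-01"),          # a place has no birth date
]

for violation in find_violations(facts, types, constraints):
    print(violation)
```

In this toy setting, the second birth date and the birth date attached to a place are both reported, while a single well-typed fact passes silently.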

YAGO is thus a simplified, cleaned, and “reasonable” version of Wikidata. It contains 49 million entities and 109 million facts. See above for getting started!

If you use YAGO 4.5 for scientific purposes, please cite our paper:

Fabian M. Suchanek, Mehwish Alam, Thomas Bonald, Lihu Chen, Pierre-Henri Paris, Jules Soria:
YAGO 4.5: A Large and Clean Knowledge Base with a Rich Taxonomy
Resource paper at the Conference on Research and Development in Information Retrieval (SIGIR), 2024

Data

YAGO 4.5 is licensed under a Creative Commons Attribution-ShareAlike license by the YAGO team of Télécom Paris. Some facts are imported from schema.org, which releases its data under the same license.

The YAGO 4.5 knowledge base consists of the following set of Turtle files:

  • Schema: The upper taxonomy, constraints, and property definitions in SHACL (sample). The schema is explained here.
  • Taxonomy: The full taxonomy of classes. (sample)
  • Facts: All facts about entities that have an English Wikipedia page. (sample)
  • Facts beyond Wikipedia: All facts about entities that do not have an English Wikipedia page. (sample)
  • Meta: The fact annotations (“facts about facts”) in RDF*. (sample)

YAGO can then be loaded into any triple store, such as Jena, RDF4J, N3.js, RDF.rb, Blazegraph, AnzoGraph, Stardog, GraphDB, or QLever.
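
The "facts about facts" in the Meta file can be pictured as triples whose subject is itself a triple. The following is only a sketch of that idea: the `Triple` type, the `to_turtle_star` helper, and the sample fact and annotation are assumptions made for illustration, not taken from the YAGO files or code.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """A fact; the subject may itself be a fact, as in RDF*."""
    subject: Union[str, "Triple"]
    predicate: str
    object: str


def to_turtle_star(triple: Triple) -> str:
    """Serialize a (possibly annotated) triple in Turtle-star style syntax."""
    def term(value):
        # A nested triple is written as a quoted triple: << s p o >>
        if isinstance(value, Triple):
            return f"<< {term(value.subject)} {value.predicate} {value.object} >>"
        return value
    return f"{term(triple.subject)} {triple.predicate} {triple.object} ."


# Hypothetical example: a fact, and an annotation saying when it started to hold.
fact = Triple("yago:Elvis_Presley", "schema:spouse", "yago:Priscilla_Presley")
annotation = Triple(fact, "schema:startDate", '"1967"^^xsd:gYear')

print(to_turtle_star(fact))
print(to_turtle_star(annotation))
```

The sketch only handles triples nested in the subject position, which is enough to show how an annotation attaches to a fact.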

Code

If you are only interested in the YAGO data, there is no need to use the present code repository. You can download the data above.

The source code of YAGO is a Python project that ingests facts from Wikidata and transforms them into YAGO. If you run the code yourself, you can add other sources or modify the generation of the knowledge base. The YAGO 4.5 source code is available on GitHub. It is licensed under the Creative Commons Attribution License.

Acknowledgements

YAGO can only be so large because it is based on other sources. We would like to thank

  • the creators and contributors of Wikidata.
    Thank you for having and implementing such an ambitious vision of building a “Wikipedia for machines”, and thank you for keeping it open!
  • the team of Schema.org, who created the taxonomy and the properties that we use in YAGO 4.5.