YAGO 4

YAGO 4 is a version of the YAGO knowledge base that is based on Wikidata, the largest public general-purpose knowledge base. YAGO refines the data as follows:

  1. All entity identifiers and property identifiers are human-readable.
  2. The top-level classes come from schema.org — a standard repertoire of classes and properties maintained by Google and others, combined with bioschemas.org. The lower level classes are a selection of Wikidata classes.
  3. The properties come from schema.org.
  4. YAGO 4 contains semantic constraints in the form of SHACL. These constraints keep the data clean, and allow for logical reasoning on YAGO.

YAGO is thus a simplified, cleaned, and “reasonable” version of Wikidata. It contains more than 50 million entities and 2 billion facts.

If you use YAGO 4 for scientific purposes, please cite our paper:

Thomas Pellissier Tanon, Gerhard Weikum, Fabian M. Suchanek:
“YAGO 4: A Reason-able Knowledge Base”
Resource paper at the Extended Semantic Web Conference (ESWC), 2020

How to use YAGO 4

YAGO 4 is an RDFS knowledge base. It is a collection of facts, each of which consists of a subject, a predicate, and an object — as in yago:Elvis_Presley rdf:type schema:Person.
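The subject/predicate/object structure can be illustrated with a minimal Python sketch that splits one N-Triples line into its three parts. The full IRIs used for the `yago:`, `rdf:`, and `schema:` prefixes are assumptions made for illustration, and the regular expression only covers simple lines (IRI subject and predicate), so it is not a complete N-Triples parser.

```python
import re
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Matches a simple N-Triples line: an IRI subject, an IRI predicate, and an
# object (IRI or literal), followed by the terminating dot.
_TRIPLE_RE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def parse_triple(line: str) -> Optional[Triple]:
    """Parse one simple N-Triples line.

    Returns None for blank lines, comments, and lines this sketch
    does not handle (e.g. blank-node subjects).
    """
    if not line.strip() or line.lstrip().startswith('#'):
        return None
    match = _TRIPLE_RE.match(line)
    if match is None:
        return None
    subject, predicate, obj = match.groups()
    # Strip the angle brackets when the object is an IRI; keep literals as-is.
    if obj.startswith('<') and obj.endswith('>'):
        obj = obj[1:-1]
    return Triple(subject, predicate, obj)


# Assumed expansion of: yago:Elvis_Presley rdf:type schema:Person
line = ('<http://yago-knowledge.org/resource/Elvis_Presley> '
        '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> '
        '<http://schema.org/Person> .')
triple = parse_triple(line)
print(triple)
```

For real workloads, an established RDF library is a safer choice than a hand-written regular expression.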

YAGO puts each entity into at least one class. The classes form a taxonomy, where the higher classes are taken from schema.org (and bioschemas.org), and the lower classes are a selection of classes from Wikidata. The highest class is schema:Thing.

The facts come from Wikidata, and the predicates have been mapped manually to the predicates of schema.org. Facts whose predicates could not be mapped were omitted. All predicates, all classes, and most entities have human-readable names. For the entities and classes, these come from the corresponding English Wikipedia article. If no such article exists, the name comes from the English label of Wikidata, concatenated with the Wikidata identifier. If no such label exists, we use the Wikidata identifier. YAGO entities are mapped with owl:sameAs to Wikidata and DBpedia.
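The naming fallback described above can be sketched as a small Python function. The function name `readable_name`, the underscore separator, and the replacement of spaces by underscores are assumptions made for illustration; the actual YAGO code may escape and join names differently. The Wikidata identifiers in the usage lines are placeholders.

```python
from typing import Optional


def readable_name(wikidata_id: str,
                  enwiki_title: Optional[str] = None,
                  en_label: Optional[str] = None) -> str:
    """Pick a name for an entity or class (hypothetical helper).

    Order of preference, following the text:
      1. the title of the English Wikipedia article,
      2. the English Wikidata label concatenated with the Wikidata identifier,
      3. the bare Wikidata identifier.
    """
    if enwiki_title:
        return enwiki_title.replace(' ', '_')
    if en_label:
        # Separator and space handling are assumptions of this sketch.
        return f"{en_label.replace(' ', '_')}_{wikidata_id}"
    return wikidata_id


print(readable_name('Q0000001', enwiki_title='Elvis Presley'))
print(readable_name('Q0000002', en_label='Some label'))
print(readable_name('Q0000003'))
```

Appending the identifier in the second case keeps names unique even when two entities share the same English label.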

YAGO 4 comes with SHACL constraints that specify the disjointness of certain classes, as well as the domains, ranges, and cardinalities of relations.
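To give a feel for what such constraints check, here is a minimal Python sketch that tests a handful of facts against one disjointness rule and one cardinality rule. It does not read SHACL; the two rules (`schema:Person` disjoint from `schema:Place`, at most one `schema:birthDate`) and the entity `yago:Example_Entity` are assumptions chosen for illustration, not constraints quoted from the YAGO shapes file. It also looks only at directly asserted types and ignores the taxonomy.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Illustrative rules (assumptions, not taken from the YAGO shapes file).
DISJOINT = [("schema:Person", "schema:Place")]
MAX_COUNT = {"schema:birthDate": 1}


def find_violations(facts):
    """Return (entity, message) pairs for facts that break the rules above."""
    types = defaultdict(set)    # entity -> directly asserted classes
    values = defaultdict(set)   # (entity, property) -> distinct objects
    for subject, predicate, obj in facts:
        if predicate == RDF_TYPE:
            types[subject].add(obj)
        else:
            values[(subject, predicate)].add(obj)

    violations = []
    # Disjointness: no entity may be an instance of both classes of a pair.
    for entity, classes in sorted(types.items()):
        for a, b in DISJOINT:
            if a in classes and b in classes:
                violations.append(
                    (entity, f"instance of disjoint classes {a} and {b}"))
    # Cardinality: a property may have at most MAX_COUNT[prop] distinct values.
    for (entity, prop), objs in sorted(values.items()):
        limit = MAX_COUNT.get(prop)
        if limit is not None and len(objs) > limit:
            violations.append(
                (entity, f"{prop} has {len(objs)} values, at most {limit} allowed"))
    return violations


facts = [
    ("yago:Elvis_Presley", "rdf:type", "schema:Person"),
    ("yago:Elvis_Presley", "schema:birthDate", '"1935-01-08"'),
    # Hypothetical entity that breaks both rules.
    ("yago:Example_Entity", "rdf:type", "schema:Person"),
    ("yago:Example_Entity", "rdf:type", "schema:Place"),
    ("yago:Example_Entity", "schema:birthDate", '"1900-01-01"'),
    ("yago:Example_Entity", "schema:birthDate", '"1901-01-01"'),
]
violations = find_violations(facts)
for entity, message in violations:
    print(entity, "->", message)
```

A real validation would load the Shapes file into a SHACL engine rather than hard-coding rules as done here.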

Data

YAGO 4 is licensed under a Creative Commons Attribution-ShareAlike license by the YAGO team of Télécom Paris and the Max Planck Institute for Informatics. Some facts are imported from schema.org, which releases its data under the same license.

The YAGO 4 knowledge base is distributed as a set of independent full-text N-Triples files, which together constitute the knowledge base. The files are the following:

  • Taxonomy: The full taxonomy of classes.
  • Full-types: All rdf:type relations.
  • Labels: All entity labels (rdfs:label, rdfs:comment and schema:alternateName).
  • Facts: The facts that are not labels.
  • Annotations: The fact annotations (“facts about facts”) in RDF*.
  • SameAs: The owl:sameAs links to Wikidata, DBpedia, and Freebase.
  • Schema: The schema.org classes and properties, in OWL 2 DL.
  • Shapes: The SHACL constraints used to generate YAGO 4.

YAGO 4 is provided in three flavors:

  • Full: This flavor uses all data from Wikidata. Hence, it is an extremely large KB.
  • Wikipedia: We offer a smaller flavor of YAGO 4 that contains only the instances that have a Wikipedia article (in any language).
  • English Wikipedia: This is a restriction of the Wikipedia flavor to instances that have an English Wikipedia article.

The .ntx files use the RDF* (a.k.a. RDF star) N-Triples syntax. They can be parsed using Jena, RDF4J, N3.js, RDF.rb, Blazegraph, AnzoGraph, Stardog, or GraphDB.

Code

If you are only interested in the data of YAGO, there is no need to use the present code repository. You can download the YAGO data above.

The source code of YAGO is a Rust project that ingests facts from Wikidata and transforms them into YAGO. If you run the code yourself, you can add other sources or modify the generation of the knowledge base. The YAGO 4 source code is available on GitHub. It is licensed under the GNU General Public License, version 3 or later.

Acknowledgements

YAGO can only be so large because it is based on other sources. We would like to thank

  • the creators and contributors of Wikidata.
    Thank you for having and implementing such an ambitious vision of building a “Wikipedia for machines”, and thank you for keeping it open!
  • the team of Schema.org, who created the taxonomy and the properties that we use in YAGO 4.
  • the team of BioSchemas.org, who complemented schema.org with the missing bio-chemical taxonomy.