YAGO 2

YAGO 2 is an improved version of the original YAGO knowledge base:

  • YAGO 2 is anchored in time and space. YAGO 2 attaches a temporal dimension and a spatial dimension to many of its facts and entities.
  • YAGO 2 is particularly suited for disambiguation purposes, as it contains a large number of names for entities. It also knows the gender of people.
  • As with all major releases, the accuracy of YAGO 2 has been manually evaluated, showing an accuracy of 95% with respect to Wikipedia. Every relation is annotated with its confidence value.

If you use YAGO 2 for scientific purposes, please cite our paper:

Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, Gerhard Weikum:
YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia
Journal article in the Artificial Intelligence Journal, 2013

How to use YAGO 2

YAGO classifies each entity into a taxonomy of classes. Every entity is an instance of one or multiple classes. Every class (except the root class) is a subclass of one or multiple classes. This yields a hierarchy of classes — the taxonomy. The YAGO taxonomy is the backbone of the ontology, and is designed with much care and attention to correctness.
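
As a rough illustration of this structure, the following Python sketch stores a tiny taxonomy as two dictionaries and collects all classes an entity belongs to by following the subclass links upward. The class names, the dictionaries, and the helper all_classes are assumptions made up for this example; they are not taken from the YAGO data or its tooling.

```python
# Hypothetical toy taxonomy; the class names are illustrative, not taken from YAGO.
SUBCLASS_OF = {
    "singer": {"musician", "person"},  # a class may have several superclasses
    "musician": {"artist"},
    "artist": {"person"},
    "person": {"entity"},
    "entity": set(),                   # root class: no superclass
}

# Every entity is an instance of one or more classes.
INSTANCE_OF = {
    "Elvis_Presley": {"singer"},
}


def all_classes(entity):
    """Return every class the entity belongs to, directly or via subclass links."""
    seen = set()
    stack = list(INSTANCE_OF.get(entity, ()))
    while stack:
        cls = stack.pop()
        if cls in seen:
            continue  # already visited through another path
        seen.add(cls)
        stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen


if __name__ == "__main__":
    print(sorted(all_classes("Elvis_Presley")))
```

Because a class can have several superclasses, the traversal keeps a set of visited classes so that each class is reported only once.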

In YAGO (as in RDF), each fact consists of a subject, a predicate, and an object. Every fact can have a fact id. For example, the fact <Elvis_Presley> rdf:type <person> could have the fact identifier <id_42>. YAGO contains facts about these fact identifiers. For example, YAGO contains

  • <id_42> <occursSince> "1935-01-08"
  • <id_42> <occursUntil> "1977-08-16"
  • <id_42> <extractionSource> <http://en.wikipedia.org/Elvis_Presley>

These facts mean that Elvis was a person from the year 1935 to the year 1977, and that this fact was found in Wikipedia.
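
The relationship between a fact and the facts about its identifier can be sketched in Python as follows. The Fact tuple and the helper meta_facts are hypothetical names chosen for this illustration, and the sketch assumes at most one value per predicate for a given fact id; it only mirrors the example above and is not how YAGO stores its data internally.

```python
from collections import namedtuple

# A fact is a subject-predicate-object triple plus an optional identifier
# (hypothetical in-memory representation, for illustration only).
Fact = namedtuple("Fact", ["fact_id", "subject", "predicate", "object"])

FACTS = [
    Fact("<id_42>", "<Elvis_Presley>", "rdf:type", "<person>"),
    # Meta facts: their subject is the identifier of another fact.
    Fact(None, "<id_42>", "<occursSince>", '"1935-01-08"'),
    Fact(None, "<id_42>", "<occursUntil>", '"1977-08-16"'),
    Fact(None, "<id_42>", "<extractionSource>",
         "<http://en.wikipedia.org/Elvis_Presley>"),
]


def meta_facts(fact_id, facts):
    """Map predicate to object for all facts whose subject is the given fact id."""
    return {f.predicate: f.object for f in facts if f.subject == fact_id}


if __name__ == "__main__":
    for predicate, value in meta_facts("<id_42>", FACTS).items():
        print(predicate, value)
```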

Data

The YAGO 2 knowledge base is licensed under a Creative Commons Attribution 3.0 License by the YAGO team of the Max Planck Institute for Informatics. The version number of this data is 2.3.0. The data was extracted from the 2010-08-17 version of Wikipedia.

YAGO 2 is available in two data formats: Turtle and a native format.

While the Turtle format is the W3C standard, the native format has the advantage that it includes the meta facts (“facts about facts”) and information about the textual context of entities (anchor texts etc.). The native format consists of a set of files, which together constitute the knowledge base. The files that start with an underscore are internal files that will be of little use to the end-user. The other files are named after the relation they contain. For example, the file “ismarriedto.tsv” is a TSV file that contains 3 columns: a fact id, a subject, and an object — meaning that the subject was married to the object, and that this fact has the given fact id.
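
A relation file with this three-column layout could be read along the following lines. This is a minimal sketch: the function name read_relation, the sample row, and its fact id are made up for illustration, and the exact escaping rules of the native files are not covered here.

```python
import csv
import io


def read_relation(lines):
    """Yield (fact_id, subject, object) tuples from a three-column TSV stream."""
    # QUOTE_NONE keeps quotation marks in literal values exactly as written.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        yield tuple(row)


# Made-up sample content standing in for a file such as "ismarriedto.tsv".
SAMPLE = "<id_1>\t<Elvis_Presley>\t<Priscilla_Presley>\n"

if __name__ == "__main__":
    for fact_id, subject, obj in read_relation(io.StringIO(SAMPLE)):
        print(fact_id, subject, "is married to", obj)
```

For a file on disk, the same function can be given an open file handle, for example open("ismarriedto.tsv", encoding="utf-8", newline=""); the UTF-8 encoding is an assumption.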

Acknowledgements

YAGO can only be so large because it is based on other sources. We would like to thank

  • the numerous voluntary editors of Wikipedia. Thank you for giving mankind such a wonderful huge encyclopedia!
  • the team of GeoNames. Thank you for creating this marvellous collection of geographical data, and thank you for providing this work for free!
  • the creators of WordNet. Thank you for organizing and analyzing the English language in such a diligent way and thank you for making your work available for free!
  • the Universal WordNet, which provided YAGO with multilingual labels for classes.