YAGO 1

This is the 2008 version of YAGO. It knows more than 2 million entities (such as persons, organizations, and cities) and 20 million facts about these entities. This version of YAGO includes the data extracted from the categories and infoboxes of Wikipedia, combined with the taxonomy of WordNet. YAGO 1 was manually evaluated and found to have an accuracy of 95% with respect to the extraction source.

If you use YAGO 1 for scientific purposes, please cite our paper:

Fabian M. Suchanek, Gjergji Kasneci, Gerhard Weikum:
Yago: A Core of Semantic Knowledge
Full paper at the World Wide Web Conference (WWW)

How to use YAGO 1

YAGO classifies each entity into a taxonomy of classes. Every entity is an instance of one or more classes, and every class (except the root class) is a subclass of one or more classes. This yields a hierarchy of classes: the taxonomy. The taxonomy is the backbone of the YAGO ontology and was designed with particular care for correctness.
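Because classes form a hierarchy, the full set of classes of an entity is obtained by following subclass links upward from its direct classes until the root is reached. The following Python sketch illustrates this with a breadth-first traversal. It assumes that the instance and subclass links have already been loaded into two dictionaries; the entity and class names are hypothetical toy values, not actual YAGO 1 identifiers.

```python
from collections import deque

# Hypothetical toy data; real YAGO 1 class names differ.
# entity -> classes it is a direct instance of
instance_of = {
    "Albert_Einstein": {"physicist", "vegetarian"},
}
# class -> direct superclasses (the root class has none)
subclass_of = {
    "physicist": {"scientist"},
    "scientist": {"person"},
    "vegetarian": {"person"},
    "person": {"entity"},
    "entity": set(),
}

def all_classes(entity: str) -> set[str]:
    """Return every class of `entity`, following subclass links up to the root."""
    seen: set[str] = set()
    queue = deque(instance_of.get(entity, ()))
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue  # a class reachable via several paths is visited only once
        seen.add(cls)
        queue.extend(subclass_of.get(cls, ()))
    return seen

print(sorted(all_classes("Albert_Einstein")))
# ['entity', 'person', 'physicist', 'scientist', 'vegetarian']
```

Since a class may have several superclasses, the taxonomy is a directed acyclic graph rather than a tree, which is why the sketch tracks visited classes.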

Data

The YAGO 1 knowledge base is licensed under the GNU Free Documentation License. The exact version number of this data is 1.0.0. The data was extracted from the 2008-10-01 version of Wikipedia.

YAGO 1 is available in two data formats: Turtle and a native format.

While Turtle is a W3C standard format, the native format has the advantage of including the meta facts (“facts about facts”). The native format is structured as follows:

  • The folder “facts” contains the main data. There is one subfolder for each relation, and each subfolder contains several files. Each line of these files has the form <factId> TAB <arg1> TAB <arg2> TAB <confidence>, meaning that <arg1> and <arg2> stand in the relation named by the subfolder, with an accumulated confidence of <confidence>. Some relations concern “facts about facts”, i.e., they have factIds as subjects.
  • The folder “Entities” contains all entities. It holds several files, each of which contains lines of the form <entity> TAB <isConcept> TAB <URL>, where <isConcept> is either true or false and tells whether the entity is a concept, and <URL> is a URL that describes the entity (or null). This table is compiled from the relations TYPE and DESCRIBES as additional information.
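To make the two line formats concrete, the following Python sketch parses one fact line and one entity line. It assumes that fields are separated by single tab characters, that confidences are decimal numbers, and that a missing URL is written as the literal string "null". The class and function names are hypothetical and not part of any YAGO tooling.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    fact_id: str
    arg1: str
    arg2: str
    confidence: float

@dataclass
class Entity:
    name: str
    is_concept: bool
    url: Optional[str]

def parse_fact_line(line: str) -> Fact:
    # <factId> TAB <arg1> TAB <arg2> TAB <confidence>
    # Raises ValueError if the line does not have exactly four fields.
    fact_id, arg1, arg2, confidence = line.rstrip("\r\n").split("\t")
    return Fact(fact_id, arg1, arg2, float(confidence))

def parse_entity_line(line: str) -> Entity:
    # <entity> TAB <isConcept> TAB <URL>
    name, is_concept, url = line.rstrip("\r\n").split("\t")
    # Assumption: a missing URL is written as the literal string "null".
    return Entity(name, is_concept == "true", None if url == "null" else url)
```

For instance, the invented line "42\tAlbert_Einstein\tUlm\t0.95" would be parsed into a Fact with fact_id "42" and confidence 0.95. The factId is kept as a string because the sketch makes no assumption about its internal format.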

If a folder contains multiple files, all of these files are part of the knowledge base.
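Since a relation may be spread over several files, and some relations have factIds as subjects, loading the native format roughly amounts to reading every file in a relation's subfolder and indexing the facts by factId, so that meta facts can be matched to the facts they describe. The following Python sketch shows one way to do this. It assumes UTF-8 text files with one tab-separated fact per line; the function names are hypothetical, and the relation names are left to the caller.

```python
from pathlib import Path

# (arg1, arg2, confidence) for one fact
Fact = tuple[str, str, float]

def load_relation(facts_dir: Path, relation: str) -> dict[str, Fact]:
    """Read every file in facts_dir/<relation>/ and index its facts by factId."""
    facts: dict[str, Fact] = {}
    # A relation may be split over several files; all of them belong to the KB.
    for path in sorted((facts_dir / relation).iterdir()):
        if not path.is_file():
            continue
        # Assumption: files are UTF-8 text with one fact per line.
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                fact_id, arg1, arg2, conf = line.rstrip("\r\n").split("\t")
                facts[fact_id] = (arg1, arg2, float(conf))
    return facts

def attach_meta_facts(base: dict[str, Fact], meta: dict[str, Fact]) -> dict[str, list[str]]:
    """Map each factId in `base` to the arg2 values of meta facts about it."""
    about: dict[str, list[str]] = {}
    for arg1, arg2, _conf in meta.values():
        if arg1 in base:  # the subject of a meta fact is a factId
            about.setdefault(arg1, []).append(arg2)
    return about
```

Indexing by factId keeps the lookup of a meta fact's subject constant-time, at the cost of holding a whole relation in memory; for the full knowledge base, a streaming or database-backed approach may be preferable.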

Acknowledgements

YAGO can only be so large because it is based on other sources. We would like to thank

  • the numerous voluntary editors of Wikipedia. Thank you for giving mankind such a wonderful, huge encyclopedia!
  • the creators of WordNet. Thank you for organizing and analyzing the English language in such a diligent way and thank you for making your work available for free!