
I am trying to extract birth and death data from Wikipedia. I have used DBpedia and Wikidata but in this particular instance the dates do not match Wikipedia.

This query https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&titles=Thomas_MacDermot&sites=enwiki returns a P569 with a date of 1870-01-01. DBpedia shows the same date.

The Wikipedia page https://en.wikipedia.org/wiki/Thomas_MacDermot shows a date of 26 June 1870.

Why is there a discrepancy? And can this date information be retrieved programmatically (i.e. without screen scraping) from Wikipedia itself?

Thank you!

fvlasie
  • Wikidata is meant to hold the factual data from Wikipedia, so that at some point all of Wikipedia's facts (dates etc.) can be drawn from there. So far, however, only some information is pulled from Wikidata, so for the time being you end up with differences between language versions and between Wikidata and Wikipedia. – Sirko Oct 21 '19 at 21:41

2 Answers


Wikidata supplements Wikipedia's mostly unstructured content with independently input structured data, which may or may not also be seen on Wikipedia.

The DBpedia project translates much structured, and some unstructured, Wikipedia content to structured data.

DBpedia (more precisely, DBpedia Snapshot) data typically lags Wikipedia changes by months to years. Here, we see the dbo:birthDate for Thomas MacDermot as "1870-1-1".

DBpedia Live data typically lags Wikipedia changes by seconds to hours (with occasional longer delays due to software, hardware, and other issues in this evolving environment). Here, we see the dbo:birthDate for Thomas MacDermot as "1870-06-26"^^xsd:date.
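If you fetch the JSON form of a DBpedia resource, the birth date can be picked out with a few lines of Python. This is only a sketch: it assumes the document uses the RDF/JSON layout (subject URI, then predicate URI, then a list of objects), `birth_dates` is a made-up helper name, and the sample data is hand-written from the value quoted above rather than fetched from DBpedia.

```python
from datetime import date

# Full URIs behind the dbo:birthDate and xsd:date prefixes.
DBO_BIRTH_DATE = "http://dbpedia.org/ontology/birthDate"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"


def birth_dates(doc, subject):
    """Return every well-formed xsd:date birth date listed for `subject`.

    `doc` is the already-parsed JSON document (a dict). The RDF/JSON
    layout is an assumption, not something confirmed against DBpedia.
    """
    results = []
    for obj in doc.get(subject, {}).get(DBO_BIRTH_DATE, []):
        if obj.get("datatype") != XSD_DATE:
            continue  # ignore values that are not typed as dates
        try:
            results.append(date.fromisoformat(obj["value"]))
        except ValueError:
            continue  # skip values that are not valid ISO dates
    return results


# Hand-written sample mirroring the DBpedia Live value quoted above.
subject = "http://dbpedia.org/resource/Thomas_MacDermot"
sample = {
    subject: {
        DBO_BIRTH_DATE: [
            {"type": "literal", "value": "1870-06-26", "datatype": XSD_DATE},
        ]
    }
}

print(birth_dates(sample, subject))
```

Returning a list rather than a single value is deliberate, since a resource can carry more than one birth date statement.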

You may find On the Mutually Beneficial Nature of DBpedia and Wikidata to be of interest.


P569 is described as "born on | birth date | birthdate | birth year | year of birth | birthyear | DOB", which I find very confusing. It seems that some entities carry a full date in this property, while others carry only a year. And although the property is itself described as "never changing", the data Wikidata has stored may be incorrect, so the value in Wikidata may well change even if the fact doesn't.
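The full-date versus year-only distinction can be read from the data itself: each Wikidata time value in the JSON carries a `precision` field next to the `time` string, where 9 means year, 10 means month and 11 means day. The sketch below trims a value down to what its precision supports. `format_wikidata_time` is a hypothetical helper, and the sample dicts are hand-written for illustration (the precision of 9 is an assumption, not copied from the actual API response).

```python
import re

# Wikidata precision codes for time values.
YEAR, MONTH, DAY = 9, 10, 11


def format_wikidata_time(value):
    """Return only the part of a Wikidata time value that its precision supports.

    `value` is the dict holding the "time" and "precision" keys of a
    time datavalue. Precisions coarser than a year (decade, century, ...)
    are not handled in this sketch.
    """
    match = re.match(r"^([+-])(\d+)-(\d{2})-(\d{2})T", value["time"])
    if match is None:
        raise ValueError(f"unrecognised time string: {value['time']!r}")
    sign, year, month, day = match.groups()
    prefix = "-" if sign == "-" else ""
    year = f"{int(year):04d}"

    precision = value["precision"]
    if precision >= DAY:
        return f"{prefix}{year}-{month}-{day}"
    if precision == MONTH:
        return f"{prefix}{year}-{month}"
    if precision == YEAR:
        return f"{prefix}{year}"
    raise ValueError(f"unsupported precision: {precision}")


# Illustrative values only: a year-precision and a day-precision date.
year_only = {"time": "+1870-01-01T00:00:00Z", "precision": 9}
full_date = {"time": "+1870-06-26T00:00:00Z", "precision": 11}

print(format_wikidata_time(year_only))
print(format_wikidata_time(full_date))
```

With this reading, a `-01-01` in a year-precision value is just a placeholder and should not be treated as 1 January.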

TallTed
  • Thanks TallTed! This is exactly what I needed: `http://live.dbpedia.org/data/Thomas_MacDermot.json` – fvlasie Oct 21 '19 at 22:43

If you look at P570 you'll find the value "+1933-01-01T00:00:00Z", which matches the year of death but, like P569, gives neither the month nor the day.
So I think maybe P569 and P570 aren't what you think they are (what is your reason to believe that P569 is the date of birth, by the way?) but instead just represent the year of birth/death and correspond to the "1870 births" and "1933 deaths" categories on the Wikipedia page.

peer
  • Thanks for the reply. This thread says that P569 is the birth date: https://stackoverflow.com/questions/12250580/parse-birth-and-death-dates-from-wikipedia – fvlasie Oct 21 '19 at 21:34
  • I would be happy to use a different number if there is a more appropriate one! :) – fvlasie Oct 21 '19 at 21:35
  • [P569](https://www.wikidata.org/wiki/Property:P569) is the date of birth. Judging from the [main page of the item](https://www.wikidata.org/wiki/Q7792081), only the year has been entered here, with no month or day. The JSON export seems to add those for some reason. – Sirko Oct 21 '19 at 21:40