How can I extract the name of all persons in Wikidata with Python?

Question

I would like to extract all distinct names of all persons, i.e. named entities that are human, in Wikidata with Python. I have tried different libraries (qwikidata, mwikidata), different GET requests, and Wikidata's SPARQL service itself. After a while I understood that a general query like this:

SELECT ?person ?personLabel

WHERE {
    ?person wdt:P31 wd:Q5 .
    ?person rdfs:label ?personLabel. FILTER( LANG(?personLabel)="de, en" )
}

is too large for the public API. I then added a combination of LIMIT and OFFSET at the end of the query, e.g.:

ORDER BY ASC(?personLabel)

LIMIT 10000 OFFSET 10000

But no matter what I try, I get either a TimeOutError (Wikidata service) or json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) (Python).

One idea is to split the data into multiple datasets using the biological sex property (P21), but for male and female the same problem persists.

Help is much appreciated!

Pagination in SPARQL is slow: not only is `ORDER BY` slow, but `OFFSET` is also more complicated than in SQL. There are `10 016 353` persons in Wikidata (and that's just the direct assertions), so you won't make it via the public SPARQL endpoint, which is a shared service. I'd load the data into your own local triple store, or just use command-line tools like `awk` and `sed`. — UninformedUser, Jul 15 '22 at 17:10
The alternative would be to use the QLever endpoint, which is way faster than the Blazegraph backend of the public Wikidata endpoint: https://qlever.cs.uni-freiburg.de/wikidata — UninformedUser, Jul 15 '22 at 17:11
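The local-processing route suggested in the comments above could also be done in Python instead of `awk` and `sed`. The following is only a minimal sketch, assuming you have downloaded a Wikidata JSON dump in which each line holds one entity object followed by a comma, and in which entity-valued claims carry an `id` field. The function names `is_human` and `iter_human_labels` are hypothetical helpers, not part of any library.

```python
import json

HUMAN_QID = "Q5"  # wd:Q5, "human"


def is_human(entity):
    """Return True if the entity has a direct P31 (instance of) claim pointing at Q5."""
    for statement in entity.get("claims", {}).get("P31", []):
        # "somevalue"/"novalue" snaks have no datavalue, hence the defensive .get() calls.
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") == HUMAN_QID:
            return True
    return False


def iter_human_labels(lines, languages=("de", "en")):
    """Yield (entity_id, language, label) for every human found in the dump lines."""
    for raw in lines:
        # Assumption: the dump is one big JSON array with one entity per line.
        line = raw.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        entity = json.loads(line)
        if not is_human(entity):
            continue
        labels = entity.get("labels", {})
        for lang in languages:
            if lang in labels:
                yield entity["id"], lang, labels[lang]["value"]


# Usage sketch (the file name is an assumption about the downloaded dump):
# import gzip
# with gzip.open("latest-all.json.gz", "rt", encoding="utf-8") as fh:
#     names = {label for _, _, label in iter_human_labels(fh)}
```

Because the dump is read line by line, memory use stays small, but a full pass over the dump will still take a long time.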
By the way, that filter expression has the wrong syntax: ` FILTER( LANG(?personLabel)="en, de" )`. It does not accept a comma-separated list of language tags; I have no idea where you got this from. If you want multiple conditions combined with a logical OR, you have to use `||` and repeat the expression: `FILTER(expr1 || expr2)` — UninformedUser, Jul 15 '22 at 17:14
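The filter fix described in the comment above can be sketched in Python as follows. This is an illustration only: `build_person_query` and `run_query` are hypothetical helper names, the User-Agent string is a placeholder, and the query relies on the `wd:`, `wdt:` and `rdfs:` prefixes being predefined, as they are on the public Wikidata endpoint. Correcting the filter does not by itself solve the timeout, since the result set is still very large.

```python
import json
import urllib.parse
import urllib.request

WDQS_URL = "https://query.wikidata.org/sparql"  # public Wikidata SPARQL endpoint


def build_person_query(languages=("de", "en"), limit=None):
    """Build the query with one LANG() test per language, joined by ||."""
    lang_filter = " || ".join(
        f'LANG(?personLabel) = "{lang}"' for lang in languages
    )
    query = (
        "SELECT ?person ?personLabel WHERE {\n"
        "  ?person wdt:P31 wd:Q5 .\n"
        "  ?person rdfs:label ?personLabel .\n"
        f"  FILTER({lang_filter})\n"
        "}"
    )
    if limit is not None:
        query += f"\nLIMIT {int(limit)}"
    return query


def run_query(query, endpoint=WDQS_URL, timeout=60):
    """Send the query and parse the JSON result; raises on HTTP errors or timeouts."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder value; replace with something identifying your script.
            "User-Agent": "person-names-example/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage sketch (performs a network request, so it is left commented out):
# results = run_query(build_person_query(limit=10))
# for row in results["results"]["bindings"]:
#     print(row["personLabel"]["value"])
```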
