My question is related to a similar question/comment which unfortunately never received an answer.
Given a list of multiple Wikipedia pages, e.g.:
- https://en.wikipedia.org/wiki/Donald_Trump
- https://en.wikipedia.org/wiki/The_Matrix
- https://en.wikipedia.org/wiki/Tiger
- ...
how can I find out what type of entity each article refers to? Ideally I would like something at a higher level, e.g. person, movie, animal, etc.
My best guess so far was to use the Wikidata API with SPARQL to walk back up the instance_of or subclass tree. However, this did not lead to meaningful results.
SELECT ?lemma ?item ?itemLabel ?itemDescription ?instance ?instanceLabel ?subclassLabel WHERE {
  VALUES ?lemma {
    "Donald Trump"@en
    "The Matrix"@en
    "Tiger"@en
  }
  ?sitelink schema:about ?item;
            schema:isPartOf <https://en.wikipedia.org/>;
            schema:name ?lemma.
  ?item wdt:P31* ?instance.
  ?item wdt:P279* ?subclass.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en,da,sv".
  }
}
The result can be seen here: https://w.wiki/ZmQ
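For longer lists, the VALUES block in the query above could be generated from the article URLs instead of being typed by hand. A minimal sketch in Python, assuming the page title is everything after "/wiki/" in the URL with underscores standing in for spaces; title_from_url and values_block are hypothetical helper names, not part of any Wikidata or MediaWiki library:

```python
from urllib.parse import unquote, urlparse


def title_from_url(url: str) -> str:
    """Turn a Wikipedia article URL into its page title (assumed URL layout)."""
    # The title is everything after "/wiki/"; underscores stand in for spaces.
    path = urlparse(url).path
    raw_title = path.split("/wiki/", 1)[1]
    return unquote(raw_title).replace("_", " ")


def values_block(urls: list[str], lang: str = "en") -> str:
    """Build a VALUES ?lemma { ... } clause like the one in the query above."""
    lines = []
    for url in urls:
        # Escape backslashes and double quotes so the literal stays valid SPARQL.
        title = title_from_url(url).replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  "{title}"@{lang}')
    return "VALUES ?lemma {\n" + "\n".join(lines) + "\n}"


urls = [
    "https://en.wikipedia.org/wiki/Donald_Trump",
    "https://en.wikipedia.org/wiki/The_Matrix",
    "https://en.wikipedia.org/wiki/Tiger",
]
print(values_block(urls))
```

This only prepares the lemma list; it does not change what the query returns.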
One option would of course also be to look at the itemDescription, but I'm afraid that this is too granular to build meaningful groups from larger lists and count frequencies later on.
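To make the grouping step concrete, this is roughly the aggregation I have in mind once each article has a single high-level category. A minimal sketch in Python; the rows below are made-up placeholders (including the extra "Lion" entry), not actual query results, and category_counts is a hypothetical helper name:

```python
from collections import Counter

# Placeholder (article, category) pairs standing in for real query results.
rows = [
    ("Donald Trump", "person"),
    ("The Matrix", "movie"),
    ("Tiger", "animal"),
    ("Lion", "animal"),
]


def category_counts(rows):
    """Count how many articles fall into each high-level category."""
    return Counter(category for _, category in rows)


counts = category_counts(rows)
for category, n in counts.most_common():
    print(category, n)
```

With categories as fine-grained as itemDescription, nearly every count would be 1, which is why a coarser level is needed.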
Does anyone have a hint or idea on how to get more general entity categories? Maybe also from the MediaWiki API?
Any input would be highly appreciated!