
I want to query Wikidata by free text or by category and return the entities that have a corresponding Wikipedia page.

For each page (or for a selected page), I want to fetch all the linked Wikidata entities that have a corresponding Wikipedia article.

Note that:

  • for each Wikipedia page and its linked pages, I want to fetch the corresponding Wikidata ID
  • a linked Wikidata entity may exist on other Wikipedias, not necessarily in the queried language

(e.g. a page on French history is available in multiple languages; some of its linked pages may exist only in French, others in multiple languages).

I cannot figure out which Wikidata APIs correspond to which Wikipedia ones, how to query linked articles, or how to query linked pages that exist outside the selected language.
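A minimal sketch of the single-language step, assuming the standard MediaWiki Action API on a Wikipedia host: `generator=links` combined with `prop=pageprops&ppprop=wikibase_item` should return each linked page together with its Wikidata ID, where one exists. The helper names `build_links_params` and `extract_wikidata_ids` are hypothetical, the sample response in the usage note is made up, and continuation of paged results is left out.

```python
from typing import Dict, Optional


def build_links_params(title: str, limit: str = "max") -> Dict[str, str]:
    """Query-string parameters for /w/api.php on one Wikipedia edition."""
    return {
        "action": "query",
        "format": "json",
        "generator": "links",      # iterate over the pages linked from `title`
        "titles": title,
        "gpllimit": limit,
        "gplnamespace": "0",       # article namespace only
        "prop": "pageprops",
        "ppprop": "wikibase_item",  # the Wikidata ID of each linked page
    }


def extract_wikidata_ids(response: dict) -> Dict[str, Optional[str]]:
    """Map each linked page title to its Wikidata ID (None if it has none)."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: page.get("pageprops", {}).get("wikibase_item")
        for page in pages.values()
    }
```

Passing a decoded JSON response such as `{"query": {"pages": {"22939": {"title": "Physics", "pageprops": {"wikibase_item": "Q413"}}}}}` to `extract_wikidata_ids` gives `{"Physics": "Q413"}`. Pages without a Wikidata item (for example red links) map to `None`.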

I looked at:

https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI

https://stackoverflow.com/a/57983365/305883

https://www.mediawiki.org/wiki/API:Links

For example, I may start with this SPARQL query:

SELECT ?item ?type ?itemLabel ?typeLabel WHERE {
 {
   SELECT ?item WHERE {
    SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam mwapi:generator "search" .
      bd:serviceParam mwapi:gsrsearch "artificial intelligence" .
      bd:serviceParam mwapi:gsrlimit "max" .
      ?item wikibase:apiOutputItem mwapi:item .
    }
  } LIMIT 100
 }
 hint:Prior hint:runFirst "true".
 ?item wdt:P31|wdt:P279 ?type .
 SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 100
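To experiment with variations of this query outside the web interface, it can be sent to the public Wikidata Query Service endpoint. The following is a rough sketch using only the Python standard library. The names `run_sparql` and `flatten_bindings` and the `User-Agent` string are placeholders rather than part of any official client.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def run_sparql(query: str, user_agent: str = "example-script/0.1") -> dict:
    """Send a SPARQL query to WDQS and return the decoded JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,  # placeholder; use a descriptive one
            "Accept": "application/sparql-results+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten_bindings(results: dict) -> List[Dict[str, str]]:
    """Turn SPARQL JSON bindings into plain {variable: value} rows."""
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in results["results"]["bindings"]
    ]
```

Calling `flatten_bindings(run_sparql(query))` with the query above as a string would yield one dictionary per result row, keyed by `item`, `type`, `itemLabel` and `typeLabel`.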

Could you show examples that expand or adapt this query?

Could you suggest references other than https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI for extensive use of MediaWiki API callouts from SPARQL (so that I can exploit both Wikidata and Wikipedia)?

user305883
  • Not sure if I understood correctly, so let me ask you something: your current query returns items based on a Wikipedia lookup. What else should your query return now? – UninformedUser Nov 17 '19 at 07:52
  • For each item, I want to get the corresponding Wikidata (WD) entity of every linked Wikipedia (WP) page, even if the WP pages are in different languages. E.g. suppose the WP page "cat" links to the WP pages "feline", "house" and "domestic", and the page has a WD ID of, say, Q25. The same entity is also linked to "gato Portuguese", a page that is only available in Portuguese. For Q25, I want to retrieve all linked WP pages, with their WD IDs, WD types, and WP properties, including WP pages that are linked in other languages. – user305883 Nov 17 '19 at 13:56
  • Hm, items are connected via the `?sitelink schema:about ?item; schema:isPartOf ` pattern to their English Wikipedia pages, for example. – UninformedUser Nov 17 '19 at 14:51
  • To extend it to multiple languages, you could also do for your `wd:Q25` the following: `SELECT DISTINCT ?article ?lang ?name WHERE { ?article schema:about wd:Q25 ; schema:inLanguage ?lang ; schema:name ?name ; schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] . FILTER(?lang in ('en', 'uz', 'ru', 'ko')) . FILTER (!CONTAINS(?name, ':')) . }` – UninformedUser Nov 17 '19 at 14:53
  • Thank you for sketching out the use of filters; it adds to my question but does not answer it. I am looking at the links of a page, where each linked page carries its corresponding Wikidata ID. Links may also exist in other languages. Maybe an example starting from Wikipedia is clearer. Look at: `https://en.wikipedia.org/w/api.php?action=query&format=json&prop=links&meta=&titles=Albert+Einstein&pllimit=500`. For each entry in `pages` and its own `links`, I want the matching WD ID. Here, suppose Albert Einstein is `wd:Q25` and has other links in Dutch that are not present in English: I'd like to have them too. – user305883 Nov 18 '19 at 13:46
  • A backward approach may be to query all the Wikipedias in which "Albert Einstein" exists, and then query Wikidata. However, since Wikidata has already decoupled entities, I wonder whether Wikidata is more efficient and gives the final result with fewer queries. Also, I want to get the Wikidata descriptor of the entities (e.g. "food", "business", "person", "organisation"), both for the queried page and for the returned links. I am aware that multiple queries might be needed; if you can show a roadmap, and possibly where to look for a way to map Wikidata APIs to Wikipedia parameters, it would be much appreciated. – user305883 Nov 18 '19 at 13:50
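One possible first step of the "backward approach" described in the last comment, sketched under assumptions: the Wikidata `wbgetentities` API with `props=sitelinks/urls` lists every site on which an item has a page, and filtering by host name keeps only the Wikipedia editions. The function names below are hypothetical. Each resulting `(host, title)` pair could then be used for a `prop=links` or `generator=links` request against that host.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse


def build_sitelinks_params(item_id: str) -> Dict[str, str]:
    """Query-string parameters for https://www.wikidata.org/w/api.php."""
    return {
        "action": "wbgetentities",
        "format": "json",
        "ids": item_id,
        "props": "sitelinks/urls",  # sitelinks including the full page URL
    }


def wikipedia_editions(response: dict, item_id: str) -> List[Tuple[str, str]]:
    """Return sorted (host, title) pairs for the item's Wikipedia pages only."""
    sitelinks = response["entities"][item_id].get("sitelinks", {})
    editions = []
    for link in sitelinks.values():
        host = urlparse(link["url"]).hostname or ""
        # Skip sister projects such as Wikiquote or Commons.
        if host.endswith(".wikipedia.org"):
            editions.append((host, link["title"]))
    return sorted(editions)
```

Filtering on the URL host rather than on the site ID (`enwiki`, `ptwiki`, ...) is a design choice: site IDs such as `commonswiki` also end in `wiki` but are not Wikipedia editions.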

0 Answers