For example, I need to get the names of all people on Wikipedia along with the text of their pages (parsed or not, it doesn't matter).

I wrote this SPARQL query:

SELECT ?human ?humanLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?human wdt:P31 wd:Q5.
}
LIMIT 10

How can I extend this query to return the full text of each article in an additional column?

VadimCh
  • I doubt that the fulltext is in the triple store, but you can at least get the Wikipedia URL link with `?sitelink schema:about ?human FILTER REGEX(STR(?sitelink), ".wikipedia.org/wiki/")` – UninformedUser Jul 08 '19 at 04:28
  • OK, thank you, I suspected as much. But how can I get a page (article) ID, so that I can download the Wikipedia dump and link its articles to the SPARQL query results? Any ideas? – VadimCh Jul 08 '19 at 09:27
  • Well, as I said, you can get the links with `SELECT ?human ?sitelink WHERE { ?human wdt:P31 wd:Q5. ?sitelink schema:about ?human filter(strstarts(str(?sitelink), "https://en.wikipedia.org/wiki/")) } LIMIT 10` as an example for the English Wikipedia article links – UninformedUser Jul 08 '19 at 10:25
  • But this code does not return a page or article ID... – VadimCh Jul 08 '19 at 18:21
  • `[ schema:about ?human ; schema:name ?name ; schema:isPartOf <https://en.wikipedia.org/> ] SERVICE wikibase:mwapi { bd:serviceParam wikibase:endpoint "en.wikipedia.org" . bd:serviceParam wikibase:api "Generator" . bd:serviceParam mwapi:generator "allpages" . bd:serviceParam mwapi:gapfrom ?name . bd:serviceParam mwapi:gapto ?name . ?pageid wikibase:apiOutput "@pageid" . }` – UninformedUser Jul 08 '19 at 18:44
  • `SELECT ?item ?pageid WHERE { {select ?item ?name {?item wdt:P31 wd:Q5. ?s schema:about ?item ; schema:name ?name ; schema:isPartOf <https://en.wikipedia.org/> } limit 10} SERVICE wikibase:mwapi { bd:serviceParam wikibase:endpoint "en.wikipedia.org" . bd:serviceParam wikibase:api "Generator" . bd:serviceParam mwapi:generator "allpages" . bd:serviceParam mwapi:gapfrom ?name . bd:serviceParam mwapi:gapto ?name . ?pageid wikibase:apiOutput "@pageid" . } }` – UninformedUser Jul 08 '19 at 18:54
  • Related: https://stackoverflow.com/questions/39773812/how-to-query-for-people-using-wikidata-and-sparql – baptx Jul 27 '21 at 15:12

1 Answer

You can't. The Wikidata SPARQL endpoint only returns data stored in Wikidata, and the article text is not part of it. The best solution is to run your query first, then loop over the results and call the following MediaWiki API request for each record to get the page text.

https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Barack_Obama&utf8=1&rvprop=content

Replace Barack_Obama with the title of the page you want.
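The loop described above could be sketched in Python roughly as follows. This is only a minimal illustration under some assumptions: the helper names (`title_from_sitelink`, `build_content_url`, `extract_page`, `fetch_page`) are made up for this example, the sitelinks are assumed to be English Wikipedia URLs as returned by the `schema:about` query from the comments, and the request adds `formatversion=2` and `rvslots=main`, which are not in the URL above. The response also carries the page ID, so it can be returned together with the wikitext.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
WIKI_PREFIX = "https://en.wikipedia.org/wiki/"


def title_from_sitelink(sitelink):
    """Turn an English Wikipedia sitelink URL into a page title."""
    if not sitelink.startswith(WIKI_PREFIX):
        raise ValueError("not an English Wikipedia article URL: " + sitelink)
    # Sitelinks are percent-encoded and use underscores instead of spaces.
    return urllib.parse.unquote(sitelink[len(WIKI_PREFIX):]).replace("_", " ")


def build_content_url(title):
    """Build the API URL that requests the latest revision of one page."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",  # assumption: newer, list-based response layout
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",     # assumption: read the main content slot
        "titles": title,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_page(response):
    """Return (pageid, wikitext) from a formatversion=2 query response."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        return None, None
    return page["pageid"], page["revisions"][0]["slots"]["main"]["content"]


def fetch_page(title):
    """Download one page; performs a network request."""
    request = urllib.request.Request(
        build_content_url(title),
        # Placeholder User-Agent; replace it with your own contact details.
        headers={"User-Agent": "example-script/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_page(json.load(reply))
```

For each `?sitelink` value in the SPARQL results, `fetch_page(title_from_sitelink(sitelink))` would then return the page ID and the raw wikitext. For a very large result set such as all humans, one request per page is slow, so downloading a dump and joining on the page ID may be the more practical route.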

ASammour