
This is my first time playing with SPARQL. I have written the query below, but I only get the first 10,000 results. How can I get all results from DBpedia?

    from SPARQLWrapper import SPARQLWrapper, JSON

    RESOURCE_PREFIX = "http://dbpedia.org/resource/"

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    # Plain variables are enough: the JSON "value" of a URI binding is already a string,
    # and standard SPARQL 1.1 rejects unparenthesized `str(?x) as ?x` projections.
    sparql.setQuery("""
        PREFIX dbpedia0: <http://dbpedia.org/ontology/>
        SELECT ?song ?artist ?genre WHERE {
            ?song a dbpedia0:Single .
            ?song dbpedia0:genre ?genre .
            ?song dbpedia0:musicalArtist ?artist .
        }
        ORDER BY ?genre
    """)
    print("\n\n*** JSON Example")
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()

    # Each binding maps a variable name to {"type": ..., "value": ...}.
    for result in results["results"]["bindings"]:
        genre = result["genre"]["value"].replace(RESOURCE_PREFIX, "")
        artist = result["artist"]["value"].replace(RESOURCE_PREFIX, "")
        song = result["song"]["value"].replace(RESOURCE_PREFIX, "")
        print(genre + "\t\t" + artist + "\t\t" + song)

I found something about `OFFSET` and `LIMIT`, but I am not sure how to use them to get ALL results.
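One way to use `OFFSET` and `LIMIT` for this is a client-side loop that requests one page at a time and stops when an empty page comes back. The sketch below is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the endpoint caps results at 10,000 rows (as observed in the question), and `fetch_all`, `paged_query`, `run_page_with_sparqlwrapper` and `BASE_QUERY` are hypothetical names, not part of SPARQLWrapper. Sorting by all three variables (rather than only `?genre`) is an assumption made so that consecutive pages neither overlap nor leave gaps.

```python
# Minimal sketch: page through a SPARQL SELECT with LIMIT/OFFSET.
# Assumptions: the endpoint caps result sets at 10000 rows; fetch_all,
# paged_query and run_page_with_sparqlwrapper are hypothetical helper names.

ENDPOINT = "http://dbpedia.org/sparql"

# ORDER BY over all projected variables gives a deterministic order, so pages
# neither overlap nor leave gaps (assuming the data does not change meanwhile).
BASE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?song ?artist ?genre WHERE {
  ?song a dbo:Single ;
        dbo:genre ?genre ;
        dbo:musicalArtist ?artist .
}
ORDER BY ?genre ?artist ?song
"""


def paged_query(base_query, limit, offset):
    """Append LIMIT/OFFSET clauses to a query that has none of its own."""
    return f"{base_query.rstrip()}\nLIMIT {limit}\nOFFSET {offset}\n"


def fetch_all(run_page, base_query, page_size=10000):
    """Call run_page(query) -> list of bindings until it returns an empty page."""
    rows = []
    offset = 0
    while True:
        page = run_page(paged_query(base_query, page_size, offset))
        if not page:
            break
        rows.extend(page)
        # Advance by what was actually returned, so a server-side cap lower
        # than page_size does not silently skip rows.
        offset += len(page)
    return rows


def run_page_with_sparqlwrapper(query):
    """Run one page against ENDPOINT and return its JSON bindings."""
    # Imported here so the helpers above work without SPARQLWrapper installed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()["results"]["bindings"]
```

Usage would be `rows = fetch_all(run_page_with_sparqlwrapper, BASE_QUERY)`. Keeping the network call behind a `run_page` callable makes the paging logic easy to test offline. Sorting a large result set is expensive, and the public endpoint may impose further limits on such queries.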

Edited by TallTed; asked by joe borg.
  • You can't **remove** the default limit set by the public DBpedia service. You can work around it with some kind of pagination, i.e. running queries with `OFFSET 10000`, `OFFSET 20000`, and so on until the result set is empty. For correctness, this workaround also needs `ORDER BY`, but that is pretty expensive. – UninformedUser May 25 '18 at 10:52
  • By the way, instead of this string-replacement hack `.replace("http://dbpedia.org/resource/", "")`, the "better" way would be to get the English `rdfs:label` of the resources, as those labels are meant to provide a human-readable form. – UninformedUser May 25 '18 at 10:54
  • First of all, thank you for both comments, as they are extremely helpful. Re: OFFSET, a loop is therefore required until the result set is empty? – joe borg May 25 '18 at 10:57
  • Short answer: yes, a loop in your client code that increases the `OFFSET` value by the default limit of the SPARQL endpoint, in your case `10000`. – UninformedUser May 25 '18 at 11:18
  • Thanks, fantastic! I managed it with this method! :) – joe borg May 25 '18 at 11:28
  • How do you get the rdfs:labels? Been trying but no luck! :S – joe borg May 25 '18 at 13:04
  • Well, simply `?song rdfs:label ?songLabel .`, or what do you mean? And don't forget to select the corresponding variable and add a language filter, e.g. for English. – UninformedUser May 25 '18 at 13:08
  • For the song it is easy, but for the genre and artist I am finding it hard! :( Is there a tutorial anywhere showing this label thing? – joe borg May 25 '18 at 13:11
  • What kind of tutorial should this be? It's always the same, add a triple pattern that matches additional data. What's wrong with adding `?genre rdfs:label ?genreLabel .`? – UninformedUser May 25 '18 at 13:36
  • This and other public endpoint limits and restrictions are [discussed on the DBpedia website](https://wiki.dbpedia.org/public-sparql-endpoint). – TallTed Jun 08 '18 at 21:46
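Following the `rdfs:label` suggestion in the comments, the same triple pattern can be repeated for the artist and the genre, each wrapped in `OPTIONAL` with an English language filter. This is a minimal sketch, not the commenters' exact query: `LABELLED_QUERY`, `display_name` and `format_row` are hypothetical names, and falling back to the stripped URI when no English label exists is an assumption about the desired output. It relies on the SPARQL 1.1 JSON results format, where an unbound `OPTIONAL` variable is simply absent from a binding.

```python
# Minimal sketch: fetch English rdfs:label values alongside the URIs and prefer
# them for display. LABELLED_QUERY, display_name and format_row are hypothetical.

RESOURCE_PREFIX = "http://dbpedia.org/resource/"

# OPTIONAL keeps rows whose resources have no English label; the FILTER inside
# each OPTIONAL restricts the label to English without dropping the row.
LABELLED_QUERY = """
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?song ?songLabel ?artist ?artistLabel ?genre ?genreLabel WHERE {
  ?song a dbo:Single ;
        dbo:genre ?genre ;
        dbo:musicalArtist ?artist .
  OPTIONAL { ?song   rdfs:label ?songLabel .   FILTER(LANG(?songLabel) = "en") }
  OPTIONAL { ?artist rdfs:label ?artistLabel . FILTER(LANG(?artistLabel) = "en") }
  OPTIONAL { ?genre  rdfs:label ?genreLabel .  FILTER(LANG(?genreLabel) = "en") }
}
ORDER BY ?genre ?artist ?song
"""


def display_name(binding, var):
    """Return var's English label if bound, else its URI minus the resource prefix."""
    label = binding.get(var + "Label")  # unbound OPTIONAL vars are absent from the JSON
    if label is not None:
        return label["value"]
    uri = binding[var]["value"]
    return uri[len(RESOURCE_PREFIX):] if uri.startswith(RESOURCE_PREFIX) else uri


def format_row(binding):
    """Tab-separated genre, artist, song, in the same order as the question's output."""
    return "\t\t".join(display_name(binding, v) for v in ("genre", "artist", "song"))
```

Each binding from `results["results"]["bindings"]` can then be printed with `print(format_row(result))`. If a resource carries more than one English label, the join yields one row per label, so duplicate rows are possible.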
