
I am using DBpedia Live to retrieve a list of persons by running a SPARQL query against the DBpedia Live endpoint. I set fixed LIMIT and OFFSET values in the query and fetch the records one batch per request. When I run the query with an offset of 580000, however, I get a 504 Gateway Time-out error.

Non-working SPARQL query:

    SELECT DISTINCT ?dbpedia_link str(?name) as ?label str(?label1) as ?label1 ?freebase_link WHERE {
        ?dbpedia_link rdfs:label ?label1 .
        ?dbpedia_link foaf:name ?name .
        {
            { ?dbpedia_link rdf:type dbpedia-owl:Person }
        }
        OPTIONAL {
            ?dbpedia_link owl:sameAs ?freebase_link .
            FILTER regex(?freebase_link, "^http://rdf.freebase.com") .
        }
        FILTER (lang(?label1) = 'en') .
        ?dbpedia_link dcterms:subject ?sub
    }
    LIMIT 1000
    OFFSET 580000

Working SPARQL query:

    SELECT DISTINCT ?dbpedia_link str(?name) as ?label str(?label1) as ?label1 ?freebase_link WHERE {
        ?dbpedia_link rdfs:label ?label1 .
        ?dbpedia_link foaf:name ?name .
        {
            { ?dbpedia_link rdf:type dbpedia-owl:Person }
        }
        OPTIONAL {
            ?dbpedia_link owl:sameAs ?freebase_link .
            FILTER regex(?freebase_link, "^http://rdf.freebase.com") .
        }
        FILTER (lang(?label1) = 'en') .
        ?dbpedia_link dcterms:subject ?sub
    }
    LIMIT 1000
    OFFSET 50000

How can I overcome this problem?

iNikkz
  • In addition to [jimkont's answer](http://stackoverflow.com/a/26136741/1281433), note that **limit n** and **offset m** need an **order by** in order to be useful. If there's no specified ordering, then the endpoint can return the same **n** results over and over again. E.g., see [my answer](http://stackoverflow.com/a/25147648/1281433) to [How to resolve the execution limits in Linkedmdb](http://stackoverflow.com/q/25141247/1281433). – Joshua Taylor Oct 01 '14 at 13:48
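As a rough illustration of the comment above, here is a minimal Python sketch that appends an explicit `ORDER BY` before `LIMIT` and `OFFSET` so that successive pages do not overlap. The helper name `paged_query` and the simplified query body are made up for this example, and the prefixes are assumed to be predefined on the endpoint, as they appear to be in the question.

```python
def paged_query(body, order_by, limit, offset):
    """Append ORDER BY, LIMIT and OFFSET to a SELECT query body."""
    return f"{body.rstrip()}\nORDER BY {order_by}\nLIMIT {limit}\nOFFSET {offset}"


# Simplified stand-in for the query in the question (hypothetical body).
body = """SELECT DISTINCT ?dbpedia_link WHERE {
    ?dbpedia_link rdf:type dbpedia-owl:Person
}"""

# Print the queries for the first two pages of 1000 rows each.
for offset in (0, 1000):
    print(paged_query(body, "?dbpedia_link", 1000, offset))
    print()
```

Sorting gives the pages a stable order, but it may also add work on the server side, so it is not by itself a fix for the timeout.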

1 Answer


Put a delay between your requests. There is a rate limit in the live endpoint and this is the error you get when you exceed it. There is also a short timeout to make the service more available.

(Disclaimer: I am responsible for the service)
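One way to put a delay between requests is on the client side, since SPARQL itself has no notion of waiting. The following is only a sketch under assumptions: the function names, the 5-second delay, and the retry count are illustrative guesses, not documented limits of the service. `run_query` sends a single query over HTTP GET, and `fetch_all` pages through the results, sleeping between requests and retrying when a 504 comes back.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

ENDPOINT = "http://live.dbpedia.org/sparql"  # endpoint named in the question


def run_query(query, endpoint=ENDPOINT, timeout=60):
    """Send one SPARQL query over HTTP GET and return the result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


def fetch_all(fetch_page, limit=1000, delay=5.0, retries=3, sleep=time.sleep):
    """Page through results, pausing between requests.

    fetch_page(offset) must return the list of rows for that offset.
    delay and retries are illustrative guesses, not documented limits.
    """
    rows, offset = [], 0
    while True:
        for attempt in range(retries + 1):
            try:
                page = fetch_page(offset)
                break
            except urllib.error.HTTPError as error:
                # Retry only on 504, and give up after the last attempt.
                if error.code != 504 or attempt == retries:
                    raise
                sleep(delay * (attempt + 2))  # wait longer before each retry
        rows.extend(page)
        if len(page) < limit:  # a short page means the results are exhausted
            return rows
        offset += limit
        sleep(delay)  # the delay between consecutive requests
```

To use it, pass a callable that builds the query for a given offset and hands it to `run_query`, for example `fetch_all(lambda offset: run_query(query_for(offset)))`, where `query_for` is a hypothetical helper that fills in the `OFFSET` value.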

jimkont
  • Thanks for responding. How can I put a `delay in sparql` if I am executing the `query just once`? Please try to execute the `Non-working SPARQL query` from above, and you will get the gateway error. – iNikkz Oct 01 '14 at 09:35
  • You can try the Live clone hosted by OpenLink: http://dbpedia-live.openlinksw.com/sparql . That server has more resources and answers the query. – jimkont Oct 01 '14 at 09:53
  • `@jimkont:` I need the records from the http://live.dbpedia.org/sparql endpoint only. We did some `R&D` on the http://dbpedia-live.openlinksw.com/sparql endpoint, and it does not give proper output. It throws an exception in such cases. Ex.`` – iNikkz Oct 01 '14 at 12:02
  • By this evening we will deploy a new endpoint, and the OpenLink cache will also be correct from now on. – jimkont Oct 01 '14 at 13:16
  • Have you **(@jimkont)** deployed the **new endpoint** or upgraded the previous one? I am still facing the `same gateway error`. A colleague advised me not to use `filter(regular expression)`, because it makes the `sparql query` take too long to execute. – iNikkz Oct 03 '14 at 06:00
  • See http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg06436.html . Now you can use the OpenLink endpoint, which has the same data as the official one (the rate limits didn't change). – jimkont Oct 03 '14 at 06:17
  • Sir **@jimkont**, http://dbpedia-live.openlinksw.com/sparql is working, but when I try an offset of 500000, it doesn't give any result. Why is that? – iNikkz Oct 03 '14 at 07:46