I was trying to extract all movies from LinkedMDB. I used OFFSET to make sure I won't hit the maximum number of results per query. I used the following script in Python:

"""
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
 SELECT distinct ?film
 WHERE {
 ?film a movie:film .
 } LIMIT 1000 OFFSET %s """ %i

I looped five times, with offsets 0, 1000, 2000, 3000, and 4000, and recorded the number of results each time: (1000, 1000, 500, 0, 0). I already knew the limit was 2500, but I thought that using OFFSET would let us get around it. Is that not true? Is there no way to get all the data, even with a loop of some sort?

user1848018

1 Answer

Your current query is legal, but there's no specified ordering, so the offset doesn't bring you to a predictable place in the results. (A lazy implementation could just return the same results over and over again.) When you use LIMIT and OFFSET, you also need to use ORDER BY. The SPARQL 1.1 specification says (emphasis added):

15.4 OFFSET

OFFSET causes the solutions generated to start after the specified number of solutions. An OFFSET of zero has no effect.

Using LIMIT and OFFSET to select different subsets of the query solutions will not be useful unless the order is made predictable by using ORDER BY.
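
As a minimal sketch of what this means for the query in the question, the helper below adds ORDER BY ?film before LIMIT and OFFSET so that consecutive pages are cut from one stable ordering. The names QUERY_TEMPLATE and build_page_query are hypothetical, chosen only for illustration, and the sketch only builds the query text. It makes the paging well defined, but it cannot lift a result cap that the endpoint itself enforces.

```python
# Query from the question, with ORDER BY added so that paging is predictable.
QUERY_TEMPLATE = """
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
SELECT DISTINCT ?film
WHERE {
  ?film a movie:film .
}
ORDER BY ?film
LIMIT %d OFFSET %d
"""


def build_page_query(offset, limit=1000):
    """Return the query text for one (offset, limit) window of the results."""
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    return QUERY_TEMPLATE % (limit, offset)


if __name__ == "__main__":
    # Same offsets as in the question: 0, 1000, 2000, 3000, 4000.
    for offset in range(0, 5000, 1000):
        print(build_page_query(offset))
```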

Joshua Taylor
  • Looks like with an offset greater than 2500, LinkedMDB returns nothing. I did follow the ORDER BY, OFFSET, LIMIT instruction, but it appears that the server returns nothing for offset values greater than 2500 – user1848018 Aug 06 '14 at 20:34
  • It may be that it will only select 2500 results internally, then order through them, and then page through those. You might be able to get around that if you can find some value that you can filter on to select fewer results in the beginning. Unfortunately, to get the data you need, I think you'd need some of the SPARQL 1.1 operators, which I don't think the LinkedMDB endpoint supports. – Joshua Taylor Aug 06 '14 at 20:53
  • @JoshuaTaylor So when the query returns 0 results, does it mean that we can stop? Can we put in a condition like "if number of results = 0, then stop executing the SPARQL query that is inside the loop"? – Hani Goc Jun 12 '15 at 07:50
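
The stop condition asked about in the last comment could be sketched roughly as follows. Everything here is an assumption made for illustration: fetch_all_pages and fake_endpoint are hypothetical names, and fake_endpoint merely imitates a server holding 2500 rows instead of contacting LinkedMDB. The caller supplies run_query, any function that takes an offset and a limit and returns a list of rows. Note that an empty page only shows that the server returned nothing more. As the first comment reports, that can also happen because of a server-side cap, so it does not prove the data set is complete.

```python
def fetch_all_pages(run_query, page_size=1000, max_pages=100):
    """Collect rows page by page and stop at the first empty page.

    run_query(offset, limit) is a caller-supplied function returning a list.
    max_pages is a safety bound so the loop always terminates.
    """
    results = []
    for page in range(max_pages):
        rows = run_query(page * page_size, page_size)
        if not rows:
            # Zero results: nothing more will come back, so stop looping.
            break
        results.extend(rows)
    return results


def fake_endpoint(offset, limit, total=2500):
    """Stand-in for a real endpoint: pretends to hold `total` ordered rows."""
    return ["film/%d" % n for n in range(offset, min(offset + limit, total))]


if __name__ == "__main__":
    films = fetch_all_pages(fake_endpoint)
    print(len(films))
```

With the stand-in above, the page sizes come out as 1000, 1000, 500 and then 0, matching the counts reported in the question, and the loop stops after the fourth request.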