
I have recently been exploring SPARQL queries, and they seem noticeably slow. I also tried running a federated query using Apache Jena Fuseki; it was also very slow, even slower than queries against a single SPARQL endpoint. Is there a way to improve the performance of SPARQL queries? Some suggestions I found online were to use cached query results, which I think defeats the purpose of putting data on the web.

Here is an example of the federated query I tried. I took it from "Connecting Linkedmdb and DBpedia via federated SPARQL queries":

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?film ?label ?subject WHERE {
SERVICE <http://data.linkedmdb.org/sparql> {
    ?film a movie:film .
    ?film rdfs:label ?label .
    ?film owl:sameAs ?dbpediaLink 
    FILTER(regex(str(?dbpediaLink), "dbpedia", "i"))
}
SERVICE <http://dbpedia.org/sparql> {
    ?dbpediaLink dcterms:subject ?subject
}
}
LIMIT 50
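One way to find out where the time goes is to send each `SERVICE` body directly to its own endpoint and compare the timing with the federated version. A sketch of the LinkedMDB half, meant to be run against `http://data.linkedmdb.org/sparql` itself (no `SERVICE` clause, so no local federation engine is involved). The `LIMIT 50` is an assumption added here to keep the comparison bounded, and deleting the `FILTER` line gives the variant without the REGEX.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>

# LinkedMDB part only, executed on the LinkedMDB endpoint directly.
SELECT ?film ?label ?dbpediaLink WHERE {
  ?film a movie:film .
  ?film rdfs:label ?label .
  ?film owl:sameAs ?dbpediaLink .
  # The REGEX is evaluated on every intermediate ?dbpediaLink value.
  FILTER(regex(str(?dbpediaLink), "dbpedia", "i"))
}
LIMIT 50
```

If this returns quickly on its own while the full query is slow, the bottleneck is likely in the federation step rather than in either endpoint.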
Stanislav Kralin
RDangol
  • Please show your query. – Stanislav Kralin Jul 02 '17 at 09:09
  • No way: you're querying LinkedMDB, which is hosted on a D2R server, and you use a REGEX, which needs a full scan over all intermediate results. You can check the performance without the REGEX; I'm sure it will be much faster. But then you're also doing the federated query locally ... – UninformedUser Jul 02 '17 at 10:55
  • There are also "local" `owl:sameAs` links to LinkedMDB on DBpedia. – Stanislav Kralin Jul 02 '17 at 11:05
  • I have to change my opinion :D I tried the [LinkedMDB query separately](http://data.linkedmdb.org/sparql?query=SELECT%20%3Ffilm%20%3Flabel%20WHERE%20%7B%0A%20%20%20%20%3Ffilm%20a%20movie%3Afilm%20.%0A%20%20%20%20%3Ffilm%20rdfs%3Alabel%20%3Flabel%20.%0A%20%20%20%20%3Ffilm%20owl%3AsameAs%20%3FdbpediaLink%20%0A%20%20%20%20FILTER(regex(str(%3FdbpediaLink)%2C%20%22dbpedia%22%2C%20%22i%22))%0A%7D) and it returns the result instantly. That means the federated query is probably the bottleneck. – UninformedUser Jul 02 '17 at 11:38
  • @AKSW Strangely, without the REGEX the query does not seem to work. What is the best way to run a federated query? If not locally, where and how should I run it? Also, the regular SPARQL queries are not really fast either. – RDangol Jul 02 '17 at 11:58
  • Which query does not work? The whole query or just the single LinkedMDB query executed on the LinkedMDB endpoint? – UninformedUser Jul 02 '17 at 12:11
  • @AKSW The whole query. I tried running the query without the filter, both locally and in the SPARQL Playground [sparql-playground.sib.swiss](http://sparql-playground.sib.swiss). – RDangol Jul 02 '17 at 12:29
  • That's what I said in my second comment - it's probably too slow because of the federated engine. – UninformedUser Jul 02 '17 at 13:51
  • @AKSW Could you please suggest how I should handle a federated query? I am fairly new to this, so any help would be much appreciated. – RDangol Jul 02 '17 at 13:59
  • Given `LIMIT 50` (with no `ORDER BY` etc.), you do not need to fetch all the data from LinkedMDB. Solution modifiers cannot be attached directly to a `SERVICE` clause, so just wrap your LinkedMDB service invocation in `{ SELECT ?film ?label ?dbpediaLink WHERE {` ... `} LIMIT 100 }`. – Stanislav Kralin Jul 02 '17 at 21:13
  • @StanislavKralin Thanks for the tip. Still slow but a lot faster than the previous query. – RDangol Jul 03 '17 at 10:14
  • @RDangol, some benchmarking: your query takes about 25 seconds on my local Jena Fuseki, about the same on my local GraphDB, and runs about 5 times faster on a GraphDB instance in AWS. I think network latency is the main problem. – Stanislav Kralin Oct 21 '17 at 19:19
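Applying the subquery suggestion from the comments to the original query gives something like the sketch below. It wraps the LinkedMDB `SERVICE` invocation in a sub-SELECT carrying `LIMIT 100`, as the comment describes; the original `FILTER` is kept unchanged. The choice of 100 rather than 50 presumably leaves headroom for LinkedMDB films whose DBpedia resource has no `dcterms:subject`; whether the inner limit is actually pushed to the remote endpoint depends on the federation engine, so treat this as a starting point to benchmark, not a guaranteed fix.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?film ?label ?subject WHERE {
  # LIMIT cannot follow SERVICE directly, so the LinkedMDB
  # invocation is wrapped in a sub-SELECT that carries the limit.
  {
    SELECT ?film ?label ?dbpediaLink WHERE {
      SERVICE <http://data.linkedmdb.org/sparql> {
        ?film a movie:film .
        ?film rdfs:label ?label .
        ?film owl:sameAs ?dbpediaLink .
        FILTER(regex(str(?dbpediaLink), "dbpedia", "i"))
      }
    }
    LIMIT 100
  }
  # Joined with at most 100 rows produced by the subquery above.
  SERVICE <http://dbpedia.org/sparql> {
    ?dbpediaLink dcterms:subject ?subject
  }
}
LIMIT 50
```

Since there is no `ORDER BY`, any 50 matching rows are an acceptable answer, which is what makes truncating the LinkedMDB side early safe here.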
