I have about 7,200 SKOS.Concept objects, created by parsing a Turtle file with rdflib-sqlalchemy and stored in a Postgres DB.

The following SPARQL query takes over 30 seconds to respond with data:

    SELECT ?subject ?prefLabel
    WHERE {
      ?subject rdf:type <http://www.w3.org/2004/02/skos/core#Concept> .
      ?subject skos:prefLabel ?prefLabel .
      FILTER (lang(?prefLabel) = 'en')
    }
    ORDER BY ?prefLabel
    LIMIT 20 OFFSET 0

I am using the limit and offset to paginate through results. I pass in the language parameter (one of ar, en, es, fr, ru, zh).
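For reference, here is a minimal sketch of how the query string could be built from the language and page number. The names `build_concept_query`, `ALLOWED_LANGS` and `QUERY_TEMPLATE` are made up for illustration and are not part of rdflib; the page size of 20 matches the query above, and the explicit `PREFIX` lines are an assumption about how the namespaces get bound.

```python
# Hypothetical helper: builds the paginated SPARQL query as a string.
# Nothing here calls rdflib; the result would be passed to the store's query method.

ALLOWED_LANGS = {"ar", "en", "es", "fr", "ru", "zh"}

QUERY_TEMPLATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?subject ?prefLabel
WHERE {{
  ?subject rdf:type skos:Concept .
  ?subject skos:prefLabel ?prefLabel .
  FILTER (lang(?prefLabel) = '{lang}')
}}
ORDER BY ?prefLabel
LIMIT {limit} OFFSET {offset}
"""


def build_concept_query(lang, page, page_size=20):
    """Return the query for a zero-based page of concepts in one language."""
    # Only accept known language tags, so arbitrary text never reaches the query.
    if lang not in ALLOWED_LANGS:
        raise ValueError("unsupported language: %r" % (lang,))
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size >= 1")
    return QUERY_TEMPLATE.format(
        lang=lang, limit=page_size, offset=page * page_size
    )


if __name__ == "__main__":
    # First page of English labels, same shape as the query in the question.
    print(build_concept_query("en", 0))
```

Checking the language against a fixed set keeps the string formatting safe, since the tag is interpolated directly into the `FILTER`.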

If I simply select the subject, the resulting query is lightning fast -- but I need to collate by prefLabel in the result set.

This is a query that ran very fast in a key-value store (Sleepycat) but crawls when moving to rdflib-sqlalchemy with a Postgres backend.

I am quite new to rdflib and SPARQL -- any suggestions or insights would be welcome.

Thanks in advance!

fiacre
  • Well, I suppose that removing the `?subject a skos:Concept` pattern should make the query faster... – Stanislav Kralin Mar 31 '18 at 21:06
  • Yes, it would. I want the URI and preferred label of the concept so that I can set up pagination through all concepts in the graph. I suppose I could get the URI via ajax when the user clicks through ... but I have more than just SKOS.Concepts in the graph and I want to give the user the option on which graph to browse. – fiacre Mar 31 '18 at 23:26
  • If removing the `rdf:type` pattern helps, why not use a special predicate for labels of `skos:Concept`s? FYI, literal statements and `rdf:type` statements are stored in separate tables when using rdflib-sqlalchemy. – Stanislav Kralin Apr 01 '18 at 06:03
  • 30s is quite slow, but as far as I can see, the API uses just 4 tables with some indexes, but not the common permutations, e.g. `pos` which would be used in your example. It's also not clear how the SPARQL to SQL rewriting works. Is the join done on client side, is there a push down of filter predicates, etc. At least, for your query there should be a single join on two tables - Stanislav already mentioned this. `ORDER BY` + `OFFSET` can also be expensive. Clearly, the developers should give you better answers, you should contact them. – UninformedUser Apr 01 '18 at 07:06
  • BTW, this works: https://stackoverflow.com/a/2950685/7879193 – Stanislav Kralin Apr 01 '18 at 08:56
  • I am seeing the problem -- the SQL loops over all terms in the collection to get the preferredLabel. Ugh! – fiacre Apr 02 '18 at 22:38
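Given the comments about `ORDER BY` + `OFFSET` being expensive, one possible workaround is to fetch the (subject, label, language) rows once without ordering, then sort and slice them in Python. The sketch below is untested against rdflib-sqlalchemy: `paginate_labels` is a hypothetical helper, the rows are plain tuples standing in for query results, and the `http://example.org/` URIs are made up. Python's default string sort is by code point, so the order may differ from what a SPARQL engine returns.

```python
# Hypothetical helper: client-side filter, sort and paginate of label rows.
# rows is any iterable of (subject, label, label_lang) tuples.

def paginate_labels(rows, lang, page, page_size=20):
    """Return one zero-based page of (subject, label) pairs for a language."""
    # Keep only labels in the requested language.
    matching = [(subj, label) for subj, label, label_lang in rows
                if label_lang == lang]
    # Sort by label text (code-point order, not locale-aware collation).
    matching.sort(key=lambda pair: pair[1])
    start = page * page_size
    return matching[start:start + page_size]


if __name__ == "__main__":
    # Made-up sample data standing in for rows fetched from the store.
    sample_rows = [
        ("http://example.org/c1", "water", "en"),
        ("http://example.org/c2", "eau", "fr"),
        ("http://example.org/c3", "air", "en"),
        ("http://example.org/c4", "fire", "en"),
    ]
    for subject, label in paginate_labels(sample_rows, "en", 0, page_size=2):
        print(subject, label)
```

With roughly 7,200 concepts the full list should fit comfortably in memory, so it could be cached per language and sliced for each page request.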
