I need a logical OR of all valid statements about an item or its parents (instance-parent or subclass-parent).

Example: Q957653 is an instance of Q3184121, and the latter has P17 Q155. So it satisfies the chain Q957653 P31 Q3184121 P17 Q155. So I need something like

    ?item P17 Q155
    | ?item P31 ?x P17 Q155
    | ?item P31 ?x P31 ?y P17 Q155
    | ?item P279 ?x P17 Q155
    | ?item P279 ?x P31 ?y P17 Q155
    | ?item P279 ?x P279 ?y P17 Q155

A big logical "or" over all possible chains of P31 or P279 dependencies.
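
For illustration only, the direct case and the two one-hop cases from the list above can be written by hand as a SPARQL UNION. This is a sketch, assuming it runs on the Wikidata Query Service, where the wd: and wdt: prefixes are predefined; ?x is just a helper variable. It covers only chains of length zero or one, which is exactly why writing the alternatives out by hand does not scale to "all possible chains".

```sparql
# Sketch: only the direct case plus one P31 hop or one P279 hop.
SELECT DISTINCT ?item WHERE {
  ?item wdt:P402 [] .                             # item has an OSM relation ID
  { ?item wdt:P17 wd:Q155 . }                     # ?item P17 Q155
  UNION
  { ?item wdt:P31 ?x . ?x wdt:P17 wd:Q155 . }     # ?item P31 ?x P17 Q155
  UNION
  { ?item wdt:P279 ?x . ?x wdt:P17 wd:Q155 . }    # ?item P279 ?x P17 Q155
}
```

Every additional level of depth would need further UNION branches for each combination of P31 and P279.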


Real example

I need a list of items that have some property (e.g. ?item wdt:P402 _:b0.) and are instances or subclasses of items with another property, e.g. wdt:P17 wd:Q155.

The "first level", ?item wdt:P17 wd:Q155, works fine:

    SELECT DISTINCT ?item ?osm_relid ?itemLabel
    WHERE {
      ?item wdt:P402 _:b0.
      ?item wdt:P17 wd:Q155.
      OPTIONAL { ?item wdt:P1448 ?name. }
      OPTIONAL { ?item wdt:P402 ?osm_relid .}
      SERVICE wikibase:label {
          bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]".
      }
    }

But how can I express the union (or logical "or") of all the other possible dependencies?

Edit/Notes

Suppose that ?item wdt:P31*/wdt:P279* wd:Qxx . gives all the "chain dependencies" on some Qxx, as I need. But Qxx is itself the result of a query, so what I want is something like this pseudo-syntax:

    ?item wdt:P31*/wdt:P279* (?xx wdt:P17 wd:Q155) ..

... A solution (!) seems to be

    SELECT (COUNT(DISTINCT ?item) AS ?count)
    WHERE {
      ?item wdt:P402 _:b0.
      ?item (wdt:P31*|wdt:P279*)/wdt:P17 wd:Q155 .
    }

but I can't check it because the query is too time-consuming.

... Something perhaps closer to a "feasible solution" is

    ?item wdt:P31*/wdt:P279*/wdt:P31*/wdt:P17 wd:Q155 .

... After testing the feasible candidates, wdt:P31*/wdt:P279*/wdt:P17 seems to be the only "optimal" one, with no time-out problem.
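
A cheap way to compare these path shapes, without scanning every item, is to bind the subject to the example item Q957653 and only ask whether a matching path exists. The following is a sketch, assuming the Wikidata Query Service and its predefined wd: and wdt: prefixes. Note that the shapes are not equivalent: (wdt:P31*|wdt:P279*) only follows chains made of a single property, wdt:P31*/wdt:P279* requires all P31 hops to come before any P279 hop, and (wdt:P31|wdt:P279)* allows any mix of the two.

```sparql
# Sketch: does the example item reach "P17 Q155" through any mix of P31/P279 hops?
ASK {
  wd:Q957653 (wdt:P31|wdt:P279)*/wdt:P17 wd:Q155 .
  # Variants to compare (swap in for the path above):
  #   (wdt:P31*|wdt:P279*)/wdt:P17   chains of one property only
  #   wdt:P31*/wdt:P279*/wdt:P17     P31 hops first, then P279 hops
}
```

Because the subject is fixed, the engine only has to walk upward from one item, so a check like this should stay far from the time-outs seen with an unbound ?item.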

Stanislav Kralin
Peter Krauss
  • SPARQL 1.1 property paths are the only way to implement this somehow, and you already figured out that those do not perform well in many situations. Welcome to the limitations of SPARQL ... in the end, it's not a graph traversal language, thus, there are usually no index structures that would allow better performance for those kind of queries (imagine a graph with a very deep structure), thus, it has to perform joins as long as there are no edges along the path – UninformedUser Aug 03 '18 at 02:56
  • `SELECT distinct ?item ?osm_relid WHERE { ?item (wdt:P31|wdt:P279)*/wdt:P17 wd:Q155 . ?item wdt:P402 ?osm_relid . hint:Prior hint:runLast true . }` ~5600 results in ~10 seconds – Stanislav Kralin Aug 03 '18 at 06:19
  • @AKSW, thanks, I agree... But look at Stanislav's clues! There is a Bigdata initiative at https://wiki.blazegraph.com/wiki/index.php/QueryHints – Peter Krauss Aug 03 '18 at 08:07
  • @StanislavKralin, perfect, that **is the answer**! PS: I changed P402 to P625, and it is running fast with ~32000 items. – Peter Krauss Aug 03 '18 at 08:09
  • @PeterKrauss yes, I saw this. something that is triple store specific and doesn't hold in general. In a perfect world, the query optimizer would do this automatically for us, but this will never happen indeed - there will be always corner cases – UninformedUser Aug 04 '18 at 06:09

1 Answer

To improve performance, you can use Blazegraph query hints. Query hints allow you to modify the auto-generated query execution plan.

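    SELECT DISTINCT ?item ?itemLabel ?osm_relid ?name {
      # Start from the things whose country (P17) is Brazil (Q155) ...
      ?itemi wdt:P17 wd:Q155 .
      hint:Prior hint:runFirst true .
      # ... then find everything that reaches them via any mix of P31/P279 hops
      ?item (wdt:P31|wdt:P279)* ?itemi .
      # Keep only items that have a coordinate location
      ?item wdt:P625 [].
      OPTIONAL { ?item wdt:P1448 ?name. }
      OPTIONAL { ?item wdt:P402 ?osm_relid .}
      SERVICE wikibase:label {
          bd:serviceParam wikibase:language "en,[AUTO_LANGUAGE]".
      }
    }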

To see what the query execution plan looks like, just add &explain to the query URL and scroll down.

Please note that you can't use hint:Prior hint:runLast true from the original comment when the label service is used: there can be only one such hint in any graph pattern group.
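
For comparison, the variant from the original comment, which uses hint:runLast and omits the label service, can be laid out as follows. This is only a reformatted sketch of that one-line query; the comments are added for illustration, and the hint: prefix is assumed to be predefined, as it is on the Wikidata Query Service.

```sparql
SELECT DISTINCT ?item ?osm_relid WHERE {
  # Any mix of "instance of" / "subclass of" hops, then country = Brazil (Q155)
  ?item (wdt:P31|wdt:P279)*/wdt:P17 wd:Q155 .
  # Restrict to items that have an OSM relation ID
  ?item wdt:P402 ?osm_relid .
  # Blazegraph-specific hint: evaluate the preceding pattern last
  hint:Prior hint:runLast true .
}
```

Both hints are specific to Blazegraph, so neither query should be expected to behave the same way on another triple store.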

Stanislav Kralin