
What I am trying to do is get the properties with the Quantity property type that are used on items of a certain class (such as City, Country, Human, River, Region, Mountain, etc.). I tried several classes: Country (wd:Q6256) works okay with the query below, but many other classes make the query exceed the time limit. How can I optimize the query below to get this result? Or is there another way to get the Quantity-type properties of a certain class?

SELECT DISTINCT ?p_ ?pLabel ?pAltLabel
WHERE {
  VALUES (?class) {(wd:Q515)}
  ?x ?p_ [].
  ?x p:P31/ps:P31 ?class.

  ?p wikibase:claim ?p_.
  ?p wikibase:directClaim ?pwdt.
  ?p wikibase:propertyType ?pType.
  FILTER (?pType = wikibase:Quantity)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}
Keanu Paik

1 Answer


Attempt 1: Optimizing the query

Some observations:

  • Instead of p:P31/ps:P31, you could use wdt:P31, which is faster because it avoids the two-property hop, but it finds only the truthy statements
  • The expensive part is the call to the label service at the end, as you can see by commenting that line out (place # at the start of the line)
  • The query retrieves every claim on every city (many!), gets the properties of the claims (few!), and only removes the duplicates in the end (with DISTINCT)
  • As a result, the label service is called many times for the same property, once per claim! This is the big problem with the query
  • This can be avoided by moving the retrieval of properties with the DISTINCT into a subquery, and calling the label service only at the end on the few properties
  • After that change it should be fast, but it is still slow, because the query optimiser seems to evaluate the query in the wrong order. Following hints from this page, we can turn the query optimiser off (hint:Query hint:optimizer "None")

This works for me:

SELECT ?p ?pLabel ?pAltLabel {
  hint:Query hint:optimizer "None" .
  {
    SELECT DISTINCT ?p_ {
      VALUES ?class { wd:Q515 }
      ?x wdt:P31 ?class.
      ?x ?p_ [].
    }
  }
  ?p wikibase:claim ?p_.
  ?p wikibase:propertyType ?pType.
  FILTER (?pType = wikibase:Quantity)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
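To run a query like the one above outside the web UI, a minimal Python sketch (standard library only) can send it to the public endpoint and turn the ?p URIs into property IDs. It assumes the endpoint https://query.wikidata.org/sparql, the SPARQL 1.1 JSON results format, and a placeholder User-Agent string you should replace; `build_request`, `run_query` and `property_ids` are hypothetical helper names, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Assumed public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_request(query: str,
                  user_agent: str = "quantity-props-example/0.1 (replace with your contact)"
                  ) -> urllib.request.Request:
    """Build a GET request that asks for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            # Wikimedia asks clients to send a descriptive User-Agent (placeholder here).
            "User-Agent": user_agent,
        },
    )


def run_query(query: str, timeout: float = 90.0) -> dict:
    """Send the query and decode the JSON response (needs network access).

    The timeout is client-side only; the server enforces its own limit.
    """
    with urllib.request.urlopen(build_request(query), timeout=timeout) as resp:
        return json.load(resp)


def property_ids(results: dict, var: str = "p") -> list[str]:
    """Turn entity URIs such as http://www.wikidata.org/entity/P2046 into IDs."""
    return [
        row[var]["value"].rsplit("/", 1)[-1]
        for row in results["results"]["bindings"]
        if var in row
    ]
```

For example, `property_ids(run_query(query_text))`, with `query_text` holding the query above, returns the list of property IDs; the labels are available in the same rows under `pLabel` and `pAltLabel`.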

Attempt 2: Splitting the task into multiple queries

Having learned that the approach above doesn't work for some of the biggest categories (like wd:Q5, “human”), I tried a different approach. This will get all results, but not in a single query. It requires sending ~25 individual queries and combining the results afterwards:

  • We start by listing the quantity properties. At the time of writing (April 2019) there were 503 of them.
  • We want to keep only those properties that are actually used on an item of type “human”.
  • Because that check is so slow (it needs to look at millions of items), we start by checking only the first 20 properties from our list.
  • In the second query, we're going to check the next 20, and so on.

This is the query that tests the first 20 properties:

SELECT DISTINCT ?p ?pLabel ?pAltLabel {
  hint:Query hint:optimizer "None" .
  {
    SELECT ?p ?pLabel ?pAltLabel {
      ?p wikibase:propertyType wikibase:Quantity.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    ORDER BY ?p  # fixed order, so consecutive OFFSET pages neither overlap nor skip properties
    OFFSET 0 LIMIT 20
  }
  ?p wikibase:claim ?p_.
  ?x ?p_ [].
  ?x wdt:P31 wd:Q5.
}
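Rather than editing OFFSET by hand for every run, the query above can be treated as a template. The sketch below is an assumption-laden illustration: the `cls` parameter (so the same batching works for classes other than wd:Q5), the helper name `batch_queries`, and the `ORDER BY ?p` line are additions, the last one because SPARQL only makes OFFSET/LIMIT paging predictable when the order is fixed. The `total` argument has to be the current number of quantity properties; 503 was the count at the time of writing.

```python
# Query template based on the answer; {cls}, {offset} and {limit} are filled in per batch.
# Literal SPARQL braces are doubled so str.format leaves them alone.
TEMPLATE = """SELECT DISTINCT ?p ?pLabel ?pAltLabel {{
  hint:Query hint:optimizer "None" .
  {{
    SELECT ?p ?pLabel ?pAltLabel {{
      ?p wikibase:propertyType wikibase:Quantity.
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    ORDER BY ?p
    OFFSET {offset} LIMIT {limit}
  }}
  ?p wikibase:claim ?p_.
  ?x ?p_ [].
  ?x wdt:P31 wd:{cls}.
}}"""


def batch_queries(cls: str, total: int, batch_size: int = 20) -> list[str]:
    """Return one query per page of quantity properties: OFFSET 0, 20, 40, ... below total."""
    return [
        TEMPLATE.format(cls=cls, offset=offset, limit=batch_size)
        for offset in range(0, total, batch_size)
    ]
```

With `batch_queries("Q5", 503)` this yields 26 queries, OFFSET 0 through OFFSET 500. A current value for `total` could come from a query such as `SELECT (COUNT(?p) AS ?n) { ?p wikibase:propertyType wikibase:Quantity }`.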

Increase the OFFSET to 20, 40, 60 and so on, up to 500, to test all properties.
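Combining the batches afterwards is a matter of concatenating the result rows. The sketch below is hypothetical: `collect_all` takes any function that runs one query and returns its rows in SPARQL JSON form (each row a dict keyed by variable name), waits briefly between requests to be gentle on the public endpoint (the one-second pause is an arbitrary choice), and deduplicates on ?p only as a safety net, since each query already uses DISTINCT.

```python
import time
from typing import Callable, Iterable


def merge_batches(batches: Iterable[list[dict]]) -> list[dict]:
    """Concatenate rows from several batches, keeping the first row seen for each ?p."""
    seen: set[str] = set()
    merged: list[dict] = []
    for rows in batches:
        for row in rows:
            uri = row["p"]["value"]
            if uri not in seen:
                seen.add(uri)
                merged.append(row)
    return merged


def collect_all(run: Callable[[str], list[dict]],
                queries: list[str],
                pause: float = 1.0) -> list[dict]:
    """Run each batch query in turn, pausing between requests, and merge the rows."""
    def results():
        for i, query in enumerate(queries):
            if i:
                time.sleep(pause)  # avoid hammering the public endpoint
            yield run(query)
    return merge_batches(results())
```

The `run` argument could be, for instance, a small wrapper that sends one query to the Wikidata Query Service and returns `response["results"]["bindings"]`.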

cygri
  • Nice answer. Just a comment: the path `p:P31/ps:P31` was suggested in one of his [previous questions](https://stackoverflow.com/questions/55314201/my-sparql-only-get-answers-partially-missing-some-answers-unanswered) and has to be used to get all statements, not only the [truthy](https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Truthy_statements) ones. See this answer: https://stackoverflow.com/a/47100906/4744359 – UninformedUser Apr 05 '19 at 05:36
  • @AKSW thanks, that is good to know, I've edited my answer accordingly. – cygri Apr 05 '19 at 09:51
  • @cygri Thank you very much. It is very good to know; I didn't know about that approach before. It works for me too. However, while it works for `wd:Q515`, it still times out with `wd:Q5` (human), so I need to optimise the query further. Anyway, it was a great help. – Keanu Paik Apr 05 '19 at 15:46
  • @KeanuPaik I suppose there are lots of humans in Wikidata! I get a result for Q5 by adding `LIMIT 4000` after the inner `SELECT`; with `LIMIT 5000` it times out. That will return only partial results, so it will miss some properties. I'm afraid that's the best I can do! – cygri Apr 05 '19 at 20:05
  • @KeanuPaik Actually I gave it another go, and have updated the answer with a different approach that splits the task into multiple queries. Maybe that's acceptable? – cygri Apr 05 '19 at 20:50
  • @cygri Oh, it is great. Actually, I also tried `LIMIT` in the query, but the second approach gets all the properties. It's a totally different approach and a good alternative. Thank you for your great help. – Keanu Paik Apr 06 '19 at 13:43