
I am using the Wikidata Query Service (which uses SPARQL) to retrieve some cities, but I get duplicate items with different property values, even though these items pertain to the same city. The problem appears to come from some of the cities' properties, such as population or coordinates, not having a value with "High" (preferred) rank. In the example query below (link to WQS), the city "Candon" appears 15 times in the query results, probably because all of its population (P1082) values, including the old ones, are set to "Normal rank". The 2020 census population should be set to "High rank".

How can I force the query service to retrieve distinct cities (items) and get only their latest population, without having to edit the Wikidata item itself to set its property rank?

```sparql
SELECT DISTINCT ?item ?itemLabel ?instanceOfLabel ?population ?coords WHERE {
  ?item wdt:P17 wd:Q928;
        wdt:P31/wdt:P279* ?instanceOf.

  OPTIONAL { # error appears to come from here due to Wikidata property rank issues
    ?item wdt:P1082 ?population;
          wdt:P625 ?coords;
  }.

  VALUES ?instanceOf {
    wd:Q104157
  }

  MINUS {
    ?item wdt:P576 ?dissolvedDate;
          wdt:P7888 ?mergedInto.
  }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} ORDER BY ?itemLabel
```
logi-kal
JAT86
  • for the latest population, just do the same as already explained here: https://stackoverflow.com/questions/49066390/how-to-get-only-the-most-recent-value-from-a-wikidata-property – UninformedUser Feb 08 '22 at 20:19
  • for just one coordinate, you have to apply an aggregate function like `sample`, which also needs a `group by` on all other projected variables – UninformedUser Feb 08 '22 at 20:20
  • `OPTIONAL { ?item p:P1082 [ps:P1082 ?population ; pq:P585 ?pop_date] } FILTER NOT EXISTS { ?item p:P1082/pq:P585 ?pop_date_ . FILTER (?pop_date_ > ?pop_date) }` – UninformedUser Feb 08 '22 at 20:23
  • @UninformedUser, thank you very much for the information. Could you please elaborate on what the code in your third comment means? It works, but looks quite cryptic to me. – JAT86 Feb 11 '22 at 09:27
  • oh, sure - it just filters out a result row when there is some population date that is newer than the date bound on that row. And `p:P1082/pq:P585` is the syntax for reaching statements about statements in Wikidata, as the date is attached to the statement itself - they call these [qualifiers](https://www.wikidata.org/wiki/Help:Qualifiers) - here are some more example queries that might help: https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks – UninformedUser Feb 11 '22 at 15:09
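
Putting the three comments above together, one possible version of the full query is sketched below. It is only a sketch under a few assumptions: the `?instanceOf` variable is replaced by the fixed class `wd:Q104157` from the question, and the variable names `?popDate`, `?newerDate` and `?coord` are introduced here purely for illustration. The population is read through `p:`/`ps:` so that its point-in-time qualifier (P585) can be compared, and `SAMPLE` with `GROUP BY` keeps a single coordinate per city.

```sparql
SELECT ?item ?itemLabel ?population (SAMPLE(?coord) AS ?coords) WHERE {
  # Items located in the Philippines (Q928) that are instances of the
  # class used in the question (Q104157) or of one of its subclasses.
  ?item wdt:P17 wd:Q928;
        wdt:P31/wdt:P279* wd:Q104157.

  # Read population through the full statement node (p:/ps:) so that the
  # "point in time" qualifier (pq:P585) is available for comparison.
  OPTIONAL {
    ?item p:P1082 [ ps:P1082 ?population ; pq:P585 ?popDate ]
  }

  # Drop every row for which the same item has a population statement
  # with a later date, so only the most recent dated value remains.
  FILTER NOT EXISTS {
    ?item p:P1082/pq:P585 ?newerDate .
    FILTER (?newerDate > ?popDate)
  }

  # Coordinates may have several values; SAMPLE in the SELECT clause
  # picks one of them per group.
  OPTIONAL { ?item wdt:P625 ?coord . }

  MINUS {
    ?item wdt:P576 ?dissolvedDate;
          wdt:P7888 ?mergedInto.
  }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?item ?itemLabel ?population
ORDER BY ?itemLabel
```

Two caveats: if a city has two population statements sharing the same latest date, both rows remain; and unlike `wdt:`, the `p:` prefix also matches statements with deprecated rank, so those may need an extra filter on `wikibase:rank`.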
