6

Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):

SELECT
?country 
?countryLabel 
?hdi_value
?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country p:P1081 ?hdi_statement. 
         ?hdi_statement ps:P1081 ?hdi_value.
         ?hdi_statement pq:P585 ?hdi_date.
       }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Link to Query Console

I'm aware of GROUP BY/GROUP CONCAT but that will still give me every result when I'd prefer to just have one. GROUP BY/SAMPLE will also not work since SAMPLE is not guaranteed to take the most recent result.

Any help or link to a relevant example query is appreciated!

P.S. Another thing I'm confused about is why population P1082 in this query returns only one population result per country

SELECT
?country 
?countryLabel 
?population
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1082 ?population. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

while the same query but for HDI returns multiple results per country:

SELECT
?country 
?countryLabel 
?hdi
WHERE {
 ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1081 ?hdi. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.

Stanislav Kralin
  • 11,070
  • 4
  • 35
  • 58
Brian
  • 165
  • 1
  • 3
  • 6
  • 1
    1. Paste something like `FILTER NOT EXISTS {?country p:P1081/pq:P585 ?hdi_date1 . FILTER (?hdi_date1 > ?hdi_date) }` *outside* of `OPTIONAL`. – Stanislav Kralin Mar 02 '18 at 10:59
  • 2. Duplicate of https://stackoverflow.com/a/47100906/7879193 – Stanislav Kralin Mar 02 '18 at 11:04
  • Try this for your first query : SELECT DISTINCT ?country ?countryLabel (MAX(?hdi_value) AS ?hd) (MAX(?hdi_date) AS ?hde) WHERE { ?country wdt:P31 wd:Q6256. OPTIONAL { ?country p:P1081 ?hdi_statement. ?hdi_statement ps:P1081 ?hdi_value. ?hdi_statement pq:P585 ?hdi_date. } SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } GROUP BY ?country ?countryLabel – Coder Mar 02 '18 at 11:06
  • @Shubhangi, are you sure that HDI are just increasing for all the countries? I. e. that `MAX(hdi_value)` corresponds to `MAX(hdi_date)`? – Stanislav Kralin Mar 02 '18 at 11:12
  • @StanislavKralin I have checked for few countries and it was fine. – Coder Mar 02 '18 at 11:16
  • @Shubhangi, check for Venezuela – Stanislav Kralin Mar 02 '18 at 11:23
  • @StanislavKralin yes I have checked for Venezuela and it gives me the right answer as he wants the most recent result. – Coder Mar 02 '18 at 11:26
  • @Shubhangi, the right answer for Venezuela is 0.762, not 0.764. – Stanislav Kralin Mar 02 '18 at 11:29
  • Thank you both! Stanislav's answer works perfectly here. Unfortunately, Shubhangi's answer does not work when the most recent HDI is lower than a previous HDI as was the case with Venezuela. – Brian Mar 02 '18 at 11:55
  • @StanislavKralin oops yeah you are right I didn't see the very last digit of HDI for Venezuela. – Coder Mar 02 '18 at 11:56
  • @Shubhangi, full list of such countries (their maximal HDI is in the past): *Greece, Cyprus, Cuba, Belize, East Timor, Federated States of Micronesia, Venezuela, Saint Lucia, Jamaica, Honduras, Iraq, Oman, Syria, Central African Republic, South Sudan, Equatorial Guinea, The Gambia, Libya, Swaziland, State of Palestine* – Stanislav Kralin Mar 02 '18 at 12:05
  • @StanislavKralin Thank you for pointing out the mistakes :) – Coder Mar 02 '18 at 12:26

1 Answers1

4

Both your questions are duplicates, but I'll try to add interesting facts to existing answers.

Question 1 is a duplicate of SPARQL query to get only results with the most recent date.

This technique does the trick:

FILTER NOT EXISTS {
    ?country p:P1081/pq:P585 ?hdi_date_ .
    FILTER (?hdi_date_ > ?hdi_date)
}

However, you should add this clause outside of OPTIONAL, it is not working inside of OPTIONAL (and I'm not sure this is not a bug).


Question 2 is a duplicate of Some cities aren't instances of city or big city?

You can't use wdt-predicates, because missing statements are not truthy.
They are normal-rank statements, but there is a preferred-rank statement.

Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.

The reason why P1081 always has preferred statement is that this property is processed by PreferentialBot.

Nemo
  • 2,441
  • 2
  • 29
  • 63
Stanislav Kralin
  • 11,070
  • 4
  • 35
  • 58
  • I tried to do exactly what you described but on the cities - but doesn't look correct. SELECT ?city ?cityLabel ?cityPopulation ?hdi_date WHERE { # ?country wdt:P31 wd:Q6256. # OPTIONAL { ?country wdt:P1082 ?population. } ?city wdt:P31/wdt:P279* wd:Q515 . ?city p:P1082 ?populationStatement . ?populationStatement ps:P1082 ?cityPopulation. FILTER NOT EXISTS { ?city p:P1082/pq:P585 ?hdi_date_ . FILTER (?hdi_date_ > ?hdi_date) } SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } } order by ?countryLabel – Areza Dec 21 '19 at 20:32
  • @user702846, "exactly" would be rather something like this: https://w.wiki/EKB :) – Stanislav Kralin Dec 22 '19 at 20:54