First, I'm going to use the prefixes defined in the DBpedia SPARQL endpoint so that we can copy and paste queries. I think the only difference is that dbo will now be dbpedia-owl. Second, you're using a number of raw data properties, but if you can, you ought to try to use properties from the ontology, as explained in this answer. That doesn't necessarily affect the results you're getting here, but you'll generally get cleaner data if you use the ontology properties.
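If you want to run these queries somewhere that doesn't predefine those prefixes, a sketch like the following declares them explicitly. The namespace IRIs are the ones I believe the public endpoint maps these prefixes to, so treat them as an assumption and check them against the endpoint's namespace list. The query body is just a small smoke test, not part of your original query.

```sparql
# Assumed namespace IRIs for the prefixes used in this answer.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop:     <http://dbpedia.org/property/>
PREFIX yago:        <http://dbpedia.org/class/yago/>
PREFIX rdfs:        <http://www.w3.org/2000/01/rdf-schema#>

# Smoke test: list a few resources typed as countries in the ontology.
SELECT ?country
WHERE {
  ?country a dbpedia-owl:Country .
}
LIMIT 5
```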
Modifying your query
FILTER NOT EXISTS for removing countries that have ended
Let's clean up the query a little bit first, and then turn to the question of getting the various population properties. Removing countries that have an end date can be done a bit more simply. Instead of
OPTIONAL {?country dbpprop:yearEnd ?yearEnd}
FILTER (!bound(?yearEnd))
you can use FILTER NOT EXISTS to make this a bit more direct:
FILTER NOT EXISTS { ?country dbpprop:yearEnd ?yearEnd }
To prefer properties from the DBpedia ontology over raw infobox data properties, you might want to consider using dbpedia-owl:dissolutionYear rather than dbpprop:yearEnd, giving:
FILTER NOT EXISTS { ?country dbpedia-owl:dissolutionYear ?yearEnd }
Simplify filtering for languages
It's reasonable to expect rdfs:label values to be literals, and the lang function requires its argument to be a literal, so you don't really need to bind str(?enName) to ?name; it's sufficient just to bind ?name in the triple pattern, and then check its language (which you're doing correctly using langMatches). That is, instead of
?country rdfs:label ?enName .
FILTER (langMatches(lang(?enName), "en"))
BIND (str(?enName) AS ?name)
you can just use
?country rdfs:label ?name .
FILTER (langMatches(lang(?name), "en"))
This does mean that the name you get back will have a language tag. If you really just want the plain string, you can either BIND as you did before, or use an AS expression in the SELECT, e.g.,
SELECT DISTINCT (str(?name) as ?noLangName) ?population
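To see the difference between the two forms side by side, here's a small sketch that returns both the tagged label and its plain string. The PREFIX lines and the LIMIT are my additions to make it self-contained (the endpoint predefines the prefixes, and I'm assuming the usual namespace IRIs).

```sparql
# Assumed namespace IRIs; the DBpedia endpoint predefines these prefixes.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs:        <http://www.w3.org/2000/01/rdf-schema#>

# ?name keeps its language tag; ?noLangName is the plain lexical form.
SELECT DISTINCT ?name (str(?name) AS ?noLangName)
WHERE {
  ?country a dbpedia-owl:Country .
  ?country rdfs:label ?name .
  FILTER (langMatches(lang(?name), "en"))
}
LIMIT 10
```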
Checking that population is bound and is a number
I don't think filtering on xsd:integer(?population) will do much for you either. That notation isn't a type predicate but a casting function, so ?population is cast to an integer and the filter then tests the effective boolean value of the result. I think the filter will let through every value that casts successfully, except 0, which would fail. You'd still want to know if a country has a population of 0, though, right? However, you do only want countries with populations, so you could filter with bound:
FILTER(bound(?population))
However, since the properties here are raw infobox properties, there is some noise in the data, so we end up with values like
"Denmark"@en "- Density 57,695"@en
"Denmark"@en "- Faroe Islands"@en
which aren't useful. A better filter would just check that the value is a number (which will implicitly require that it's bound), and there is a function isNumeric for just that purpose, so we use:
FILTER (isNumeric(?population))
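You can check how isNumeric treats these kinds of values without touching DBpedia at all. The following is a minimal sketch that feeds a few sample values through VALUES: India's population and the two noisy strings from above, plus a 0 that I've added as a hypothetical edge case. Per the SPARQL 1.1 definition of isNumeric, I'd expect true for the two numbers and false for the two language-tagged strings, though I haven't verified that every endpoint behaves identically.

```sparql
# No dataset needed: the candidate values are supplied inline.
SELECT ?population (isNumeric(?population) AS ?numeric)
WHERE {
  VALUES ?population {
    1210193422              # a real numeric value
    0                       # hypothetical edge case: still numeric
    "- Density 57,695"@en   # infobox noise
    "- Faroe Islands"@en    # infobox noise
  }
}
```

Note that 0 counts as numeric here, so a country with a population of 0 would survive FILTER (isNumeric(?population)).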
Simplifying similar UNION patterns with VALUES
You can clean up the UNION pattern by using VALUES. Instead of UNIONing several almost identical patterns, you can define a variable ?hasCode that will only have the values dbpprop:iso3166code, etc. I.e., instead of:
{ ?country dbpprop:iso3166code ?code . }
UNION
{ ?country dbpprop:iso31661Alpha ?code . }
UNION
{ ?country dbpprop:countryCode ?code . }
UNION
{ ?country a yago:MemberStatesOfTheUnitedNations . }
you can use:
values ?hasCode { dbpprop:iso3166code dbpprop:iso31661Alpha dbpprop:countryCode }
{ ?country ?hasCode ?code . }
UNION
{ ?country a yago:MemberStatesOfTheUnitedNations . }
You can do a similar thing with the ?population retrieval:
OPTIONAL {?country dbpprop:populationEstimate ?population}
OPTIONAL {?country dbpprop:populationCensus ?population}
can become:
values ?hasPopulation { dbpprop:populationEstimate dbpprop:populationCensus }
OPTIONAL { ?country ?hasPopulation ?population }
The final result
The rewritten query is now:
SELECT DISTINCT ?name ?population
WHERE {
  ?country a dbpedia-owl:Country .
  ?country rdfs:label ?name .
  FILTER (langMatches(lang(?name), "en"))

  values ?hasPopulation { dbpprop:populationEstimate dbpprop:populationCensus }
  OPTIONAL { ?country ?hasPopulation ?population }
  FILTER (isNumeric(?population))

  FILTER NOT EXISTS { ?country dbpedia-owl:dissolutionYear ?yearEnd }

  values ?hasCode { dbpprop:iso3166code dbpprop:iso31661Alpha dbpprop:countryCode }
  { ?country ?hasCode ?code . }
  UNION
  { ?country a yago:MemberStatesOfTheUnitedNations . }
}
SPARQL results
India now appears in the results with a population:
"India"@en 1210193422