1

I am learning SPARQL and DBpedia by working through the queries in https://www.joe0.com/2014/09/22/how-to-use-sparql-to-query-dbpedia-and-freebase/ . I am testing a query that returns John Lennon's date of birth, and I am running my queries at http://dbpedia.org/sparql . The query is:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# foaf: is predefined on the DBpedia endpoint; declared here so the query also runs elsewhere.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?x0 ?x1 WHERE {
  ?x0 rdf:type foaf:Person .
  ?x0 rdfs:label "John Lennon"@en .
  ?x0 dbpedia-owl:birthDate ?x1 .
}

It returns two rows containing the same date (9 Oct 1940). My question is: why does the query return two rows even though it uses DISTINCT? Prior to asking this question I checked the following:

but I don't think they explain the duplicate dates.

Edit: I converted the results to text and pasted them below

--------------------------------------- -----------------------------------------------------
x0                                      x1
--------------------------------------- -----------------------------------------------------
http://dbpedia.org/resource/John_Lennon 1940-10-09
http://dbpedia.org/resource/John_Lennon "1940-10-9"^^<http://www.w3.org/2001/XMLSchema#date>
Ubercoder

4 Answers

2

As stated, it seems DBpedia actually has two birth dates for this resource: 1940-10-09 (valid) and 1940-10-9 (invalid). The workaround is to add a FILTER that converts the date to a string and keeps only values conforming to YYYY-MM-DD. It works!

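PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
# Standard SPARQL 1.1 needs "(expr AS ?var)" to project an expression; ?x1str is just an illustrative name.
SELECT DISTINCT ?x0 ?x1 (STR(?x1) AS ?x1str) WHERE {
  ?x0 rdf:type foaf:Person .
  ?x0 rdfs:label "John Lennon"@en .
  ?x0 dbpedia-owl:birthDate ?x1 .
  # Anchored with ^ and $ so the whole string must be YYYY-MM-DD.
  FILTER (REGEX(STR(?x1), "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"))
}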
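To see why a YYYY-MM-DD pattern of this kind separates the two values, here is a minimal Python sketch that applies it to the two lexical forms from the question. It assumes only that SPARQL's REGEX matches anywhere in the string unless the pattern is anchored, which Python's `re.search` mimics; `fullmatch` stands in for an anchored `^...$` pattern. The third candidate, `1940-10-091`, is a made-up malformed value included only to show where the two behaviours differ.

```python
import re

# A YYYY-MM-DD pattern of the same shape as the one in the FILTER.
PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

candidates = [
    "1940-10-09",   # valid lexical form from the question
    "1940-10-9",    # invalid lexical form from the question
    "1940-10-091",  # hypothetical malformed value, for illustration only
]

# Unanchored: passes if the pattern matches any substring.
kept_by_search = [c for c in candidates if PATTERN.search(c)]

# Anchored: passes only if the whole string matches.
kept_by_fullmatch = [c for c in candidates if PATTERN.fullmatch(c)]

print(kept_by_search)     # ['1940-10-09', '1940-10-091']
print(kept_by_fullmatch)  # ['1940-10-09']
```

Either variant removes `1940-10-9`; anchoring additionally rejects strings that merely contain a date-like substring.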
Ubercoder
1

I ran your query on the DBpedia endpoint and asked for the results in an RDF-based format (Turtle), and found that the lexical forms of the date literals are actually different:

"1940-10-09"^^xsd:date
"1940-10-9"^^xsd:date

The second isn't actually a legal xsd:date. The first is, which is probably why the SPARQL endpoint prints it in "pretty" fashion in the HTML table (as just 1940-10-09).
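The same difference can be inspected programmatically. The sketch below is one possible way to do it, not the method used above: instead of Turtle it asks for the standard SPARQL JSON results format (`application/sparql-results+json`), in which every binding carries its exact lexical `value` and, for typed literals, a `datatype`. The helper names (`build_request`, `run_query`, `literal_bindings`) are hypothetical, and `SAMPLE` is a hand-written response that mirrors the two rows in the question rather than captured endpoint output. Nothing is sent over the network unless `run_query` is called.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional, Tuple

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the question

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?x0 ?x1 WHERE {
  ?x0 a foaf:Person ;
      rdfs:label "John Lennon"@en ;
      dbpedia-owl:birthDate ?x1 .
}
"""

def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

def run_query(endpoint: str, query: str) -> dict:
    """Send the request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return json.load(resp)

def literal_bindings(results: dict, var: str) -> List[Tuple[str, Optional[str]]]:
    """Return (lexical form, datatype) for every row that binds `var`."""
    rows = results["results"]["bindings"]
    return [(r[var]["value"], r[var].get("datatype")) for r in rows if var in r]

# Hand-written sample mirroring the two rows in the question (not real output).
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
JOHN = {"type": "uri", "value": "http://dbpedia.org/resource/John_Lennon"}
SAMPLE = {
    "head": {"vars": ["x0", "x1"]},
    "results": {"bindings": [
        {"x0": JOHN, "x1": {"type": "literal", "value": "1940-10-09", "datatype": XSD_DATE}},
        {"x0": JOHN, "x1": {"type": "literal", "value": "1940-10-9", "datatype": XSD_DATE}},
    ]},
}

for value, datatype in literal_bindings(SAMPLE, "x1"):
    print(repr(value), datatype)
```

Because the two `value` strings differ, they are different RDF terms, which is consistent with DISTINCT keeping both rows.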

Joshua Taylor
  • But I don't understand why there should be two date literals - there should only be one John Lennon born on that day. Re the results in RDF form, I don't think the dbpedia endpoint I referenced can do that. – Ubercoder Apr 27 '18 at 10:38
  • My mistake, I am still learning here. How did you make the endpoint show the RDF? – Ubercoder Apr 27 '18 at 10:42
  • Also I just edited my question and added the results of the query. – Ubercoder Apr 27 '18 at 10:42
  • My apologies about the RDF, I changed the dropdown to RDF like you said and it lets me download the resulting RDF file. – Ubercoder Apr 27 '18 at 10:49
  • @JoshuaTaylor a single-digit day in xsd:date is illegal. http://live.dbpedia.org/page/John_Lennon and a live try http://mappings.dbpedia.org/server/extraction/en/extract?title=John_Lennon&format=turtle-triples&revid=744133215 don't have this defect. So it was some bug in the extractor used for that version of DBpedia – Vladimir Alexiev Apr 27 '18 at 12:32
  • Known issue in DBpedia ... I worked around it for ages and always did data cleaning before I loaded the DBpedia dump into a local triple store. Especially for `xsd:date` literals, there are dozens of variations w.r.t. illegal syntax. – UninformedUser Apr 28 '18 at 08:31
1

Well, it is not your fault! The resource simply has both of these triples, as you can see here. There are duplicates in the data.

Erwarth
  • Can I add a filter or condition to the date in ?x1 to make it only return valid dates? In SQL there is an ISDATE function, but I don't know enough SPARQL to do the same thing. PS: I realize SQL and SPARQL are completely different things. – Ubercoder Apr 27 '18 at 13:41
1

The result is slower queries: either each access to an invalid date triggers an exception (for example, when querying with Fuseki), or a filter does the job of eliminating the wrong dates, which is costly.
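One way to avoid paying that cost at query time is to clean the data before loading it into a local store, as suggested in the comments above. The following is a minimal sketch of that idea, not a complete N-Triples parser: it assumes one triple per line, looks only at literals typed as `xsd:date`, zero-pads single-digit months and days, and drops lines whose date cannot be repaired. The function names are hypothetical, and dates with negative or more-than-four-digit years (which xsd:date permits) are treated as unrepairable here.

```python
import re
from datetime import date
from typing import Optional

XSD_DATE = "<http://www.w3.org/2001/XMLSchema#date>"

# A quoted literal followed by the xsd:date datatype IRI.
DATE_LITERAL = re.compile(r'"([^"]*)"\^\^' + re.escape(XSD_DATE))

# Year-month-day with 1 or 2 digit month/day and an optional timezone.
LOOSE_YMD = re.compile(
    r"([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})(Z|[+-][0-9]{2}:[0-9]{2})?"
)

def repair_date(lexical: str) -> Optional[str]:
    """Return a zero-padded YYYY-MM-DD form, or None if it is not a real date."""
    m = LOOSE_YMD.fullmatch(lexical)
    if m is None:
        return None
    try:
        fixed = date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    except ValueError:  # e.g. month 13 or day 32
        return None
    return fixed.isoformat() + (m.group(4) or "")

def clean_line(line: str) -> Optional[str]:
    """Repair an xsd:date literal in one N-Triples line; None means drop the line."""
    m = DATE_LITERAL.search(line)
    if m is None:
        return line  # no xsd:date literal, keep unchanged
    fixed = repair_date(m.group(1))
    if fixed is None:
        return None
    return line[:m.start(1)] + fixed + line[m.end(1):]

bad = ('<http://dbpedia.org/resource/John_Lennon> '
       '<http://dbpedia.org/ontology/birthDate> '
       '"1940-10-9"^^<http://www.w3.org/2001/XMLSchema#date> .')
print(clean_line(bad))
```

After this repair the two John Lennon triples become identical, so a triple store that holds a set of triples would end up with a single birth date.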

Moissinac