
I need to get data about films from DBpedia.

I run the following SPARQL query on http://dbpedia-live.openlinksw.com/sparql:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

I am trying to get films released on or after 2000-01-01, but the engine responds with:

Virtuoso 22007 Error DT006: Cannot convert 2009-06-31 to datetime : 
Too many days (31, the month has only 30)

SPARQL query:
define sql:big-data-const 0 
#output-format:text/html
define sql:signal-void-variables 1 define input:default-graph-uri <http://dbpedia.org> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

As far as I understand, some of the data in DBpedia is erroneous: the engine cannot convert such a value to a date in order to compare it with the date I set, so it aborts the whole query.

So the question is: is there any way to tell the engine to skip the erroneous data and return everything that could be processed?

Ben Companjen

3 Answers


You can use the COALESCE function to substitute a default date for the invalid ones:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released ?released_fixed WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  BIND(coalesce(xsd:dateTime(?released), '1000-01-01') AS ?released_fixed)
  FILTER(xsd:date(coalesce(xsd:dateTime(?released), '1000-01-01')) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20

This query returns results on the DBpedia Live endpoint.

The bind construct is only there to show the fixed dates: invalid values are replaced with '1000-01-01' and stored in the variable ?released_fixed. The bind is not needed by the filter, so it can be omitted together with ?released_fixed in the SELECT clause.
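As a sketch of what omitting the bind looks like, the query below keeps only the coalesce inside the FILTER. It assumes the standard `xsd:dateTime` spelling and adds an explicit `xsd` prefix declaration, neither of which appears in the query above. It has not been verified against the endpoint, and the comment below reports that the original error can still occur on Virtuoso.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Assumption: xsd is declared explicitly rather than relying on endpoint defaults.
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  # If the cast fails, coalesce falls back to '1000-01-01',
  # which is earlier than the threshold, so the row is filtered out.
  FILTER(xsd:date(coalesce(xsd:dateTime(?released), '1000-01-01')) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20
```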

mgraube
  • Is the bind() necessary here (if so why please)? I tried this both with and without the bind(), and still get the same error OP reported. – TextGeek Apr 30 '14 at 16:35
  • I have enhanced my answer in order to explain the bind(). – mgraube May 05 '14 at 06:44

One way is to filter on the datatype, as shown below:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  FILTER(datatype(?released) = <http://www.w3.org/2001/XMLSchema#dateTime>)
  FILTER(xsd:date(?released) >= "2000-01-01"^^xsd:date).
} ORDER BY ?released
LIMIT 20


Joshua Taylor
  • Trying to run that query results in an error: `Virtuoso 22023 Error SR066: Unsupported case in CONVERT (incomplete RDF box -> DATE)`. This seems similar to the problem in the question. – Joshua Taylor Jun 20 '13 at 02:43

Discarding a result because its date is off by a day seems silly to me (like Windows doing a bugcheck whenever it feels something is wrong, e.g. your GPU video adapter hanging 5 times in a row).

Since you only care about the year, isn't it better to compare string-wise?

str(?released) >= "2000"

XSD says "at least 4 digits for the year", so this works for all four-digit positive years (AD). BTW, this will also work if the DBpedia extraction framework found only a year in that field.
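A minimal sketch of the string comparison placed into the query from the question follows. Ordering by `str(?released)` is an assumption made here so that no typed date comparison is needed at all; it is not part of the suggestion above. The sketch also assumes four-digit years and has not been run against the endpoint.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?subject ?label ?released WHERE {
  ?subject rdf:type <http://dbpedia.org/ontology/Film>.
  ?subject rdfs:label ?label.
  ?subject <http://dbpedia.org/ontology/releaseDate> ?released.
  # Compare the lexical form as a plain string; no date cast is attempted,
  # so a value such as 2009-06-31 is kept rather than rejected.
  FILTER(str(?released) >= "2000")
}
# Assumption: sort on the string form as well, to avoid comparing typed values.
ORDER BY str(?released)
LIMIT 20
```

Because "2000-01-01" sorts after "2000" as a string, films released on the first day of 2000 are included.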

Vladimir Alexiev