In my triple store I have a collection of schema:CreativeWork resources, each with the properties schema:version and schema:dateCreated. I want to retrieve all schema:CreativeWork resources, but only the newest version of each. My sample query:

PREFIX schema: <https://schema.org/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT *
WHERE { 
    ?subject rdf:type schema:CreativeWork .     
    ?subject schema:identifier ?identifier .
    ?subject schema:version ?version .
    ?subject schema:dateCreated ?dateCreated .
    OPTIONAL {?subject schema:about/schema:name ?name .}
    FILTER( ?identifier = "46d8b7abfec44865a567ea04e385661b" ) .
} LIMIT 10

How can I query only the latest version?

executable sample: https://api.triplydb.com/s/rLq4V-JgS

Note: the FILTER( ?identifier = "46d8b7abfec44865a567ea04e385661b" ) is only there to keep the example small.

naturzukunft
  • The [sample](https://api.triplydb.com/s/n6qyvIZXa) with a filter as described in [SPARQL query to get only results with the most recent date](https://stackoverflow.com/questions/36181713/sparql-query-to-get-only-results-with-the-most-recent-date) – naturzukunft Apr 30 '22 at 16:48
  • a [sample](https://api.triplydb.com/s/cQ6W-Y7Kb) that gets the max version by identifier – naturzukunft Apr 30 '22 at 18:11
  • is your comment the answer to your question? – UninformedUser May 01 '22 at 06:05
  • No, it isn't, just some hints. I still don't have a solution. – naturzukunft May 01 '22 at 09:15
  • is this your own dataset? Why are the literals all strings? This makes it really difficult to use an efficient filter. While I understand, that depending on the version scheme an integer might not be feasible, for date literals not using the appropriate datatype makes the data somewhat less expressive and inconvenient to use. – UninformedUser May 01 '22 at 12:12
  • in theory, adding `FILTER NOT EXISTS {?subject2 schema:identifier ?identifier . ?subject2 schema:dateCreated ?dateCreated2 FILTER(?dateCreated2 > ?dateCreated)}` would return just the one with the latest creation date. It doesn't work on your data because string comparison is lexicographic. It also leads to a timeout – UninformedUser May 01 '22 at 12:13
  • the same resource is identified by the identifier, right? – UninformedUser May 01 '22 at 12:17
  • here is something that doesn't time out: `SELECT DISTINCT * { {SELECT ?identifier (max(?dateCreated) as ?latestDate) WHERE { ?subject rdf:type schema:CreativeWork . ?subject schema:identifier ?identifier . ?subject schema:dateCreated ?dateCreated . } group by ?identifier} ?subject schema:identifier ?identifier . ?subject schema:version ?version . ?subject schema:dateCreated ?latestDate . OPTIONAL {?subject schema:about/schema:name ?name .} } LIMIT 100` – UninformedUser May 01 '22 at 12:22
  • The idea is to use a subquery to get the latest date per identifier, then in the outer query get the resource data with that latest date – UninformedUser May 01 '22 at 12:25
  • "is this your own dataset?" Yes. "Why are the literals all strings?" It's a bug; I'll change that and come back to the thread. I hope the query above will still work then; it looks to me like it will. Or is there a more performant way to do it if I use dates? – naturzukunft May 01 '22 at 17:26
  • Thanks, works well! I've now also changed the datatypes. But I have performance/memory problems when filling and querying the database. That is what I have to sort out now. – naturzukunft May 03 '22 at 05:41
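
For comparison, the FILTER NOT EXISTS idea from the comments can be written out as a full query. This is only a sketch: it assumes the schema:dateCreated literals are properly typed (for example xsd:dateTime) so that `>` compares dates rather than strings, and the variable names `?other` and `?otherDate` are made up for illustration. As noted in the comments, this form timed out on the original dataset, so it may well be slower than the subquery approach.

```sparql
PREFIX schema: <https://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?subject ?identifier ?version ?dateCreated ?name
WHERE {
  ?subject rdf:type schema:CreativeWork ;
           schema:identifier ?identifier ;
           schema:version ?version ;
           schema:dateCreated ?dateCreated .
  OPTIONAL { ?subject schema:about/schema:name ?name . }

  # Keep ?subject only if no other resource with the same identifier
  # has a strictly later creation date.
  # Assumption: both dates are typed literals, so ">" is a date comparison.
  FILTER NOT EXISTS {
    ?other schema:identifier ?identifier ;
           schema:dateCreated ?otherDate .
    FILTER( ?otherDate > ?dateCreated )
  }
}
LIMIT 100
```

If two resources with the same identifier share the same latest date, both are returned.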

1 Answer

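UninformedUser's query from the comments works well. A subquery finds the latest schema:dateCreated per identifier, and the outer query fetches the resource that has that date: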

PREFIX schema: <https://schema.org/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT * 
{
  {
    SELECT ?identifier (max(?dateCreated) as ?latestDate) 
    WHERE {     
      ?subject rdf:type schema:CreativeWork .       
      ?subject schema:identifier ?identifier .     
      ?subject schema:dateCreated ?dateCreated . 
    } group by ?identifier
  }     
  ?subject schema:identifier ?identifier .     
  ?subject schema:version ?version .    
  ?subject schema:dateCreated ?latestDate .     
  OPTIONAL {?subject schema:about/schema:name ?name . } 
} LIMIT 100
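
If the date literals are still plain strings, a variant that casts them before aggregating might look like the sketch below. It assumes every schema:dateCreated value is a valid xsd:dateTime lexical form (for example "2022-04-30T16:48:00Z"); values that cannot be cast raise errors, and how those affect MAX may differ between stores. The variable names `?s` and `?rawDate` are illustrative only.

```sparql
PREFIX schema: <https://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?subject ?identifier ?version ?dateCreated ?name
WHERE {
  {
    # Latest creation date per identifier, compared as xsd:dateTime
    # instead of as a string.
    SELECT ?identifier (MAX(xsd:dateTime(?rawDate)) AS ?latestDate)
    WHERE {
      ?s rdf:type schema:CreativeWork ;
         schema:identifier ?identifier ;
         schema:dateCreated ?rawDate .
    }
    GROUP BY ?identifier
  }
  ?subject schema:identifier ?identifier ;
           schema:version ?version ;
           schema:dateCreated ?dateCreated .
  # Match on the cast value, because the stored literal is a string
  # and would not equal the typed ?latestDate directly.
  FILTER( xsd:dateTime(?dateCreated) = ?latestDate )
  OPTIONAL { ?subject schema:about/schema:name ?name . }
}
LIMIT 100
```

Casting on every row is likely to cost more than comparing typed literals, so fixing the datatypes in the data, as was done here, is probably the better long-term option.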
naturzukunft