SPARQL - How to construct RDF by selecting MAX values of objects grouped by properties?

Question

Sorry for the unclear title; I'm French and I wasn't sure how to describe my problem. I think the best way to explain it is with an example.

I have some RDF sets like this one:

prefix p: <http://localhost/rdf/>

p:set p:hasTitle p:val1 .
p:set p:hasTitle p:val2 .
p:set p:hasAuthor p:val3 .
p:set p:hasAuthor p:val4 .

p:val1 p:hasValue "Harry Peter" .
p:val1 p:hasScore 0.30 .

p:val2 p:hasValue "Harry Potter" .
p:val2 p:hasScore 0.90 .

p:val3 p:hasValue "J. K. Rowling" .
p:val3 p:hasScore 0.90 .

p:val4 p:hasValue "Joanne Rowling" .
p:val4 p:hasScore 0.50 .

I want to construct another graph with a SPARQL query that keeps, for each distinct property, only the value with the best score. For this example, the query should return this:

prefix p: <http://localhost/rdf/>

p:set p:hasTitle p:val2 .
p:set p:hasAuthor p:val3 .

p:val2 p:hasValue "Harry Potter" .
p:val2 p:hasScore 0.90 .

p:val3 p:hasValue "J. K. Rowling" .
p:val3 p:hasScore 0.90 .

For now I have tried something like this:

PREFIX p:    <http://localhost/rdf/>

CONSTRUCT {
    p:root p:hasSameAsSet ?saSet .
    ?saSet ?prop ?bestVal .
    ?bestVal ?p ?v
} 

WHERE { 
    ?s p:hasSameAsSet ?saSet .
    ?saSet ?prop ?val .
    ?bestVal ?p ?v .
    ?bestVal p:hasQualityScore ?m
    {
        SELECT (MAX(?score) AS ?m)
        WHERE { 
            ?val p:hasQualityScore ?score 
        } GROUP BY ?prop
    }    
}

I'm new to SPARQL and I know I'm missing important things. I hope someone can help me, thank you very much! If my question isn't clear, I can try to explain it better. Don't worry about the wording of your answers, I'm better at reading than writing ;)

First comment: Sub-selects are executed first, so you need the triple pattern `?saSet ?prop ?val .` inside it. And you have to group by `value` too to be able to return it. You need to return `?prop ?val` in the sub-select in addition to the max value. — UninformedUser, Jun 16 '16 at 15:34

score 2 · Answer 1 · answered Jun 16 '16 at 17:02

AKSW's comment is spot on. Your query is very close as is, but the subquery is executed first, so it needs enough information to get the grouping right, and you also need to project the variables from within it that will let you do the join with the outer query results.

E.g., a query like this gets you the maximal value for each property:

  select ?prop (max(?val_) as ?val) {
    ?sub ?prop ?val_
  }
  group by ?prop

Then you just need to nest that within an outer query that finds, for each property and max value, the subject that had it:

select ?sub ?prop ?val {
    ?sub ?prop ?val
    { select ?prop (max(?val_) as ?val) {
        ?sub ?prop ?val_
      }
      group by ?prop
    }
}

(Note that there could be multiple values of ?sub that have the maximal value.) Adding any extra information that you need to the outer query and then turning it into a construct query shouldn't be hard, given what you've already got in your existing query.
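
If only one subject per property is wanted when several tie for the maximum, one option is to aggregate once more in the outer query with SAMPLE. This is a minimal sketch rather than the only approach; ?oneSub and ?s are illustrative variable names that do not appear in the queries above.

```sparql
# Keep a single subject for each (property, max value) pair.
# SAMPLE returns an arbitrary member of the group, so which of the
# tied subjects survives depends on the SPARQL engine.
select (sample(?sub) as ?oneSub) ?prop ?val {
    ?sub ?prop ?val
    { select ?prop (max(?val_) as ?val) {
        ?s ?prop ?val_
      }
      group by ?prop
    }
}
group by ?prop ?val
```

If the choice among tied subjects matters, a deterministic rule (for example MIN over the subject) could replace SAMPLE.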

Thank you for your useful answer. But I have a problem applying it to my data: I have a structure like this {:set :prop :val} . {:val :hasScore :score} so I want to select the max :score grouped by :prop, not grouped by :hasScore. I can't figure out how to do that... — Boustrophedon, Jun 17 '16 at 11:13
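
For the structure described in this comment, where the score sits on the value node rather than on the set, the subquery can follow the extra hop and still group by the property. The following is a sketch under the assumption that the data looks like the sample in the question (p:hasScore on each value node); ?set, ?candidate, ?maxScore, ?p and ?o are illustrative variable names.

```sparql
PREFIX p: <http://localhost/rdf/>

CONSTRUCT {
    ?set ?prop ?val .   # link from the set to the best value node
    ?val ?p ?o .        # every triple describing that value node
}
WHERE {
    # Outer query: find the value node whose score equals the maximum
    # for its (set, property) pair, then fetch its description.
    ?set ?prop ?val .
    ?val p:hasScore ?maxScore .
    ?val ?p ?o .
    {
        # Subquery: best score per set and property. It is evaluated
        # on its own, so it must contain the ?set ?prop pattern and
        # project the variables used for the join.
        SELECT ?set ?prop (MAX(?score) AS ?maxScore)
        WHERE {
            ?set ?prop ?candidate .
            ?candidate p:hasScore ?score .
        }
        GROUP BY ?set ?prop
    }
}
```

On the sample data from the question, this should keep p:val2 for p:hasTitle and p:val3 for p:hasAuthor. If two value nodes tie for the best score on the same property, both are kept.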
