SPARQL: Figure out high data property values

Question

I have a quiz game in which students solve questions from three categories: Chemistry, English, and Physics. Students score points in each category; for example, student1 has 50 in Chemistry, 70 in English, and 65 in Physics.

I can figure out in which category a given student has the highest score. But how can I find the highest score that any student has achieved in each category? For example, if a student got 90 points in English and no other student reached that score, how can we determine that the top score in English is 90?

Remember: English score, Chemistry score, and Physics score are data properties stored in an RDF file. I'd like to know whether this is possible using Jena rules, SPARQL, or plain Java code.

SPARQL aggregate functions are the way to go, see https://www.w3.org/TR/sparql11-query/#aggregates — UninformedUser, Jan 04 '17 at 12:40
@AKSW, I don't want to sum the values. I just want to figure out the highest score that any student has ever scored in each of the three categories. I don't know how this can be done with an aggregate function — s.shah, Jan 04 '17 at 14:03
I mean, if we apply the max function, which variable should we apply it to? — s.shah, Jan 04 '17 at 14:12
Obviously with the variable that contains the value?! I don't understand the problem. — UninformedUser, Jan 04 '17 at 15:22
@AKSW It sounds like OP is asking for an argmax, which is actually a bit less trivial. — Joshua Taylor, Jan 04 '17 at 20:21
@JoshuaTaylor the maximum value for a particular property is mostly trivial - for us. `SELECT (max(?score) as ?maxScore) {?s ex:englishScore ?score }` That's what I understood they want to have. And doing this client side for each subject is no magic :D — UninformedUser, Jan 04 '17 at 21:38

Joshua Taylor · Accepted Answer · 2017-01-05T13:42:49.353

If I understand you correctly, you're asking to find the maximum score in each category, and then to find, for each category, the student with that highest score in that category. It's easier to work with data (in the future, please try to provide minimal data that we can work with), so here's some sample data:

@prefix : <urn:ex:> .

:student1 :hasScore [ :inCategory :category1 ; :value 90 ] ,
                    [ :inCategory :category2 ; :value 75 ] ,
                    [ :inCategory :category3 ; :value 85 ] .

:student2 :hasScore [ :inCategory :category2 ; :value 75 ] ,
                    [ :inCategory :category3 ; :value 90 ] ,
                    [ :inCategory :category4 ; :value 90 ] .

:student3 :hasScore [ :inCategory :category1 ; :value 85 ] ,
                    [ :inCategory :category2 ; :value 80 ] ,
                    [ :inCategory :category4 ; :value 95 ] .

There are four categories, and student1 has the highest score in category1, student3 has the highest score in categories 2 and 4, and student2 has the highest score in category 3. We can write a query like this:

prefix : <urn:ex:>

select ?category ?student ?highScore where {

  #-- Find the high score in each category
  { select ?category (max(?score) as ?highScore) {
      ?student :hasScore [ :inCategory ?category ; :value ?score ] .
    }
    group by ?category
  }

  #-- Then find the student that had that high
  #-- score in the category.
  ?student :hasScore [ :inCategory ?category ; :value ?highScore ] .
}

--------------------------------------
| category   | student   | highScore |
======================================
| :category1 | :student1 | 90        |
| :category2 | :student3 | 80        |
| :category3 | :student2 | 90        |
| :category4 | :student3 | 95        |
--------------------------------------
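
The comments above describe this as an argmax. A different way to express it, sketched here against the same sample data, is to keep a score only when no other score in the same category is higher, using FILTER NOT EXISTS. The variable names ?other and ?otherScore are introduced just for this illustration. Like the subquery version, it returns one row per student when several students tie for the top score in a category.

```sparql
prefix : <urn:ex:>

select ?category ?student ?score where {

  #-- Every score of every student in every category
  ?student :hasScore [ :inCategory ?category ; :value ?score ] .

  #-- Keep the row only if nobody has a strictly
  #-- higher score in the same category.
  filter not exists {
    ?other :hasScore [ :inCategory ?category ; :value ?otherScore ] .
    filter (?otherScore > ?score)
  }
}
```

Which form performs better likely depends on the data and the query engine, so it may be worth trying both.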

If you don't care about which student got the highest score, then you just want that inner subquery:

prefix : <urn:ex:>

select ?category (max(?score) as ?highScore) {
  ?student :hasScore [ :inCategory ?category ; :value ?score ] .
}
group by ?category

--------------------------
| category   | highScore |
==========================
| :category1 | 90        |
| :category2 | 80        |
| :category3 | 90        |
| :category4 | 95        |
--------------------------
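
If the question is instead read as asking for the single highest score across all categories, a sketch that sorts the scores and keeps only the first row may be enough. It assumes the same sample data; if several scores tie at the top, which of them is returned is not specified.

```sparql
prefix : <urn:ex:>

select ?category ?student ?score {
  ?student :hasScore [ :inCategory ?category ; :value ?score ] .
}
#-- Highest score first, then keep only that row
order by desc(?score)
limit 1
```

On the sample data this should return :category4, :student3, and 95.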

If you're using different properties

In a comment, you asked,

I have my ontology like this: Student1 :Englishscore 90; PhyscicsScore 67; ChemScore 78. Similarly for other students. Should I introduce a blank node like hasScore which reference to Englishscore, PhyscicsScore [sic], and ChemScore?

First, I'd recommend that you standardize your naming convention. Be sure to use correct spelling (e.g., Physics). Then, either abbreviate or don't: you're abbreviating Chemistry to Chem, but not English to Eng. Finally, be consistent in your capitalization (e.g., EnglishScore, not Englishscore).

It's not necessary to use the kind of representation that I used. You didn't provide sample data (please do in the future), so I used what I considered a fairly easy one to use. Your representation seems a bit less flexible, but you can still get the information you want. Here's some new sample data:

@prefix : <urn:ex:> .

:student1 :hasCat1Score 90 ;
          :hasCat2Score 75 ;
          :hasCat3Score 85 .

:student2 :hasCat2Score 75 ;
          :hasCat3Score 90 ;
          :hasCat4Score 90 .

:student3 :hasCat1Score 85 ;
          :hasCat2Score 80 ;
          :hasCat4Score 95 .

Then the query just needs to use a variable for the property, and that variable simultaneously relates the student to the score, and also indicates the category. So you'd still just group by that property and ask for the highest score:

prefix : <urn:ex:>

select ?hasScore (max(?score) as ?highScore) {
  ?student ?hasScore ?score
}
group by ?hasScore

-----------------------------
| hasScore      | highScore |
=============================
| :hasCat1Score | 90        |
| :hasCat2Score | 80        |
| :hasCat3Score | 90        |
| :hasCat4Score | 95        |
-----------------------------
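
One caveat: the pattern ?student ?hasScore ?score matches every triple in the data, so in a real ontology that also contains types, labels, or other properties, those would be grouped as well. A possible refinement, sketched here, restricts ?hasScore to the score properties with a values block and then joins back to find the student with each high score. The property list is taken from the sample data above and is an assumption; you would list your own score properties instead.

```sparql
prefix : <urn:ex:>

select ?hasScore ?student ?highScore where {

  #-- Find the high score for each score property.
  #-- The values block limits ?hasScore to the score
  #-- properties of the sample data (an assumption).
  { select ?hasScore (max(?score) as ?highScore) {
      values ?hasScore { :hasCat1Score :hasCat2Score
                         :hasCat3Score :hasCat4Score }
      ?student ?hasScore ?score
    }
    group by ?hasScore
  }

  #-- Then find the student that had that high
  #-- score for the property.
  ?student ?hasScore ?highScore
}
```

On the sample data this should give the same students and scores as the first results table, with the property in place of the category.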

I'm always impressed how much effort you put into helping people here! I'm mostly too lazy especially when a group of people has to do homework and isn't able to provide a minimal sample of the data. By the way, they also asked the same question on the Apache Jena mailing list where they got an answer. — UninformedUser, Jan 04 '17 at 21:35
By the way, it looks like they use a different modeling of the data, thus, there are data properties for each subject - which indeed makes things more complicated and does not scale :D — UninformedUser, Jan 05 '17 at 10:18
@Joshua Taylor, Thanks a lot for your detailed answer. So it means we should use blank nodes in our query. I have my ontology like this: Student1 :Englishscore 90; PhyscicsScore 67; ChemScore 78. Similarly for other students. Should I introduce a blank node like hasScore which reference to Englishscore, PhyscicsScore, and ChemScore? — s.shah, Jan 05 '17 at 12:09
@s.shah No, you don't need to use the same kind of representation that I did; you can continue to use the one you're using. See the update to my answer. — Joshua Taylor, Jan 05 '17 at 13:45
@JoshuaTaylor, once again thanks a lot for your time and cooperation. I really hope it works this time. Regards — s.shah, Jan 05 '17 at 13:51
