4

I have the following SQL table that keeps track of a user's score at a particular timepoint. A user can have multiple scores per day.

+-------+------------+-------+-----+
| user  | date       | score | ... |
+-------+------------+-------+-----+
| bob   | 2014-04-19 | 100   | ... |
| mary  | 2014-04-19 | 100   | ... |
| alice | 2014-04-20 | 100   | ... |
| bob   | 2014-04-20 | 110   | ... |
| bob   | 2014-04-20 | 125   | ... |
| mary  | 2014-04-20 | 105   | ... |
| bob   | 2014-04-21 | 115   | ... |
+-------+------------+-------+-----+

Given a particular user (let's say bob), How would I generate a report of each user's score, but only use the highest submitted score per day? (Getting the specific row with the highest score is important as well, not just the highest score)

SELECT * FROM `user_score` WHERE `user` = 'bob' GROUP BY `date`

is the base query that I'm building off of. It results in the following result set:

+-------+------------+-------+-----+
| user  | date       | score | ... |
+-------+------------+-------+-----+
| bob   | 2014-04-19 | 100   | ... |
| bob   | 2014-04-20 | 110   | ... |
| bob   | 2014-04-21 | 115   | ... |
+-------+------------+-------+-----+

bob's higher score of 125 from 2014-04-20 is missing. I tried rectifying that with MAX(score)

SELECT *, MAX(score)  FROM `user_score` WHERE `user` = 'bob' GROUP BY `date`

returns the highest score for the day, but not the row that has the highest score. Other column values on that row are important,

+-------+------------+-------+-----+------------+
| user  | date       | score | ... | max(score) |
+-------+------------+-------+-----+------------+
| bob   | 2014-04-19 | 100   | ... | 100        |
| bob   | 2014-04-20 | 110   | ... | 125        |
| bob   | 2014-04-21 | 115   | ... | 110        |
+-------+------------+-------+-----+------------+

Lastly, I tried

SELECT *, MAX(score)  FROM `user_score` WHERE `user` = 'bob' AND score = MAX(score) GROUP BY `date`

But that results in an invalid use of GROUP BY.

EDIT:

SQLFiddle: http://sqlfiddle.com/#!2/ee6a2

johnluetke
  • 3,393
  • 1
  • 20
  • 25
  • Do you have a sql fiddle for this by chance? – James Apr 21 '14 at 19:35
  • No, I simplified my actual dataset for the question. I'll throw a fiddle together and edit the OP with it – johnluetke Apr 21 '14 at 19:36
  • possible duplicate of [Get records with highest/smallest per group](http://stackoverflow.com/questions/8748986/get-records-with-highest-smallest-whatever-per-group) – Marcus Adams Apr 21 '14 at 19:52

3 Answers3

7

If you want all the fields, the easiest (and fastest) way in MySQL is to use not exists:

SELECT *
FROM `user_score` us
WHERE `user` = 'bob' AND
      NOT EXISTS (SELECT 1
                  FROM user_score us2
                  WHERE us2.`user` = us.`user` AND
                        us2.date = us.date AND
                        us2.score > us.score
                 );

This may seem like a strange approach. And, I'll admit that it is. What it is doing is pretty simple: "Get me all rows for Bob from user_score where there is no higher score (for Bob)". That is equivalent to getting the row with the maximum score. With an index on user_score(name, score), this is probably the most efficient way to do what you want.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    +1, nice one. (though you should change the `name` column with `user` and also add `us2.date = us.date`) – Lamak Apr 21 '14 at 19:39
  • This is great, but it only returns the highest score overall, not the ones for each day – johnluetke Apr 21 '14 at 19:44
  • 1
    @johnluetke That's because you need to add `us2.date = us.date` to the condition – Lamak Apr 21 '14 at 19:45
  • 1
    @Gordon Linoff i have seen your answers for such type of questions in which OP wants max row per group etc,your answers to such questions are mostly based on using exists approach so i want to know how this approach is efficient than join,also +1 for the effort – M Khalid Junaid Apr 21 '14 at 19:45
  • @Lamak . . . I was thinking "name is a better identifier than 'user' because I can't remember if `user` is a reserved word". I convinced myself. I also fixed the date issue. – Gordon Linoff Apr 21 '14 at 19:46
  • @Lamak Oh, derp. Thank you. I'll run this against my data and see if it behaves correctly. – johnluetke Apr 21 '14 at 19:47
3

You can use a JOIN:

SELECT a.*
FROM `user_score` as a
INNER JOIN (SELECT `user`, `date`, MAX(score) MaxScore
            FROM `user_score` 
            GROUP BY `user`, `date`) as b
    ON a.`user` = b.`user`
    AND a.`date` = b.`date`
    AND a.score = b.MaxScore
WHERE a.`user` = 'bob' 
Lamak
  • 69,480
  • 12
  • 108
  • 116
0

One option is to use an inline view and a JOIN operation. If there is more than one row with the "high score" value for a given day, this query will return all the rows. (If (user,date,score) is unique, then this isn't a problem.)

For example:

SELECT t.user
     , t.date
     , t.score
     , t.`...`
  FROM user_score t
  JOIN ( SELECT d.user
              , d.date
              , MAX(s.score) AS score
           FROM user_score d 
          WHERE d.user = 'bob'
          GROUP BY d.user, d.date
       ) s
    ON s.user  = t.user
   AND s.date  = t.date
   AND s.score = t.score
spencer7593
  • 106,611
  • 15
  • 112
  • 140