Joining tables based on the maximum value

Question

Here's a simplified example of what I'm talking about:

Table: students      exam_results
_____________       ____________________________________
| id | name |       | id | student_id | score |   date |
|----+------|       |----+------------+-------+--------|
|  1 | Jim  |       |  1 |          1 |    73 | 8/1/09 | 
|  2 | Joe  |       |  2 |          1 |    67 | 9/2/09 |
|  3 | Jay  |       |  3 |          1 |    93 | 1/3/09 |
|____|______|       |  4 |          2 |    27 | 4/9/09 |
                    |  5 |          2 |    17 | 8/9/09 |
                    |  6 |          3 |   100 | 1/6/09 |
                    |____|____________|_______|________|

Assume, for the sake of this question, that every student has at least one exam result recorded.

How would you select each student along with their highest score? Edit: ...AND the other fields in that record?

Expected output:

_________________________
| name | score |   date |
|------+-------|--------|
|  Jim |    93 | 1/3/09 |
|  Joe |    27 | 4/9/09 |
|  Jay |   100 | 1/6/09 |
|______|_______|________|

Answers using all types of DBMS are welcome.

How would you resolve ties? In your example, which record should be selected in Jim scored 93 twice? — Sparky, Dec 07 '09 at 23:39
in my own tables, (which are nothing to do with students and exams) this doesn't happen. Either one should be fine? — nickf, Dec 07 '09 at 23:45

boydc7 · Accepted Answer · 2009-12-08T01:05:23.720

Answering the EDITED question (i.e. to get associated columns as well).

In Sql Server 2005+, the best approach would be to use a ranking/window function in conjunction with a CTE, like this:

with exam_data as
(
    select  r.student_id, r.score, r.date,
            row_number() over(partition by r.student_id order by r.score desc) as rn
    from    exam_results r
)
select  s.name, d.score, d.date, d.student_id
from    students s
join    exam_data d
on      s.id = d.student_id
where   d.rn = 1;

For an ANSI-SQL compliant solution, a subquery and self-join will work, like this:

select  s.name, r.student_id, r.score, r.date
from    (
            select  r.student_id, max(r.score) as max_score
            from    exam_results r
            group by r.student_id
        ) d
join    exam_results r
on      r.student_id = d.student_id
and     r.score = d.max_score
join    students s
on      s.id = r.student_id;

This last one assumes there aren't duplicate student_id/max_score combinations, if there are and/or you want to plan to de-duplicate them, you'll need to use another subquery to join to with something deterministic to decide which record to pull. For example, assuming you can't have multiple records for a given student with the same date, if you wanted to break a tie based on the most recent max_score, you'd do something like the following:

select  s.name, r3.student_id, r3.score, r3.date, r3.other_column_a, ...
from    (
            select  r2.student_id, r2.score as max_score, max(r2.date) as max_score_max_date
            from    (
                        select  r1.student_id, max(r1.score) as max_score
                        from    exam_results r1
                        group by r1.student_id
                    ) d
            join    exam_results r2
            on      r2.student_id = d.student_id
            and     r2.score = d.max_score
            group by r2.student_id, r2.score
        ) r
join    exam_results r3
on      r3.student_id = r.student_id
and     r3.score = r.max_score
and     r3.date = r.max_score_max_date
join    students s
on      s.id = r3.student_id;

EDIT: Added proper de-duplicating query thanks to Mark's good catch in comments

I don't think the distinct works to de-duplicate ties if the dates are different. — Mark Byers, Dec 08 '09 at 00:00
Good point Mark - would need to use something deterministic in another subquery to de-dupe correctly in the ANSI query. I'll edit to reflect... — boydc7, Dec 08 '09 at 00:07
Invalid column name 'score': I think you got a couple of table names mixed up. — Mark Byers, Dec 08 '09 at 00:21
Yeah, that's what you get when you write a query in a text editor...I just adjusted it, had the filters backwards on the final join conditions — boydc7, Dec 08 '09 at 01:06

score 4 · Answer 2 · edited Sep 25 '15 at 06:17

4

SELECT s.name,
    COALESCE(MAX(er.score), 0) AS high_score
FROM STUDENTS s
    LEFT JOIN EXAM_RESULTS er ON er.student_id = s.id
GROUP BY s.name

edited Sep 25 '15 at 06:17

Aritz

30,971
16
136
217

answered Dec 07 '09 at 23:25

OMG Ponies

325,700
82
523
502

This assumes that there could be students without exams associated. – OMG Ponies Dec 07 '09 at 23:26
ahhh ok, it seems i've asked the question poorly. I'll reword. – nickf Dec 07 '09 at 23:31
have to add a group by clause for this to work, i.e. "group by s.name" – boydc7 Dec 07 '09 at 23:34

score 2 · Answer 3 · edited Dec 07 '09 at 23:41

2

Try this,

Select student.name, max(result.score) As Score from Student
        INNER JOIN
    result
        ON student.ID = result.student_id
GROUP BY
    student.name

edited Dec 07 '09 at 23:41

nickf

537,072
198
649
721

answered Dec 07 '09 at 23:32

Zinx

2,291
3
28
37

thanks Zinx, however I stuffed up the original wording of the question. I actually need to know more than just the high score: I need to know all other fields in the record which holds their high score, too. – nickf Dec 07 '09 at 23:35
Zinx, highlight your code and hit Ctrl+K to format the query syntax, easier reading – boydc7 Dec 07 '09 at 23:36

score 2 · Answer 4 · answered Dec 08 '09 at 05:08

2

With Oracle's analytic functions this is easy:

SELECT DISTINCT
       students.name
      ,FIRST_VALUE(exam_results.score)
       OVER (PARTITION BY students.id
             ORDER BY exam_results.score DESC) AS score
      ,FIRST_VALUE(exam_results.date)
       OVER (PARTITION BY students.id
             ORDER BY exam_results.score DESC) AS date
FROM   students, exam_results
WHERE  students.id = exam_results.student_id;

answered Dec 08 '09 at 05:08

Jeffrey Kemp

59,135
14
106
158

I know PostgreSQL isn't in the question list, but in case anyone stumbles across this, it has [window functions](http://www.postgresql.org/docs/9.3/static/functions-window.html) to do this, too. – jpmc26 Nov 13 '13 at 04:36

score 1 · Answer 5 · edited Jan 18 '12 at 13:47

1

Select Name, T.Score, er. date 
from Students S inner join
          (Select Student_ID,Max(Score) as Score from Exam_Results
           Group by Student_ID) T 
On S.id=T.Student_ID inner join Exam_Result er
On er.Student_ID = T.Student_ID And er.Score=T.Score

edited Jan 18 '12 at 13:47

nawfal

70,104
56
326
368

answered Dec 08 '09 at 05:09

MSH

11
1

Mark Byers · Answer 6 · 2009-12-08T00:15:31.027

Using MS SQL Server:

SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
    SELECT TOP(1) id FROM exam_results T2
    WHERE exam_results.student_id = T2.student_id
    ORDER BY score DESC, date ASC)

If there is a tied score, the oldest date is returned (change date ASC to date DESC to return the most recent instead).

Output:

Jim 93  2009-01-03 00:00:00.000
Joe 27  2009-04-09 00:00:00.000
Jay 100 2009-01-06 00:00:00.000

Test bed:

CREATE TABLE students(id int , name nvarchar(20) );

CREATE TABLE exam_results(id int , student_id int , score int, date datetime);

INSERT INTO students
VALUES
(1,'Jim'),(2,'Joe'),(3,'Jay')

INSERT INTO exam_results VALUES
(1, 1, 73, '8/1/09'), 
(2, 1, 93, '9/2/09'),
(3, 1, 93, '1/3/09'),
(4, 2, 27, '4/9/09'),
(5, 2, 17, '8/9/09'),
(6, 3, 100, '1/6/09')

SELECT name, score, date FROM exam_results
JOIN students ON student_id = students.id
JOIN (SELECT DISTINCT student_id FROM exam_results) T1
ON exam_results.student_id = T1.student_id
WHERE exam_results.id = (
    SELECT TOP(1) id FROM exam_results T2
    WHERE exam_results.student_id = T2.student_id
    ORDER BY score DESC, date ASC)

On MySQL, I think you can change the TOP(1) to a LIMIT 1 at the end of the statement. I have not tested this though.

Joining tables based on the maximum value

6 Answers6

Linked