Get records with max value for each group of grouped SQL results

Question

How do you get the rows that contain the max value for each grouped set?

I've seen some overly-complicated variations on this question, and none with a good answer. I've tried to put together the simplest possible example:

Given a table like that below, with person, group, and age columns, how would you get the oldest person in each group? (A tie within a group should give the first alphabetical result)

Person | Group | Age
--- | --- | ---
Bob | 1 | 32
Jill | 1 | 34
Shawn | 1 | 42
Jake | 2 | 29
Paul | 2 | 36
Laura | 2 | 39

Desired result set:

Person | Group | Age
--- | --- | ---
Shawn | 1 | 42
Laura | 2 | 39

_Caution: The Accepted Answer worked in 2012 when it was written. However, it no longer works for multiple reasons, as given in the Comments._ — Rick James, Jan 11 '20 at 18:29
@RickJames - Found a solution on your page here: http://mysql.rjweb.org/doc.php/groupwise_max#using_variables. 'Using "windowing functions"' for MySQL 8+. Thank you! — kJamesy, Aug 10 '21 at 20:00
@kJamesy - Yes, but this is the pointer directly to "windowing functions" for that use: http://mysql.rjweb.org/doc.php/groupwise_max#using_windowing_functions_ — Rick James, Aug 10 '21 at 20:05

score 425 · Answer 1 · edited Dec 06 '22 at 16:55


The correct solution is:

SELECT o.*
FROM `Persons` o                    # 'o' from 'oldest person in group'
  LEFT JOIN `Persons` b             # 'b' from 'bigger age'
      ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL                 # bigger age not found

How it works:

It matches each row from o with all the rows from b having the same value in column Group and a bigger value in column Age. Any row from o not having the maximum value of its group in column Age will match one or more rows from b.

The LEFT JOIN pairs the oldest person in each group (including anyone who is alone in their group) with a row full of NULLs from b ('no bigger age in the group').
With an INNER JOIN these rows would have no match and would be dropped.

The WHERE clause keeps only the rows having NULLs in the fields extracted from b. They are the oldest persons from each group.
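
The question also asks that a tie resolve to the first name alphabetically, whereas the join above keeps every row tied for the maximum age. The following is a minimal sketch of one way to fold the tie-break into the join condition. It assumes that Person is unique within each group and never NULL; both are assumptions, not something the question states.

```sql
-- Sketch: anti-join with an alphabetical tie-break.
-- A row in o is discarded when some row b in the same group is older,
-- or has the same age and an alphabetically earlier name.
SELECT o.*
FROM `Persons` o
  LEFT JOIN `Persons` b
      ON o.`Group` = b.`Group`
     AND (o.Age < b.Age
          OR (o.Age = b.Age AND b.Person < o.Person))
WHERE b.`Group` IS NULL;   -- no "better" row exists in the group
```

Testing the join key (b.`Group`) for NULL, rather than b.Age, keeps the filter correct even if Age is nullable.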

Version 5.7 update:

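Since MySQL 5.7.5, the default sql_mode includes ONLY_FULL_GROUP_BY, which rejects queries that select non-aggregated columns not named in the GROUP BY clause. Any variant of this solution that relies on such a GROUP BY only runs with that mode disabled (edit the server's option file to remove the setting). The LEFT JOIN query above has no GROUP BY clause, so it runs regardless of this setting.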
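
To see whether the mode is active before changing any server configuration, something like the following can be run in a MySQL session. This is a sketch: the second statement changes the mode for the current session only and leaves the server default untouched.

```sql
-- Show the SQL modes in effect for the current session.
SELECT @@SESSION.sql_mode;

-- Remove ONLY_FULL_GROUP_BY for this session only.
SET SESSION sql_mode = (SELECT REPLACE(@@SESSION.sql_mode, 'ONLY_FULL_GROUP_BY', ''));
```

Queries that do not depend on non-aggregated columns in a GROUP BY do not need this change at all.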

edited Jun 20 '20 at 09:12

Community


answered Aug 24 '12 at 01:55

Bohemian


Pretty cool Boh, you get the green- when you say 'allowed to not aggregate non-group by columns', are you saying that that's how MySQL behaves by default? how does that differ from other RDBMS? – Yarin Aug 24 '12 at 02:03
2

@Yarin Most other RDBMS would not permit you to `GROUP BY Group` in this case since other columns are present in the `SELECT` list. – Michael Berkowski Aug 24 '12 at 02:05
1

Should add in `order by Group, age desc,person` as well to accommodate this `A tie within a group should give the first alphabetical result` – sel Aug 24 '12 at 02:17
@sel You can if you want, but the important thing is you always get exactly one row per group. Maybe you don't care which row you get, in which case leave it as is. – Bohemian Aug 24 '12 at 02:38
@Yarin Ah! Didn't notice that. I've edited the answer to also order by `Person`, which will give you what you want. – Bohemian Aug 24 '12 at 02:43
@Bohemian- I've stepped this question up a notch with a [new one](http://stackoverflow.com/q/12113699/165673) - would love for you to try and tackle it – Yarin Aug 24 '12 at 17:02
92

*"mysql just returns the first row."* - maybe this is how it works but it is not guaranteed. The [documentation](http://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html) says: **"The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."**. The server doesn't select rows but values (not necessarily from the same row) for each column or expression that appears in the `SELECT` clause and is not computed using an aggregate function. – axiac Jan 22 '15 at 13:26
18

This behaviour changed on [MySQL 5.7.5](http://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html) and by default, it rejects this query because the columns in the `SELECT` clause are not functionally dependent on the `GROUP BY` columns. If it is configured to accept it (` ONLY_FULL_GROUP_BY` is disabled), it works like the previous versions (i.e. the values of those columns are indeterminate). – axiac Jan 22 '15 at 13:28
Thanks for this. Basically, whenever we select an attribute from a group sql query, it gets the value from the first record, so we need to order it according to our need before we group by. – Rajesh Paul Mar 07 '15 at 13:31
1

After struggling with this code... which doesn't work at all in MariaDB, I ran across something more useful GROUP_CONCAT. Which allows you to order for the max based on a different column and the concatenate then rest. Very useful if you're trying to get the heaviest Item in an orders table listed first. – Ray Foss Mar 13 '15 at 14:18
31

I am surprised this answer got so many upvotes. It is wrong and it is bad. This query is not guaranteed to work. Data in a subquery is an unordered set in spite of the order by clause. MySQL *may* really order the records now and keep that order, but it wouldn't break any rule if it stopped doing so in some future version. Then the `GROUP BY` condenses to one record, but all fields will be arbitrarily picked from the records. It *may* be that MySQL currently simply always picks the first row, but it could just as well pick any other row or even values from *different* rows in a future version. – Thorsten Kettner Oct 12 '16 at 10:09
1

@thorsten it's not "wrong" because **it works**. And inner queries do retain order. Best practice and [YAGNI](http://c2.com/cgi/wiki?YouArentGonnaNeedIt) say "do the simplest thing that works *now*". *If* (maybe never) a future version of MySQL behaves unfavourably to this query, you'll find out during the full system regression test, which always happens when planning something like a database upgrade. If it's discovered things have changed, rewrite the query accordingly. Until then, get on with something else that needs doing. Show me a test case that fails. – Bohemian Oct 12 '16 at 11:44
13

Okay, we disagree here. I don't use undocumented features that just happen to work currently and rely on some tests that will hopefully cover this. You know that you are just lucky that the current implementation gets you the complete first record where the docs clearly state that you might get any indeterminate values instead, but you still use it. Some simple session or database setting may change this anytime. I'd consider this too risky. – Thorsten Kettner Oct 12 '16 at 12:01
3

@Bohemian Your YAGNI link details what it applies to by saying: "Even if you're totally, totally, totally sure that you'll need a feature later on, don't implement it now." It's talking about features, not about bugs. Saying that a test case is necessary to prove that it could fail is like saying your house is safe because you can't cause it to flood. Not everyone is in a situation where they can easily do a "full system regression" every time some security issue requires something to be upgraded. – cesoid Aug 26 '17 at 17:07
@cesoid YAGNI applies here, because there's no evidence that this behavior will ever change - don't worry about something that will probably never happen. And this *is* a very simple query to test, even as a unit test if your dev env has a mysql instance available (again simple). – Bohemian Aug 26 '17 at 22:31
9

This answer seems wrong. Per the [doc](https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html), *the server is free to choose any value from each group ... Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Result set sorting occurs after values have been chosen, and ORDER BY does not affect which value within each group the server chooses.* – Tgr Feb 05 '18 at 08:27
Nevermind, that refers to an ORDER BY on the same level, not a subquery. In any case this does not yield the correct result in MariaDB (which is a shame as it's several magnitudes faster than a self-join). – Tgr Feb 05 '18 at 08:36
2

In MySQL 8, possibly even before, the optimizer simply gets rid of the subquery entirely. It's also [documented](https://dev.mysql.com/doc/refman/8.0/en/derived-table-optimization.html) that an ORDER BY clause of a derived table is simply ignored, if the enclosing query has a GROUP BY clause. – Ilja Everilä Jul 15 '18 at 13:12
2

Same thing, MySQL 5.7: https://www.db-fiddle.com/f/6NwTqvoaTAUjofL34Rz3Ld/3. The ORDER BY is ignored. – Ilja Everilä Jul 15 '18 at 15:22
How about select a group or groups (when multiple groups have exactly the same values) that has/have all values greater than other groups? – Mohammad Afrashteh Sep 03 '18 at 20:44
@MohammadAfrashteh that’s a different question. Feel free to ask a new question. – Bohemian Sep 03 '18 at 20:56
1

I got wrong data. After grouping, the result does not show the max row even though I order the data desc. – Kenneth Chan Oct 22 '19 at 04:10
It does not work, just gives the first row in the group. I've almost ruined all my data. This answer should be removed. – Michael Jan 06 '21 at 22:40
@MichaelO. Returning the first row is what it’s *supposed* to do. The inner query in this answer returns all rows in order of group then age descending, so the first row of each group is the oldest person. If your use case isn’t similar to this, don’t use this solution and don’t blame the code (which works) for your decision to use it. – Bohemian Jan 06 '21 at 23:15
@Bohemian It returns the first row *without* reverse ordering. – Michael Jan 07 '21 at 01:16
@MichaelO. Are you ordering the *inner* query? ie `select ... from (select ... order by ...)`? – Bohemian Jan 07 '21 at 01:27
@Bohemian I've run it in `db-fiddle` with MySQL version set to 5.6, and it works there, but not in my 5.7 with `ONLY_FULL_GROUP_BY` disabled. – Michael Jan 07 '21 at 01:44
@AndrewKinFatChoi this answer absolutely *is* correct, but you must read the whole answer, especially the last part which states that [`ONLY_FULL_GROUP_BY`](https://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by) must *not* be set - see [this question](https://stackoverflow.com/q/23921117/256196) for how to do that. – Bohemian Jan 21 '22 at 17:01

score 85 · Answer 3 · edited Aug 06 '18 at 02:20


You can join against a subquery that pulls each Group together with its MAX(Age). This method is portable across most RDBMS.

SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT `Group`, MAX(Age) AS max_age
    FROM yourTable
    GROUP BY `Group`
) t2
    ON t1.`Group` = t2.`Group` AND t1.Age = t2.max_age;
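
When two people in a group share the maximum age, the join above returns both. If only the three columns from the question are needed, one possible way to collapse ties to the alphabetically first name is to aggregate once more on top of the join. This is a sketch; the Person column name is taken from the question and yourTable is the placeholder used above.

```sql
-- Sketch: resolve ties on the maximum age by taking MIN(Person).
SELECT t1.`Group`, MIN(t1.Person) AS Person, t1.Age
FROM yourTable t1
INNER JOIN
(
    SELECT `Group`, MAX(Age) AS max_age
    FROM yourTable
    GROUP BY `Group`
) t2
    ON t1.`Group` = t2.`Group` AND t1.Age = t2.max_age
GROUP BY t1.`Group`, t1.Age;
```

Every selected column is either aggregated or listed in the GROUP BY, so this form does not depend on ONLY_FULL_GROUP_BY being disabled. It does not carry along any other columns of the winning row; for that, a join back on a unique key would be needed.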

edited Aug 06 '18 at 02:20

Tim Biegeleisen


answered Aug 24 '12 at 01:39

Michael Berkowski


Michael, thanks for this- but do you have an answer for the issue of returning multiple rows on ties, per Bohemian's comments? – Yarin Aug 24 '12 at 02:08
3

@Yarin If there were 2 rows for example where `Group = 2, Age = 20`, the subquery would return one of them, but the join `ON` clause would match _both_ of them, so you would get 2 rows back with the same group/age though different vals for the other columns, rather than one. – Michael Berkowski Aug 24 '12 at 02:18
So are we saying it's impossible to limit results to one per group unless we go Bohemians MySQL-only route? – Yarin Aug 24 '12 at 02:45
@Yarin no not impossible, just requires more work if there are additional columns - possibly another nested subquery to pull the max associated id for each like pair of group/age, then join against that to get the rest of the row based on id. – Michael Berkowski Aug 24 '12 at 02:49
This should be the accepted answer (the currently accepted answer will fail on most other RDBMS, and in fact would even fail on many versions of MySQL). – Tim Biegeleisen Aug 06 '18 at 02:21
How about select a group or groups (when multiple groups have exactly the same values) that has/have all values greater than other groups? – Mohammad Afrashteh Sep 03 '18 at 20:44
@MohammadAfrashteh That is too complex a set of requirements to describe in comments (and too far removed from the original post). You should post that as a full question on its own, with sample input rows and a sample of the expected query output. – Michael Berkowski Sep 03 '18 at 23:20
Good, fast, but risks returning multiple rows. ("Fast" assumes `INDEX(group, age)` exists.) – Rick James Jan 11 '20 at 18:32
To "pick one of the dups": http://mysql.rjweb.org/doc.php/groupwise_max#using_variables – Rick James Jan 11 '20 at 18:36
only if you append ``GROUP BY t1.group`` at the very end, it works – Jörg Aug 14 '20 at 12:19

Igor Kulagin · Answer 4 · 2022-11-11T16:22:55.387

33

In PostgreSQL you can use DISTINCT ON clause:

SELECT DISTINCT ON ("group") * FROM "mytable" ORDER BY "group", "age" DESC;
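
DISTINCT ON keeps the first row of each group in ORDER BY order, so the question's tie rule can be expressed by adding one more sort key. A sketch, assuming the table also has a "person" column as in the question:

```sql
-- PostgreSQL sketch: oldest person per group, ties broken alphabetically.
SELECT DISTINCT ON ("group") *
FROM "mytable"
ORDER BY "group", "age" DESC, "person";
```

The leading ORDER BY expression has to match the DISTINCT ON expression; the remaining keys decide which row within each group comes first.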

edited Nov 11 '22 at 16:22

answered Dec 15 '14 at 04:00

Igor Kulagin


@Bohemian sorry, I get it now, this is MySQL-only as it includes non-aggregated columns – Cec Jan 12 '15 at 10:10
2

@IgorKulagin - Doesn't work in Postgres- Error message: *column "mytable.id" must appear in the GROUP BY clause or be used in an aggregate function* – Yarin Jan 27 '15 at 00:21
25

The MySQL query may only work by accident on many occasions. The "SELECT *" may return information that does not correspond to the belonging MAX(age). This answer is wrong. This is probably also the case for SQLite. – Albert Hendriks May 11 '16 at 06:51
**DISTINCT** rocks! Very helpful – Alexander Jun 23 '16 at 16:27
I'm looking for MySQL. Your solution seems simple and faster than axiac's solution. – Ram Babu Dec 04 '16 at 22:26
2

But this fits the case where we need to select the grouped column and the max column. This does not fit the above requirement, where it would result in ('Bob', 1, 42) but the expected result is ('Shawn', 1, 42) – Ram Babu Dec 04 '16 at 23:08
In official documentation there is a note, that this behaviour is valid only for built in MIN and MAX aggregate functions. I've tested this solution. It works for sqlite3, but not for sqlite2. Yes, I know that sqlite2 is a very old version, but it still exists! – porfirion Jan 11 '17 at 15:13
The question is about MYSQL so this should be the accepted answer imo – Adam Apr 30 '18 at 17:36
1

Good for postgres – Karol Gasienica Jan 03 '19 at 10:39
2

This is a wrong answer as mysql "randomly" chooses values from columns that are not GROUP or AGE. This is fine only when you need only these columns. – erdomester Nov 04 '20 at 18:05
How does `ORDER BY "group",` help? What would change if just ordered by `"age" DESC;` – epox Aug 11 '21 at 09:31
@porfirion I may be in the minority, but I really *love* how using a single aggregate function in a group brings along the rest of the same row for the ride. It makes so much sense to me I always wonder why no other DB seems to want to take this approach. – Michael Mar 16 '23 at 18:58

score 8 · Answer 5 · answered Dec 10 '15 at 21:56

I'm not sure whether MySQL has a ROW_NUMBER function. If it does, you can use it to get the desired result. On SQL Server you can do something like:

CREATE TABLE p
(
 person NVARCHAR(10),
 gp INT,
 age INT
);
GO
INSERT  INTO p
VALUES  ('Bob', 1, 32);
INSERT  INTO p
VALUES  ('Jill', 1, 34);
INSERT  INTO p
VALUES  ('Shawn', 1, 42);
INSERT  INTO p
VALUES  ('Jake', 2, 29);
INSERT  INTO p
VALUES  ('Paul', 2, 36);
INSERT  INTO p
VALUES  ('Laura', 2, 39);
GO

SELECT  t.person, t.gp, t.age
FROM    (
         SELECT *,
                ROW_NUMBER() OVER (PARTITION BY gp ORDER BY age DESC, person) row
         FROM   p
        ) t
WHERE   t.row = 1;

2

It does, since 8.0. – Ilja Everilä Jul 15 '18 at 16:38
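
For MySQL 8.0 and later, where ROW_NUMBER() is available, the same idea might look like the sketch below. The table name Persons and its columns (Person, Group, Age) are taken from the question and the other answers; adjust them to the real schema.

```sql
-- MySQL 8.0+ sketch: rank rows inside each group, keep rank 1.
SELECT Person, `Group`, Age
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY `Group`
               ORDER BY Age DESC, Person   -- oldest first, ties alphabetical
           ) AS rn
    FROM Persons AS p
) AS ranked
WHERE rn = 1;
```

The alias rn is used instead of row because ROW is a reserved word in MySQL 8.0. With the sample data from the question this yields Shawn for group 1 and Laura for group 2.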

score 4 · Answer 6 · edited Feb 20 '21 at 18:15


Using ranking method.

SELECT @rn :=  CASE WHEN @prev_grp <> groupa THEN 1 ELSE @rn+1 END AS rn,  
   @prev_grp :=groupa,
   person,age,groupa  
FROM   users,(SELECT @rn := 0) r        
HAVING rn=1
ORDER  BY groupa,age DESC,person

This SQL can be explained as follows:

The rows are read as: select * from users, (select @rn := 0) r order by groupa, age desc, person
@prev_grp starts out as NULL.
@rn := CASE WHEN @prev_grp <> groupa THEN 1 ELSE @rn+1 END

The CASE is a conditional (ternary-style) expression: rn = 1 if prev_grp != groupa, else rn = rn + 1.
HAVING rn=1 then keeps only the first row of each group.

edited Feb 20 '21 at 18:15

David


answered Aug 24 '12 at 01:46

sel


sel - need some explanation - I've never even seen `:=` before - what is that? – Yarin Aug 24 '12 at 01:55
1

:= is assignment operator. You could read more on http://dev.mysql.com/doc/refman/5.0/en/user-variables.html – sel Aug 24 '12 at 02:11
I'll have to dig into this- I think the answer overcomplicates our scenario, but thanks for teaching me something new.. – Yarin Aug 24 '12 at 02:12

score 4 · Answer 7 · edited Sep 04 '21 at 02:33


Improving on axiac's solution to avoid selecting multiple rows per group while also allowing for use of indexes

SELECT o.*
FROM `Persons` o 
  LEFT JOIN `Persons` b 
      ON o.Group = b.Group AND o.Age < b.Age
  LEFT JOIN `Persons` c 
      ON o.Group = c.Group AND o.Age = c.Age and o.id < c.id
WHERE b.Age is NULL and c.id is null

edited Sep 04 '21 at 02:33

Giacomo1968


answered Jan 08 '21 at 14:43

John Muraguri


score 3 · Answer 8 · answered Dec 30 '14 at 23:26


I would not use Group as a column name, since it is a reserved word. However, the following SQL would work.

SELECT a.Person, a.Group, a.Age FROM [TABLE_NAME] a
INNER JOIN 
(
  SELECT `Group`, MAX(Age) AS oldest FROM [TABLE_NAME] 
  GROUP BY `Group`
) b ON a.Group = b.Group AND a.Age = b.oldest

answered Dec 30 '14 at 23:26

Bae Cheol Shin


Thanks, though this returns multiple records for an age when there is a tie – Yarin Jan 27 '15 at 00:19
@Yarin how would decide which is the correct oldest person? Multiple answers seem to be the rightest answer otherwise use limit and order – Duncan Mar 26 '19 at 16:37

score 3 · Answer 9 · answered Sep 28 '16 at 09:48


My solution works only if you need to retrieve a single column; for my needs, however, it was the best-performing solution I found (it uses just one query!):

SELECT SUBSTRING_INDEX(GROUP_CONCAT(column_x ORDER BY column_y),',',1) AS xyz,
   column_z
FROM table_name
GROUP BY column_z;

It uses GROUP_CONCAT to build an ordered, comma-separated list and then takes the first element with SUBSTRING_INDEX.
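
Applied to the table from the question, the technique might look like the sketch below. The Persons table and its columns are assumed from the question, and the approach assumes that no name contains a comma and that the concatenated list fits within group_concat_max_len.

```sql
-- Sketch: oldest person per group using GROUP_CONCAT + SUBSTRING_INDEX.
SELECT `Group`,
       SUBSTRING_INDEX(
           GROUP_CONCAT(Person ORDER BY Age DESC, Person SEPARATOR ','),
           ',', 1
       ) AS Person,          -- first name in the ordered list
       MAX(Age) AS Age
FROM Persons
GROUP BY `Group`;
```

Sorting by Age DESC and then Person inside GROUP_CONCAT puts the oldest person first and resolves ties alphabetically, so exactly one row per group comes back.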

answered Sep 28 '16 at 09:48

Antonio Giovanazzi


Can confirm that you can get multiple columns by sorting on the same key inside the group_concat, but need to write a separate group_concat/index/substring for each column. – Rasika Jul 20 '19 at 05:53
Bonus here is that you can add multiple columns to the sort inside the group_concat and it would resolve the ties easily and guarantee only one record per group. Well done on the simple and efficient solution! – Rasika Jul 20 '19 at 05:56

score 2 · Answer 10 · answered Sep 14 '16 at 13:30

axiac's solution is what worked best for me in the end. I had an additional complexity however: a calculated "max value", derived from two columns.

Let's use the same example: I would like the oldest person in each group. If there are people that are equally old, take the tallest person.

I had to perform the left join two times to get this behavior:

SELECT o1.* FROM
    (SELECT o.*
    FROM `Persons` o
    LEFT JOIN `Persons` b
    ON o.Group = b.Group AND o.Age < b.Age
    WHERE b.Age is NULL) o1
LEFT JOIN
    (SELECT o.*
    FROM `Persons` o
    LEFT JOIN `Persons` b
    ON o.Group = b.Group AND o.Age < b.Age
    WHERE b.Age is NULL) o2
ON o1.Group = o2.Group AND o1.Height < o2.Height 
WHERE o2.Height is NULL;

Hope this helps! I guess there should be better way to do this though...

score 2 · Answer 11 · edited Dec 23 '19 at 07:05


In Oracle below query can give the desired result.

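-- grp stands in for the group column: GROUP is a reserved word in Oracle
SELECT grp, person, age
FROM (SELECT grp, person, age,
        ROW_NUMBER() OVER (PARTITION BY grp ORDER BY age DESC, person ASC) AS rankForEachGroup
      FROM tablename)
WHERE rankForEachGroup = 1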

edited Dec 23 '19 at 07:05

slfan


answered Dec 23 '19 at 06:40

kiruba


score 1 · Answer 12 · answered Apr 19 '13 at 16:22

Using CTEs - Common Table Expressions:

WITH MyCTE(MaxPKID, SomeColumn1)
AS(
SELECT MAX(a.MyTablePKID) AS MaxPKID, a.SomeColumn1
FROM MyTable1 a
GROUP BY a.SomeColumn1
  )
SELECT b.MyTablePKID, b.SomeColumn1, b.SomeColumn2, MAX(b.NumEstado)
FROM MyTable1 b
INNER JOIN MyCTE c ON c.MaxPKID = b.MyTablePKID
GROUP BY b.MyTablePKID, b.SomeColumn1, b.SomeColumn2

--Note: MyTablePKID is the primary key of MyTable1

score 1 · Answer 13 · edited Aug 27 '14 at 07:38


with CTE as 
(select Person, 
[Group], Age, RN = Row_Number() 
over(partition by [Group] 
order by Age desc) 
from yourtable)
select Person, Age from CTE where RN = 1

edited Aug 27 '14 at 07:38

Rajesh


answered Aug 27 '14 at 07:10

Harshad


DataScientYst · Answer 14 · 2018-02-28T07:11:26.160

This is how I'm getting the N max rows per group in mysql

SELECT co.id, co.person, co.country
FROM person co
WHERE (
SELECT COUNT(*)
FROM person ci
WHERE  co.country = ci.country AND co.id < ci.id
) < 1
;

how it works:

self join to the table
groups are done by co.country = ci.country
N elements per group are controlled by ) < 1 so for 3 elements - ) < 3
to get max or min depends on: co.id < ci.id
- co.id < ci.id - max
- co.id > ci.id - min
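
As an illustration of the same counting idea on the table from the question, the sketch below returns the two oldest people in each group. The Persons table and its columns are assumed from the question; the 2 is the N discussed above.

```sql
-- Sketch: top-N per group by Age, here N = 2.
-- A row is kept when fewer than N rows in its group have a larger Age.
SELECT o.*
FROM Persons o
WHERE (
    SELECT COUNT(*)
    FROM Persons b
    WHERE b.`Group` = o.`Group` AND b.Age > o.Age
) < 2;
```

With the sample data this keeps Shawn and Jill in group 1 and Laura and Paul in group 2. Because the comparison is strict, rows tied on Age can push the result above N rows for a group.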

Full example here:

mysql select n max values per group

Ritwik · Answer 15 · 2014-10-27T13:51:02.737

0

You can also try

SELECT * FROM mytable WHERE age IN (SELECT MAX(age) FROM mytable GROUP BY `Group`) ;

edited Oct 27 '14 at 13:51

answered Oct 25 '14 at 19:00

Ritwik


2

Thanks, though this returns multiple records for an age when there is a tie – Yarin Jan 27 '15 at 00:23
1

Also, this query would be incorrect in the case that there is a 39-year-old in group 1. In that case, that person would also be selected, even though the max age in group 1 is higher. – Joshua Richardson May 03 '16 at 23:44

score 0 · Answer 16 · answered Mar 13 '15 at 14:30

This method has the benefit of allowing you to rank by a different column, and not trashing the other data. It's quite useful in a situation where you are trying to list orders with a column for items, listing the heaviest first.

Source: http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat

SELECT person, `group`,
    GROUP_CONCAT(
        DISTINCT age
        ORDER BY age DESC SEPARATOR ', follow up: '
    )
FROM sql_table
GROUP BY `group`;

score 0 · Answer 17 · answered Jul 10 '16 at 11:31


let the table name be people

select O.*              -- > O for oldest table
from people O , people T
where O.grp = T.grp and 
O.Age = 
(select max(T.age) from people T where O.grp = T.grp
  group by T.grp)
group by O.grp;

answered Jul 10 '16 at 11:31

user3475425


score 0 · Answer 18 · edited Dec 10 '16 at 07:39


If the ID (and all other columns) is needed from mytable:

SELECT
    *
FROM
    mytable
WHERE
    id NOT IN (
        SELECT
            A.id
        FROM
            mytable AS A
        JOIN mytable AS B ON A.`GROUP` = B.`GROUP`
        AND A.age < B.age
    )

edited Dec 10 '16 at 07:39

Faisal


answered Oct 03 '16 at 08:55

mayank kumar


score 0 · Answer 19 · answered Jan 21 '22 at 16:11


SELECT o.*
FROM `Persons` o                   
  LEFT JOIN `Persons` b            
      ON o.Group = b.Group AND o.Age < b.Age
WHERE b.Age is NULL  
group by o.Group

answered Jan 21 '22 at 16:11

Andrew Kin Fat Choi


3

Please explain what your answer is doing. – General Grievance Jan 22 '22 at 07:04
