MySQL - Control which row is returned by a group by

Question

I have a database table like this:

id    version_id    field1    field2
1     1             texta      text1
1     2             textb      text2
2     1             textc      text3
2     2             textd      text4
2     3             texte      text5

If you didn't work it out, it contains a number of versions of a row, and then some text data.

I want to query it and return the version with the highest number for each id. (so the second and last rows only in the above).

I've tried using group by whilst ordering by version_id DESC - but it seems to order after its grouped, so this doesn't work.

Anyone got any ideas? I can't believe it can't be done!

UPDATE:

Come up with this, which works, but uses a subquery:

SELECT *
FROM (SELECT * FROM table ORDER BY version_id DESC) t1
GROUP BY t1.id

Related: http://stackoverflow.com/questions/13700456/mysql-group-by-implementation-details-which-row-mysql-chooses-in-a-group-by-qu — ripper234, Dec 04 '12 at 10:12

ʞɔıu · Accepted Answer · 2009-02-11T15:29:04.983

60

It's called selecting the group-wise maximum of a column. Here are several different approaches for mysql.

Here's how I would do it:

SELECT *
FROM (SELECT id, max(version_id) as version_id FROM table GROUP BY id) t1
INNER JOIN table t2 on t2.id=t1.id and t1.version_id=t2.version_id

This will be relatively efficient, though mysql will create a temporary table in memory for the subquery. I assume you already have an index on (id, version_id) for this table.

It's a deficiency in SQL that you more or less have to use a subquery for this type of problem (semi-joins are another example).

Subqueries are not well optimized in mysql but uncorrelated subqueries aren't so bad as long as they aren't so enormous that they will get written to disk rather than memory. Given that in this query only has two ints the subquery could be millions of rows long before that happened but the select * subquery in your first query could suffer from this problem much sooner.

edited Feb 11 '09 at 15:29

answered Feb 11 '09 at 15:09

ʞɔıu

47,148
35
106
149

3

The linked article is very nice! – onnodb Feb 11 '09 at 15:39
Ta for the link, and example. We ended going with the subquery option. – benlumley Feb 11 '09 at 22:26
2

@ʞɔıu, Is it possible to use `having` to control the rows returned by `group by`? – Pacerier Apr 08 '15 at 10:27
Thanks, a composite index on (id, version_id) is important for performance. – konyak Aug 24 '15 at 21:21
*the group-wise **maximum** of a column*... any chances to select without *maximums* or *minimums*? For example by `id` that is nether `max` or `min`, but just a number or even a hash string – Green Sep 26 '17 at 11:09

score 4 · Answer 2 · edited Nov 19 '12 at 08:28

4

I think this would do it, not sure if it is the best or fastest though.

SELECT * FROM table 
WHERE (id, version_id) IN 
  (SELECT id, MAX(version_id) FROM table GROUP BY id)

edited Nov 19 '12 at 08:28

Ikke

99,403
23
97
120

answered Feb 11 '09 at 15:09

Chris J

2,160
4
16
23

I might be wrong, but it seems to me as subquery will be executed for every row and then tested (though there may be some caching) ... Is that true? – Buksy Aug 26 '16 at 10:31
That is extremely unlikely, what it would most likely do is create a temporary table with the results of the subquery and compare to those. I can't speak for all database systems though. – Chris J Sep 06 '16 at 21:11

score 2 · Answer 3 · answered Feb 11 '09 at 15:29

SELECT id, version_id, field1, field2
FROM (
    SELECT @prev = id AS st, (@prev := id), m.*
    FROM (
           (SELECT @prev := NULL) p,
           (
            SELECT *
            FROM   mytable
            ORDER BY
                   id DESC, version_id DESC
           ) m
     ) m2
WHERE NOT IFNULL(st, FALSE);

No subqueries, one pass on UNIQUE INDEX ON MYTABLE (id, version_id) if you have one (which I think you should)

score 1 · Answer 4 · answered Jul 23 '12 at 08:48

1

This query will do the job without a group by:

SELECT * FROM table AS t
LEFT JOIN table AS t2 
    ON t.id=t2.id 
    AND t.version_id < t2.version_id
WHERE t2.id IS NULL

It does not need any temporary tables.

answered Jul 23 '12 at 08:48

Patrick Savalle

4,068
3
22
24

score 1 · Answer 5 · answered Jul 30 '18 at 13:31

One can always go for analytical functions as well which will give you more control

select tmp.* from ( select id,version_id,field1,field2, rank() over(partition by id order by version_id desc ) as rnk from table) tmp where tmp.rnk=1

If you face issue with rank() function depending on the type of data then one can choose from row_number() or dense_rank() too.

score 0 · Answer 6 · edited Feb 11 '09 at 16:30

0

This is pseudo code but something like this should work just fine

select *
from table
inner join
(
    select id , max(version_id) maxVersion
    from table 
) dvtbl ON id = dvtbl.id && versionid = dvtbl.maxVersion

edited Feb 11 '09 at 16:30

ConroyP

40,958
16
80
86

answered Feb 11 '09 at 15:10

Chris Meek

5,720
9
35
44

score 0 · Answer 7 · answered Feb 11 '09 at 15:10

0

I usually do this with a subquery:

select id, version_id, field1, field2 from datatable as dt where id = (select id from datatable where id = dt.id order by version_id desc limit 1)

answered Feb 11 '09 at 15:10

mcassano

506
4
9

Berek Bryan · Answer 8 · 2009-02-12T12:37:52.730

-1

I think this is what you want.

select id, max(v_id), field1, field2 from table group by id

The results I get from that are

1, 2, textb, text2

2, 3, texte, text5

Edit: I recreated the table and insert the same data with the id an version_id being a compound primary key. This gave the answer I provided earlier. It was also in MySQL.

edited Feb 12 '09 at 12:37

answered Feb 11 '09 at 15:18

Berek Bryan

13,755
10
32
43

that doesn't work, hence the question - the group by will return whichever row happens to be the first in the group, together with the max for version in the group. – benlumley Feb 11 '09 at 22:27
no this does work...I recreated the table he had then ran the query and it worked just fine. – Berek Bryan Feb 12 '09 at 02:06
@Berek: 'fraid not. "[The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. \[..\] In some cases, you can use MIN() and MAX() to obtain a specific **column** value even if it is not unique.](http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html)" – Lightness Races in Orbit Aug 16 '11 at 12:04
Same reason as @benlumley. It will return the id of the first row along with the value for max(v_id) – Yash Apr 11 '16 at 14:49

score -3 · Answer 9 · answered May 08 '09 at 23:13

-3

not tested it but something like this might work:

SELECT * FROM table GROUP BY id ORDER BY MAX(version_id) DESC

answered May 08 '09 at 23:13

MySQL - Control which row is returned by a group by

9 Answers9

Linked