SQL select only rows with max value on a column

Question

I have this table for documents (simplified version here):

id	rev	content
1	1	...
2	1	...
1	2	...
1	3	...

How do I select one row per id and only the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ..]. I'm using MySQL.

Currently I use checks in the while loop to detect and over-write old revs from the resultset. But is this the only method to achieve the result? Isn't there a SQL solution?

Yes, and that would pose no problem, I have cut out many columns which I'd be adding back. — Majid Fouladpour, Oct 12 '11 at 19:48
@MarkByers I have edited my answer to comply with OP needs. Since I was at it, I decided to write a more comprehensive answer on the [tag:greatest-n-per-group] topic. — Adriano Carneiro, Oct 12 '11 at 20:57
This is common [tag:greatest-n-per-group] problem, which has well tested and [optimized solutions](http://stackoverflow.com/q/8748986). I prefer the [left join solution by Bill Karwin](http://stackoverflow.com/a/8749095) (the [original post](http://stackoverflow.com/a/123481)). Note that bunch of solutions to this common problem can surprisingly be found in the one of most official sources, **MySQL manual**! See [Examples of Common Queries :: The Rows Holding the Group-wise Maximum of a Certain Column](http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-row.html). — Tomas, Apr 28 '14 at 11:50
duplicate of [Retrieving the last record in each group](http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group) — Tomas, Jul 08 '14 at 18:39
for me `SELECT DISTINCT ON .... ORDER BY "UserId", "Deals".position;` worked better — Muhammad Umer, Mar 07 '18 at 01:35
Take a look at this: https://en.wikipedia.org/wiki/Relational_database "Each row in a table has its own unique key." Any particular reason you are disregarding this part of the spec? — HoldOffHunger, Mar 04 '20 at 15:26
@HoldOffHunger I am not. In the [linked schema](http://sqlfiddle.com/#!9/3d74dc/1) you can see I have a primary key. — Majid Fouladpour, Mar 05 '20 at 10:59
It doesn't seem like you can use anything standard like `AUTO_INCREMENT` with two fields -- https://dba.stackexchange.com/q/35449 . That makes this very much so non-standard; if I can't increment its position, how useful is a unique position? — HoldOffHunger, Mar 05 '20 at 14:40
@HoldOffHunger in the context of this question, can you think of a use case where having this composite key of id and rev becomes a liability and an auto-increment key could save you the trouble? I can't and that to me is all that maters. You see, I don't find 'standards' as compelling as your concerns imply you do. I am of the opinion that specs and standards are there to serve us achieve the results we seek. They are not an obligation. — Majid Fouladpour, Mar 06 '20 at 01:58
"Is it good practice to always have an autoincrement integer primary key?" ~ Answer: "Yes." https://softwareengineering.stackexchange.com/q/328458 Some people disagree, but their reasons certainly are not yours. — HoldOffHunger, Mar 06 '20 at 14:49
Having experiencing SQLite's (in my opinion) *correct* way of doing this *first*, I am constantly bewildered and ticked off that no other DB automatically uses the data from the same row when using an aggregate function. — Michael, Jan 10 '23 at 18:54

score 2519 · Accepted Answer · edited Nov 08 '21 at 12:15

2519

At first glance...

All you need is a GROUP BY clause with the MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id

It's never that simple, is it?

I just noticed you need the content column as well.

This is a very common question in SQL: find the whole data for the row with some max value in a column per some group identifier. I heard that a lot during my career. Actually, it was one the questions I answered in my current job's technical interview.

It is, actually, so common that Stack Overflow community has created a single tag just to deal with questions like that: greatest-n-per-group.

Basically, you have two approaches to solve that problem:

Joining with simple `group-identifier, max-value-in-group` Sub-query

In this approach, you first find the group-identifier, max-value-in-group (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.contents
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself. Equality goes in the group-identifier. Then, 2 smart moves:

The second join condition is having left side value less than right value
When you do step 1, the row(s) that actually have the max value will have NULL in the right side (it's a LEFT JOIN, remember?). Then, we filter the joined result, showing only the rows where the right side is NULL.

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL;

Conclusion

Both approaches bring the exact same result.

If you have two rows with max-value-in-group for group-identifier, both rows will be in the result in both approaches.

Both approaches are SQL ANSI compatible, thus, will work with your favorite RDBMS, regardless of its "flavor".

Both approaches are also performance friendly, however your mileage may vary (RDBMS, DB Structure, Indexes, etc.). So when you pick one approach over the other, benchmark. And make sure you pick the one which make most of sense to you.

edited Nov 08 '21 at 12:15

Nimantha

6,405
6
28
69

answered Oct 12 '11 at 19:43

Adriano Carneiro

57,693
12
90
123

8

This is a really bad idea because the field that you want to maximize might be a double, and comparing doubles for equality is non-deterministic. I think only the O(n^2) algorithm works here. – mk3009hppw Mar 21 '21 at 16:31
@Adriano how would this work if there would be an extra column `user_id` , and you want to limit your results to that `user_id` ? I suppose that filter should happen quite early on to avoid it's joining stuff together from irrelevant `user_id`s that will be ditched later? – binoculars Aug 26 '21 at 16:14
I'm not sure the two approaches will "bring the exact same result": I think the second approach will keep records for which the `rev` field is NULL (they will have no match in the join), but the first approach will not keep them (their rev is not the max so they are not selected). – a3nm Sep 16 '21 at 12:01
7

Another way is by using window functions. They seem to offer better performance. I would do something like: `SELECT DISTINCT id, MAX(rev) OVER (PARTITION BY id), FIRST_VALUE(content) OVER (PARTITION BY id ORDER BY rev DESC) FROM YourTable` – Marcos Dec 07 '21 at 18:37
9

@mk3009hppw: Comparing doubles for equality is _entirely_ deterministic, although the idea that it's somehow not is a common misconception. What people usually mean by it (if they're not just parroting something they heard from elsewhere) is that inexact floating point calculations (which could be as simple as 0.1 + 0.2) may not return exactly the "expected" result (0.3) due to rounding, or that [comparing numeric types with different precision](https://randomascii.wordpress.com/2012/06/26/doubles-are-not-floats-so-dont-compare-them/) may behave unexpectedly. But neither of those happens here. – Ilmari Karonen Dec 08 '21 at 00:15
Both methods are very slow since they are O(n^2), and on top of that they use a join which makes the job of the query optimizer more difficult (since the cost of computing the optimal join order is proportional to the factorial of the number of joins in the query). See below for the answer using window functions for one that is guaranteed to be soft linear time. – saolof Apr 11 '22 at 08:27
What would it do if there were multiple rows with the same value for `rev`? At least for the first approach I'm pretty sure I would get multiple entries for each group (all rows with maximum `rev`). – jederik Sep 02 '22 at 08:59

Kevin Burton · Answer 2 · 2013-12-16T13:08:50.307

390

My preference is to use as little code as possible...

You can do it using IN try this:

SELECT * 
FROM t1 WHERE (id,rev) IN 
( SELECT id, MAX(rev)
  FROM t1
  GROUP BY id
)

to my mind it is less complicated... easier to read and maintain.

edited Dec 16 '13 at 13:08

answered Oct 12 '11 at 19:47

Kevin Burton

11,676
2
24
37

40

Curious - which database engine can we use this type of WHERE clause in? This is not supported in SQL Server. – Kash Nov 17 '11 at 17:04
34

oracle & mysql (not sure about other databases sorry) – Kevin Burton Nov 17 '11 at 18:03
37

Works on PostgreSQL too. – lcguida Jan 15 '14 at 17:43
21

Confirmed working in DB2 – coderatchet Jan 29 '14 at 02:32
28

Does not work with SQLite. – Marcel Pfeiffer Oct 26 '14 at 20:32
1

Doesn't work in H2 – manuna Mar 19 '21 at 14:29
Works on Clickhouse – aryndin Jun 08 '21 at 16:29
Works in spark SQL, also imho clearer to read – Joey Jul 19 '21 at 15:16
4

I just tried this with SQLite 3.27.2 and it seems to work perfectly – Britton Kerin Oct 08 '21 at 20:13
[Progress](https://www.complexsql.com/progress-database/) uses it as well. – ryyker Apr 21 '22 at 17:20
for mssql, you can concat your columns into a string (results may vary): `Where CAST(id as VARCHAR(100)) + CAST(rev as VARCHAR(100)) in...` – highboi May 17 '22 at 22:08
This syntax ( (id,rev) IN )would not work on db2 – cobp Jun 03 '22 at 16:50
1

Perfect in Maria. Nice minimalistic and readable code ! – Michael P Sep 08 '22 at 10:14
doesnt work, it bring wrong results – Albert Abdonor Oct 25 '22 at 10:02
Not supported in SQL Server 2017. – qwr Jun 09 '23 at 16:36

score 230 · Answer 3 · edited Jun 30 '22 at 23:02

230

I am flabbergasted that no answer offered SQL window function solution:

SELECT a.id, a.rev, a.contents
  FROM (SELECT id, rev, contents,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) ranked_order
          FROM YourTable) a
 WHERE a.ranked_order = 1

Added in SQL standard ANSI/ISO Standard SQL:2003 and later extended with ANSI/ISO Standard SQL:2008, window (or windowing) functions are available with all major vendors now. There are more types of rank functions available to deal with a tie issue: RANK, DENSE_RANK, PERSENT_RANK.

edited Jun 30 '22 at 23:02

deweydb

2,238
2
30
37

answered Aug 09 '16 at 15:29

topchef

19,091
9
63
102

6

intuition is tricky thing. I find it more intuitive than other answers as it builds explicit data structure that answers the question. But, again, intuition is the other side of bias... – topchef Jan 10 '17 at 18:22
13

This might work in MariaDB 10.2 and MySQL 8.0.2, but not before. – Rick James Apr 01 '17 at 22:01
3

The approach of window functions should be preferred due to simplicity. – Mark Andersen Mar 30 '21 at 20:01
1

Yes, window function seems to be a better approach. At least it has better performance. I would use MAX and FIRST_VALUE functions, though: SELECT DISTINCT id, MAX(rev) OVER (PARTITION BY id), FIRST_VALUE(content) OVER (PARTITION BY id ORDER BY rev DESC) FROM YourTable – Marcos Dec 07 '21 at 18:35
4

This is the more efficient way of doing this compared to correlated queries (performance killer) or other aggregate functions. This should now be marked as accepted answer. – Paramvir Singh Karwal Feb 01 '22 at 15:00
2

I think you can't use `rank` as the field name on line 3 of your code. at least not in mysql 8.0.29. `rank` is a reserved word, you must use something else like `ranked_order` or whatever you want. – deweydb Jun 30 '22 at 22:35
It's easy to say one solution has better performance, but it always needs to be tested. In my case I got 64% vs 36% in favor of this solution in a dev enviroment, but 12% vs 88% in favor for the accepted answer with group by and row_number in production with more data (that query did however look more complex than using this solution). – Erik Bunnstad May 31 '23 at 09:47
To save you time when you've gotten this low - MariaDB (at least 10.3) says "Error in query (4015): Window function is allowed only in SELECT list and ORDER BY clause" – jave.web Aug 22 '23 at 18:34

score 110 · Answer 4 · answered Jan 23 '14 at 14:16

110

Yet another solution is to use a correlated subquery:

select yt.id, yt.rev, yt.contents
    from YourTable yt
    where rev = 
        (select max(rev) from YourTable st where yt.id=st.id)

Having an index on (id,rev) renders the subquery almost as a simple lookup...

Following are comparisons to the solutions in @AdrianCarneiro's answer (subquery, leftjoin), based on MySQL measurements with InnoDB table of ~1million records, group size being: 1-3.

While for full table scans subquery/leftjoin/correlated timings relate to each other as 6/8/9, when it comes to direct lookups or batch (id in (1,2,3)), subquery is much slower then the others (Due to rerunning the subquery). However I couldnt differentiate between leftjoin and correlated solutions in speed.

One final note, as leftjoin creates n*(n+1)/2 joins in groups, its performance can be heavily affected by the size of groups...

answered Jan 23 '14 at 14:16

Vajk Hermecz

5,413
2
34
25

1

This method is hard to understand. I can't run the subquery independently, since it refers to the outer query. It appears the subquery returns a single value at a time, but, per Wikipedia, "the subquery may be evaluated once for each row processed by the outer query." Within the subquery, yt.id must produce a single value for the row? Ultimately, with each row, the subquery seems to get the max of rev for that id. The idea that a subquery produces different results at different times throughout the query execution seems to take us well under the hood. – Mark E. Sep 19 '21 at 21:31
@dolmen, notice that the inner select is evaluated for every row in the table. You are assuming that it is only evaluated once, and that single value is used. – Vajk Hermecz Oct 20 '21 at 21:01
@VajkHermecz You're right. I deleted my comment. – dolmen Jun 02 '22 at 08:13
Don't like correlated subquery – siggi_pop Dec 08 '22 at 18:20
This is useful for older versions of MySql (max() trick selected the first row, not the row where max value is) – Marco Sulla Dec 28 '22 at 09:43

score 48 · Answer 5 · answered Jun 30 '13 at 06:02

I can't vouch for the performance, but here's a trick inspired by the limitations of Microsoft Excel. It has some good features

GOOD STUFF

It should force return of only one "max record" even if there is a tie (sometimes useful)
It doesn't require a join

APPROACH

It is a little bit ugly and requires that you know something about the range of valid values of the rev column. Let us assume that we know the rev column is a number between 0.00 and 999 including decimals but that there will only ever be two digits to the right of the decimal point (e.g. 34.17 would be a valid value).

The gist of the thing is that you create a single synthetic column by string concatenating/packing the primary comparison field along with the data you want. In this way, you can force SQL's MAX() aggregate function to return all of the data (because it has been packed into a single column). Then you have to unpack the data.

Here's how it looks with the above example, written in SQL

SELECT id, 
       CAST(SUBSTRING(max(packed_col) FROM 2 FOR 6) AS float) as max_rev,
       SUBSTRING(max(packed_col) FROM 11) AS content_for_max_rev 
FROM  (SELECT id, 
       CAST(1000 + rev + .001 as CHAR) || '---' || CAST(content AS char) AS packed_col
       FROM yourtable
      ) 
GROUP BY id

The packing begins by forcing the rev column to be a number of known character length regardless of the value of rev so that for example

3.2 becomes 1003.201
57 becomes 1057.001
923.88 becomes 1923.881

If you do it right, string comparison of two numbers should yield the same "max" as numeric comparison of the two numbers and it's easy to convert back to the original number using the substring function (which is available in one form or another pretty much everywhere).

HoldOffHunger · Answer 6 · 2021-02-02T01:31:37.237

Unique Identifiers? Yes! Unique identifiers!

One of the best ways to develop a MySQL DB is to have each id AUTOINCREMENT (Source MySQL.com). This allows a variety of advantages, too many to cover here. The problem with the question is that its example has duplicate ids. This disregards these tremendous advantages of unique identifiers, and at the same time, is confusing to those familiar with this already.

Cleanest Solution

DB Fiddle

Newer versions of MySQL come with ONLY_FULL_GROUP_BY enabled by default, and many of the solutions here will fail in testing with this condition.

Even so, we can simply select DISTINCT someuniquefield, MAX( whateverotherfieldtoselect ), ( *somethirdfield ), etc., and have no worries understanding the result or how the query works :

SELECT DISTINCT t1.id, MAX(t1.rev), MAX(t2.content)
FROM Table1 AS t1
JOIN Table1 AS t2 ON t2.id = t1.id AND t2.rev = (
    SELECT MAX(rev) FROM Table1 t3 WHERE t3.id = t1.id
)
GROUP BY t1.id;

SELECT DISTINCT Table1.id, max(Table1.rev), max(Table2.content) : Return DISTINCT somefield, MAX() some otherfield, the last MAX() is redundant, because I know it's just one row, but it's required by the query.
FROM Employee : Table searched on.
JOIN Table1 AS Table2 ON Table2.rev = Table1.rev : Join the second table on the first, because, we need to get the max(table1.rev)'s comment.
GROUP BY Table1.id: Force the top-sorted, Salary row of each employee to be the returned result.

Note that since "content" was "..." in OP's question, there's no way to test that this works. So, I changed that to "..a", "..b", so, we can actually now see that the results are correct:

id  max(Table1.rev) max(Table2.content)
1   3   ..d
2   1   ..b

Why is it clean? DISTINCT(), MAX(), etc., all make wonderful use of MySQL indices. This will be faster. Or, it will be much faster, if you have indexing, and you compare it to a query that looks at all rows.

Original Solution

With ONLY_FULL_GROUP_BY disabled, we can use still use GROUP BY, but then we are only using it on the Salary, and not the id:

SELECT *
FROM
    (SELECT *
    FROM Employee
    ORDER BY Salary DESC)
AS employeesub
GROUP BY employeesub.Salary;

SELECT * : Return all fields.
FROM Employee : Table searched on.
(SELECT *...) subquery : Return all people, sorted by Salary.
GROUP BY employeesub.Salary: Force the top-sorted, Salary row of each employee to be the returned result.

Unique-Row Solution

Note the Definition of a Relational Database: "Each row in a table has its own unique key." This would mean that, in the question's example, id would have to be unique, and in that case, we can just do :

SELECT *
FROM Employee
WHERE Employee.id = 12345
ORDER BY Employee.Salary DESC
LIMIT 1

Hopefully this is a solution that solves the problem and helps everyone better understand what's happening in the DB.

score 25 · Answer 7 · edited Aug 02 '20 at 16:11

25

Something like this?

SELECT yourtable.id, rev, content
FROM yourtable
INNER JOIN (
    SELECT id, max(rev) as maxrev
    FROM yourtable
    GROUP BY id
) AS child ON (yourtable.id = child.id) AND (yourtable.rev = maxrev)

edited Aug 02 '20 at 16:11

akki

2,021
1
24
35

answered Oct 12 '11 at 19:48

Marc B

356,200
43
426
500

The join-less ones wouldn't cut it? – Majid Fouladpour Oct 12 '11 at 19:51
1

If they work, then they're fine too. – Marc B Oct 12 '11 at 19:54
This seems to be the fastest one (with proper indexes). – Salman A Feb 13 '19 at 12:27
That lack of child on the other ON got me! – Gwen Au Sep 09 '20 at 07:27

schlebe · Answer 8 · 2019-02-24T09:55:37.037

24

Another manner to do the job is using MAX() analytic function in OVER PARTITION clause

SELECT t.*
  FROM
    (
    SELECT id
          ,rev
          ,contents
          ,MAX(rev) OVER (PARTITION BY id) as max_rev
      FROM YourTable
    ) t
  WHERE t.rev = t.max_rev

The other ROW_NUMBER() OVER PARTITION solution already documented in this post is

SELECT t.*
  FROM
    (
    SELECT id
          ,rev
          ,contents
          ,ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rank
      FROM YourTable
    ) t
  WHERE t.rank = 1

This 2 SELECT work well on Oracle 10g.

MAX() solution runs certainly FASTER that ROW_NUMBER() solution because MAX() complexity is O(n) while ROW_NUMBER() complexity is at minimum O(n.log(n)) where n represent the number of records in table !

edited Feb 24 '19 at 09:55

answered Feb 20 '18 at 09:07

schlebe

3,387
5
37
50

1

First query is perfect and majority of SO posts lack that discussion. Its performance efficient and useful when we are getting more columns. majority of the other soln are about getting max value of a column and not multiple rows with multiple columns when individual group have 10s of rows in each. Thnx. – sanpat Oct 29 '20 at 19:10
1

Best performance compared to all the other solutions. For my use case this is almost 9 times faster with thousands of partitions and tens of millions of records. – Ali Sarchami Dec 26 '20 at 17:14
Works, and works fast, in MySQL 8 and SQLite too. Also yes, agree MAX() is the best option. – wizzard0 Jun 08 '21 at 16:47

Bulat · Answer 9 · 2019-04-26T06:18:06.900

17

I like to use a NOT EXIST-based solution for this problem:

SELECT 
  id, 
  rev
  -- you can select other columns here
FROM YourTable t
WHERE NOT EXISTS (
   SELECT * FROM YourTable t WHERE t.id = id AND rev > t.rev
)

This will select all records with max value within the group and allows you to select other columns.

edited Apr 26 '19 at 06:18

answered Sep 05 '14 at 21:58

Bulat

6,869
1
29
52

1

yes, not exists like this has generally been the preferred way rather than a left join. In older versions of SQL server it was faster, although i think now it makes no difference. I normally do SELECT 1 instead of SELECT *, again because in prior versions it was faster. – EGP Oct 08 '14 at 12:38
In MySQL at least, the columns in the `SELECT` are ignored for `EXISTS` subqueries. So it doesn't matter what you write there – rjh Aug 13 '21 at 09:46
2

Seems to be the simplest most self-contained method. Comparable to `in` but without grouping and functions. Appears to be fast for my use case as well. Simplicity is important especially in cases where ORMs are used and this is just another condition that can be chained with others, it does not complicate the structure of the query like in cases where joins are used. – akostadinov Mar 24 '22 at 21:47

score 9 · Answer 10 · edited Feb 22 '19 at 18:30

9

SELECT *
FROM Employee
where Employee.Salary in (select max(salary) from Employee group by Employe_id)
ORDER BY Employee.Salary

edited Feb 22 '19 at 18:30

Cody Gray - on strike

239,200
50
490
574

answered Jul 30 '17 at 18:12

guru008

119
1
2

Jannes · Answer 11 · 2022-03-10T17:19:14.327

Note: I probably wouldn't recommend this anymore in MySQL 8+ days. Haven't used it in years.

A third solution I hardly ever see mentioned is MySQL specific and looks like this:

SELECT id, MAX(rev) AS rev
 , 0+SUBSTRING_INDEX(GROUP_CONCAT(numeric_content ORDER BY rev DESC), ',', 1) AS numeric_content
FROM t1
GROUP BY id

Yes it looks awful (converting to string and back etc.) but in my experience it's usually faster than the other solutions. Maybe that's just for my use cases, but I have used it on tables with millions of records and many unique ids. Maybe it's because MySQL is pretty bad at optimizing the other solutions (at least in the 5.0 days when I came up with this solution).

One important thing is that GROUP_CONCAT has a maximum length for the string it can build up. You probably want to raise this limit by setting the group_concat_max_len variable. And keep in mind that this will be a limit on scaling if you have a large number of rows.

Anyway, the above doesn't directly work if your content field is already text. In that case you probably want to use a different separator, like \0 maybe. You'll also run into the group_concat_max_len limit quicker.

score 6 · Answer 12 · answered Dec 29 '18 at 11:00

6

I think, You want this?

select * from docs where (id, rev) IN (select id, max(rev) as rev from docs group by id order by id)

SQL Fiddle : Check here

answered Dec 29 '18 at 11:00

Abhishek Rana

107
2
14

1

duplicate of https://stackoverflow.com/a/7745679/6064933, with no info gain. – jdhao Sep 30 '22 at 10:33

KyleMit · Answer 13 · 2018-08-17T14:55:51.023

5

NOT mySQL, but for other people finding this question and using SQL, another way to resolve the greatest-n-per-group problem is using Cross Apply in MS SQL

WITH DocIds AS (SELECT DISTINCT id FROM docs)

SELECT d2.id, d2.rev, d2.content
FROM DocIds d1
CROSS APPLY (
  SELECT Top 1 * FROM docs d
  WHERE d.id = d1.id
  ORDER BY rev DESC
) d2

Here's an example in SqlFiddle

edited Aug 17 '18 at 14:55

answered May 30 '14 at 13:47

KyleMit

30,350
66
462
664

very slow comparing to other methods - group by, windows, not exists – nahab Feb 15 '18 at 13:40

score 4 · Answer 14 · edited Apr 18 '19 at 20:42

4

Since this is most popular question with regard to this problem, I'll re-post another answer to it here as well:

It looks like there is simpler way to do this (but only in MySQL):

select *
from (select * from mytable order by id, rev desc ) x
group by id

Please credit answer of user Bohemian in this question for providing such a concise and elegant answer to this problem.

Edit: though this solution works for many people it may not be stable in the long run, since MySQL doesn't guarantee that GROUP BY statement will return meaningful values for columns not in GROUP BY list. So use this solution at your own risk!

edited Apr 18 '19 at 20:42

reformed

4,505
11
62
88

answered Jul 03 '14 at 14:33

Yuriy Nakonechnyy

3,742
4
29
41

9

Except that it's wrong, as there is no guarantee that the order of the inner query means anything, nor is the GROUP BY always guaranteed to take the first encountered row. At least in MySQL and I would assume all others. In fact I was under the assumption that MySQL would simply ignore the whole ORDER BY. Any future version or a change in configuration might break this query. – Jannes Oct 10 '14 at 10:14
@Jannes this is interesting remark :) I welcome you to answer my question providing proofs: http://stackoverflow.com/questions/26301877/mysql-select-from-subquery-order – Yuriy Nakonechnyy Oct 10 '14 at 14:41
2

@Jannes concerning GROUP BY not guaranteed to take the first encountered row - you are totally right - found this issue http://bugs.mysql.com/bug.php?id=71942 which asks to provide such guarantees. Will update my answer now – Yuriy Nakonechnyy Oct 10 '14 at 14:59
1

I think I remember where I got the ORDER BY being discarded from: MySQL does that with UNIONs if you ORDER BY the inner queries, it's just ignore: http://dev.mysql.com/doc/refman/5.0/en/union.html says "If ORDER BY appears without LIMIT in a SELECT, it is optimized away because it will have no effect anyway." I haven't seen such a statement for the query in question here, but I don't see why it couldn't do that. – Jannes Oct 11 '14 at 19:09

score 4 · Answer 15 · answered Mar 04 '15 at 18:12

I would use this:

select t.*
from test as t
join
   (select max(rev) as rev
    from test
    group by id) as o
on o.rev = t.rev

Subquery SELECT is not too eficient maybe, but in JOIN clause seems to be usable. I'm not an expert in optimizing queries, but I've tried at MySQL, PostgreSQL, FireBird and it does work very good.

You can use this schema in multiple joins and with WHERE clause. It is my working example (solving identical to yours problem with table "firmy"):

select *
from platnosci as p
join firmy as f
on p.id_rel_firmy = f.id_rel
join (select max(id_obj) as id_obj
      from firmy
      group by id_rel) as o
on o.id_obj = f.id_obj and p.od > '2014-03-01'

It is asked on tables having teens thusands of records, and it takes less then 0,01 second on really not too strong machine.

I wouldn't use IN clause (as it is mentioned somewhere above). IN is given to use with short lists of constans, and not as to be the query filter built on subquery. It is because subquery in IN is performed for every scanned record which can made query taking very loooong time.

I think using that subquery as a CTE might at least improve performance — Don Cheadle, Jan 10 '17 at 18:52
Hi! For me it looks like your 1st query needs ... `and o.id = t.id` in the end (and the subquery should return `id` for that). Doesn't it? — Dmitry Grekov, Aug 10 '18 at 11:37

score 3 · Answer 16 · edited Dec 29 '18 at 15:51

3

How about this:

SELECT all_fields.*  
FROM (SELECT id, MAX(rev) FROM yourtable GROUP BY id) AS max_recs  
LEFT OUTER JOIN yourtable AS all_fields 
ON max_recs.id = all_fields.id

edited Dec 29 '18 at 15:51

Munim Munna

17,178
6
29
58

answered Jul 14 '13 at 16:09

inor

2,781
2
32
42

score 3 · Answer 17 · answered Sep 04 '15 at 05:33

3

If you have many fields in select statement and you want latest value for all of those fields through optimized code:

select * from
(select * from table_name
order by id,rev desc) temp
group by id

answered Sep 04 '15 at 05:33

seahawk

1,872
12
18

This works OK for small tables, but takes 6 passes over the entire dataset, so not fast for large tables. – Rick James May 17 '17 at 00:48
This is the query I needed because there were other columns involved, too. – Mike Viens Jun 01 '18 at 19:07

plavozont · Answer 18 · 2014-03-17T08:28:23.923

2

This solution makes only one selection from YourTable, therefore it's faster. It works only for MySQL and SQLite(for SQLite remove DESC) according to test on sqlfiddle.com. Maybe it can be tweaked to work on other languages which I am not familiar with.

SELECT *
FROM ( SELECT *
       FROM ( SELECT 1 as id, 1 as rev, 'content1' as content
              UNION
              SELECT 2, 1, 'content2'
              UNION
              SELECT 1, 2, 'content3'
              UNION
              SELECT 1, 3, 'content4'
            ) as YourTable
       ORDER BY id, rev DESC
   ) as YourTable
GROUP BY id

edited Mar 17 '14 at 08:28

answered Jan 29 '14 at 07:49

plavozont

807
9
17

This doesn't appear to work for the general case. And, it doesn't work at all in PostgreSQL, returning: `ERROR: column "your table.reb" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT *` – ma11hew28 Mar 13 '14 at 16:26
Sorry I didn't clarify the first time at which language it worked. – plavozont Mar 17 '14 at 05:11

score 2 · Answer 19 · edited Jan 07 '15 at 12:11

2

Here is a nice way of doing that

Use following code :

with temp as  ( 
select count(field1) as summ , field1
from table_name
group by field1 )
select * from temp where summ = (select max(summ) from temp)

edited Jan 07 '15 at 12:11

Ashish Kakkad

23,586
12
103
136

answered Jan 07 '15 at 11:36

shay

21
1

score 2 · Answer 20 · answered Jul 16 '15 at 18:52

2

I like to do this by ranking the records by some column. In this case, rank rev values grouped by id. Those with higher rev will have lower rankings. So highest rev will have ranking of 1.

select id, rev, content
from
 (select
    @rowNum := if(@prevValue = id, @rowNum+1, 1) as row_num,
    id, rev, content,
    @prevValue := id
  from
   (select id, rev, content from YOURTABLE order by id asc, rev desc) TEMP,
   (select @rowNum := 1 from DUAL) X,
   (select @prevValue := -1 from DUAL) Y) TEMP
where row_num = 1;

Not sure if introducing variables makes the whole thing slower. But at least I'm not querying YOURTABLE twice.

answered Jul 16 '15 at 18:52

user5124980

21
1

Only tried approach in MySQL. Oracle has a similar function for ranking records. Idea should work too. – user5124980 Jul 16 '15 at 18:54
1

Reading & writing a variable in a select statement is undefined in MySQL although particular versions happen to give the answer you might expect for certain syntax involving case expressions. – philipxy Sep 22 '18 at 10:57

score 2 · Answer 21 · answered Jun 20 '17 at 10:10

2

here is another solution hope it will help someone

Select a.id , a.rev, a.content from Table1 a
inner join 
(SELECT id, max(rev) rev FROM Table1 GROUP BY id) x on x.id =a.id and x.rev =a.rev

answered Jun 20 '17 at 10:10

Abdul Samad

39
1
6

score 2 · Answer 22 · answered Jul 13 '17 at 18:19

2

None of these answers have worked for me.

This is what worked for me.

with score as (select max(score_up) from history)
select history.* from score, history where history.score_up = score.max

answered Jul 13 '17 at 18:19

qaisjp

722
8
31

score 2 · Answer 23 · edited Oct 17 '17 at 00:18

Here's another solution to retrieving the records only with a field that has the maximum value for that field. This works for SQL400 which is the platform I work on. In this example, the records with the maximum value in field FIELD5 will be retrieved by the following SQL statement.

SELECT A.KEYFIELD1, A.KEYFIELD2, A.FIELD3, A.FIELD4, A.FIELD5
  FROM MYFILE A
 WHERE RRN(A) IN
   (SELECT RRN(B) 
      FROM MYFILE B
     WHERE B.KEYFIELD1 = A.KEYFIELD1 AND B.KEYFIELD2 = A.KEYFIELD2
     ORDER BY B.FIELD5 DESC
     FETCH FIRST ROW ONLY)

score 1 · Answer 24 · answered Dec 11 '15 at 03:14

1

Sorted the rev field in reverse order and then grouped by id which gave the first row of each grouping which is the one with the highest rev value.

SELECT * FROM (SELECT * FROM table1 ORDER BY id, rev DESC) X GROUP BY X.id;

Tested in http://sqlfiddle.com/ with the following data

CREATE TABLE table1
    (`id` int, `rev` int, `content` varchar(11));

INSERT INTO table1
    (`id`, `rev`, `content`)
VALUES
    (1, 1, 'One-One'),
    (1, 2, 'One-Two'),
    (2, 1, 'Two-One'),
    (2, 2, 'Two-Two'),
    (3, 2, 'Three-Two'),
    (3, 1, 'Three-One'),
    (3, 3, 'Three-Three')
;

This gave the following result in MySql 5.5 and 5.6

id  rev content
1   2   One-Two
2   2   Two-Two
3   3   Three-Two

answered Dec 11 '15 at 03:14

blokeish

571
5
9

This technique used to work, but no longer. See https://mariadb.com/kb/en/mariadb/group-by-trick-has-been-optimized-away/ – Rick James Apr 01 '17 at 22:02
1

The original question tag is "mysql" and I have stated very clearly that my solution was tested with both Mysql 5.5 and 5.6 in sqlfiddle.com. I have provided all steps to independently verify the solution. I have not made any false claims that my solution works with Mariadb. Mariadb is not Mysql, its just a drop-in replacement for Mysql, owned by 2 different companies. Your comment will help anyone that is trying to implement it in Mariadb but my post in no way deserve a negative vote as it clearly answers the question that was asked. – blokeish Apr 03 '17 at 00:34
1

Yes, it works in older versions. And I have used that technique in the past, only to be burned when it stopped working. Also MySQL (in 5.7?) will also be ignoring the `ORDER BY` in a subquery. Since lots of people will read your answer, I am trying to steer them away from a technique that will break in their future. (And I did not give you the -1 vote.) – Rick James Apr 03 '17 at 02:38
1

Tests prove nothing. ORDER BY in a subquery has no guaranteed effect other than for a LIMIT in the same subquery. Even if order was preserved, the GROUP BY would not preserve it. Even if it were preserved, non-standard GROUP BY relying on disabled ONLY_FULL_GROUP_BY is specified to return *some* row in a group for a non-grouped column but not necessarily the first. So your query is not correct. – philipxy Sep 22 '18 at 11:50

score 1 · Answer 25 · answered Sep 17 '18 at 09:08

You can make the select without a join when you combine the rev and id into one maxRevId value for MAX() and then split it back to original values:

SELECT maxRevId & ((1 << 32) - 1) as id, maxRevId >> 32 AS rev
FROM (SELECT MAX(((rev << 32) | id)) AS maxRevId
      FROM YourTable
      GROUP BY id) x;

This is especially fast when there is a complex join instead of a single table. With the traditional approaches the complex join would be done twice.

The above combination is simple with bit functions when rev and id are INT UNSIGNED (32 bit) and combined value fits to BIGINT UNSIGNED (64 bit). When the id & rev are larger than 32-bit values or made of multiple columns, you need combine the value into e.g. a binary value with suitable padding for MAX().

score 1 · Answer 26 · answered Feb 22 '19 at 15:18

Explanation

This is not pure SQL. This will use the SQLAlchemy ORM.

I came here looking for SQLAlchemy help, so I will duplicate Adrian Carneiro's answer with the python/SQLAlchemy version, specifically the outer join part.

This query answers the question of:

"Can you return me the records in this group of records (based on same id) that have the highest version number".

This allows me to duplicate the record, update it, increment its version number, and have the copy of the old version in such a way that I can show change over time.

Code

MyTableAlias = aliased(MyTable)
newest_records = appdb.session.query(MyTable).select_from(join(
    MyTable, 
    MyTableAlias, 
    onclause=and_(
        MyTable.id == MyTableAlias.id,
        MyTable.version_int < MyTableAlias.version_int
    ),
    isouter=True
    )
).filter(
    MyTableAlias.id  == None,
).all()

Tested on a PostgreSQL database.

score 0 · Answer 27 · answered Jan 05 '18 at 10:51

I used the below to solve a problem of my own. I first created a temp table and inserted the max rev value per unique id.

CREATE TABLE #temp1
(
    id varchar(20)
    , rev int
)
INSERT INTO #temp1
SELECT a.id, MAX(a.rev) as rev
FROM 
    (
        SELECT id, content, SUM(rev) as rev
        FROM YourTable
        GROUP BY id, content
    ) as a 
GROUP BY a.id
ORDER BY a.id

I then joined these max values (#temp1) to all of the possible id/content combinations. By doing this, I naturally filter out the non-maximum id/content combinations, and am left with the only max rev values for each.

SELECT a.id, a.rev, content
FROM #temp1 as a
LEFT JOIN
    (
        SELECT id, content, SUM(rev) as rev
        FROM YourTable
        GROUP BY id, content
    ) as b on a.id = b.id and a.rev = b.rev
GROUP BY a.id, a.rev, b.content
ORDER BY a.id

SQL select only rows with max value on a column

27 Answers27

At first glance...

It's never that simple, is it?

Joining with simple `group-identifier, max-value-in-group` Sub-query

Left Joining with self, tweaking join conditions and filters

Conclusion

Unique Identifiers? Yes! Unique identifiers!

Cleanest Solution

Original Solution

Unique-Row Solution

Here's an example in SqlFiddle

Explanation

Code

Linked

Related

SQL select only rows with max value on a column

27 Answers27

At first glance...

It's never that simple, is it?

Joining with simple group-identifier, max-value-in-group Sub-query

Left Joining with self, tweaking join conditions and filters

Conclusion

Unique Identifiers? Yes! Unique identifiers!

Cleanest Solution

Original Solution

Unique-Row Solution

Here's an example in SqlFiddle

Explanation

Code

Linked

Related

Joining with simple `group-identifier, max-value-in-group` Sub-query