How to select single row based on the max value in multiple rows

Question

Possible Duplicate:
SQL: Find the max record per group

I have a table with four columns as such:

name   major    minor  revision
p1     0        4      3
p1     1        0      0
p1     1        1      4
p2     1        1      1
p2     2        5      0
p3     3        4      4

This is basically a table containing a record for each version of a program. I want to do a select to get all of the programs and their latest version, so the results would look like this:

name   major    minor  revision
p1     1        1      4
p2     2        5      0
p3     3        4      4

I can't just group by the name and take the max of each column, because then I would end up with the highest number from each column, not the specific row with the highest version. How can I set this up?
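
To make the problem concrete, here is a minimal sketch using Python's standard sqlite3 module. The table name `versions` is made up for illustration; the rows are the sample data above. It shows that the per-column max can describe a version that was never released:

```python
import sqlite3

# Sample rows from the question; the table name "versions" is hypothetical.
ROWS = [
    ("p1", 0, 4, 3), ("p1", 1, 0, 0), ("p1", 1, 1, 4),
    ("p2", 1, 1, 1), ("p2", 2, 5, 0), ("p3", 3, 4, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE versions (name TEXT, major INT, minor INT, revision INT)")
conn.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", ROWS)

# MAX is evaluated per column, so the three parts can come from different rows.
per_column_max = conn.execute(
    "SELECT name, MAX(major), MAX(minor), MAX(revision) "
    "FROM versions GROUP BY name ORDER BY name"
).fetchall()

for row in per_column_max:
    print(row, "exists" if row in ROWS else "is not a real row")
```

For p1 this gives 1, 4, 4, which mixes the minor number of 0.4.3 with the major and revision of 1.1.4.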

On which column do you want to filter the data? `major`, `minor` or `revision`? – xkeshav, Jan 04 '12 at 07:38
This is the same "greatest-n-per-group" question that has been asked multiple times on SE (http://stackoverflow.com/questions/2657482/sql-find-the-max-record-per-group is just one other example of many) — m-smith, Jan 04 '12 at 09:10
@LordScree: Don't think so, this is just a normal maximum, but over multiple columns. Greatest-n-per-group is when you group on col1, and want to find the value of col2 for which col3 is highest. — Andomar, Jan 04 '12 at 15:37
I don't know why this was closed as exact duplicate. It is not the same as the suggested duplicate at all. That question wants the rows based on the max of one column. I am asking for the records where the max is determined from three columns. Doesn't matter I suppose. I got some good answers here. thanks. — Brian, Jan 09 '12 at 01:42

score 11 · Answer 1 · edited Jan 04 '12 at 08:41

You can use a not exists subquery to filter out older records:

select  *
from    YourTable yt
where   not exists
        (
        select  *
        from    YourTable older
        where   yt.name = older.name and 
                (
                    yt.major < older.major or
                    yt.major = older.major and yt.minor < older.minor or
                    yt.major = older.major and yt.minor = older.minor and
                        yt.revision < older.revision
                )
        )

which can also be written in MySQL as:

select  *
from    YourTable yt
where   not exists
        (
        select  *
        from    YourTable older
        where   yt.name = older.name and 
                  (yt.major,    yt.minor,    yt.revision) 
                < (older.major, older.minor, older.revision)
        )
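
A minimal sketch for checking this against the question's sample data, assuming SQLite through Python's standard sqlite3 module rather than MySQL. It runs the expanded form of the query and compares it with a plain Python loop; Python tuples compare lexicographically, which is the same ordering the row comparison expresses. The alias `other` is just a name chosen for this sketch.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("p1", 0, 4, 3), ("p1", 1, 0, 0), ("p1", 1, 1, 4),
    ("p2", 1, 1, 1), ("p2", 2, 5, 0), ("p3", 3, 4, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (name TEXT, major INT, minor INT, revision INT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?)", ROWS)

# Keep a row only if no other row of the same name has a higher version.
latest_sql = conn.execute("""
    SELECT * FROM YourTable yt
    WHERE NOT EXISTS (
        SELECT * FROM YourTable other
        WHERE yt.name = other.name AND (
            yt.major < other.major OR
            (yt.major = other.major AND yt.minor < other.minor) OR
            (yt.major = other.major AND yt.minor = other.minor
                                    AND yt.revision < other.revision)))
    ORDER BY yt.name
""").fetchall()

# The same selection with lexicographic tuple comparison in Python.
latest_by_name = {}
for name, *version in ROWS:
    version = tuple(version)
    if name not in latest_by_name or latest_by_name[name] < version:
        latest_by_name[name] = version
latest_py = sorted((name, *version) for name, version in latest_by_name.items())

print(latest_sql)
print(latest_py)
```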

edited Jan 04 '12 at 08:41

ypercubeᵀᴹ

answered Jan 04 '12 at 07:38

Andomar

`and` usually has higher priority than `or`. If that is the case with MySQL, everything after the first `and` in the nested `select`'s `where` should probably be enclosed in brackets. – Andriy M Jan 04 '12 at 08:26
+1 nice query = easy to understand – Florin Ghita Jan 04 '12 at 08:27
@AndriyM: You're right, edited the answer – Andomar Jan 04 '12 at 08:29
@Andomar: I hope you don't mind the addition. – ypercubeᵀᴹ Jan 04 '12 at 08:41

score 9 · Accepted Answer · edited May 23 '17 at 12:21

The way I try to solve SQL problems is to take things step by step.

You want the maximum revision for the maximum minor version corresponding to the maximum major version for each product.

The maximum major number for each product is given by:

SELECT Name, MAX(major) AS Major FROM CA GROUP BY Name;

The maximum minor number corresponding to the maximum major number for each product is therefore given by:

SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
  FROM CA
  JOIN (SELECT Name, MAX(Major) AS Major
          FROM CA
         GROUP BY Name
       ) AS CB
    ON CA.Name = CB.Name AND CA.Major = CB.Major
 GROUP BY CA.Name, CA.Major;

And the maximum revision (for the maximum minor version number corresponding to the maximum major number for each product), therefore, is given by:

SELECT CA.Name, CA.Major, CA.Minor, MAX(CA.Revision) AS Revision
  FROM CA
  JOIN (SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
          FROM CA
          JOIN (SELECT Name, MAX(Major) AS Major
                  FROM CA
                 GROUP BY Name
               ) AS CB
            ON CA.Name = CB.Name AND CA.Major = CB.Major
         GROUP BY CA.Name, CA.Major
       ) AS CC
    ON CA.Name = CC.Name AND CA.Major = CC.Major AND CA.Minor = CC.Minor
 GROUP BY CA.Name, CA.Major, CA.Minor;

Tested - it works and produces the same answer as Andomar's query does.
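
The three steps can be replayed on the question's sample data with a small sketch like the one below. It assumes SQLite through Python's standard sqlite3 module (not the DBMS used for the timings further down), and the helper `run` is only a convenience for this sketch. Each step is embedded as the sub-query of the next one, as in the queries above.

```python
import sqlite3

# Sample rows from the question, loaded into a table named CA as in the queries.
ROWS = [
    ("p1", 0, 4, 3), ("p1", 1, 0, 0), ("p1", 1, 1, 4),
    ("p2", 1, 1, 1), ("p2", 2, 5, 0), ("p3", 3, 4, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CA (Name TEXT, Major INT, Minor INT, Revision INT)")
conn.executemany("INSERT INTO CA VALUES (?, ?, ?, ?)", ROWS)

# Step 1: maximum major number per product.
step1 = "SELECT Name, MAX(Major) AS Major FROM CA GROUP BY Name"

# Step 2: maximum minor number within that major number.
step2 = f"""SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
  FROM CA JOIN ({step1}) AS CB
    ON CA.Name = CB.Name AND CA.Major = CB.Major
 GROUP BY CA.Name, CA.Major"""

# Step 3: maximum revision within that major.minor pair.
step3 = f"""SELECT CA.Name, CA.Major, CA.Minor, MAX(CA.Revision) AS Revision
  FROM CA JOIN ({step2}) AS CC
    ON CA.Name = CC.Name AND CA.Major = CC.Major AND CA.Minor = CC.Minor
 GROUP BY CA.Name, CA.Major, CA.Minor"""


def run(sql):
    """Run one step and return its rows ordered by product name."""
    return conn.execute(f"SELECT * FROM ({sql}) ORDER BY Name").fetchall()


step1_rows, step2_rows, step3_rows = run(step1), run(step2), run(step3)
for rows in (step1_rows, step2_rows, step3_rows):
    print(rows)
```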

Performance

I created a bigger volume of data (11616 rows of data), and ran a benchmark timing of Andomar's query against mine - target DBMS was IBM Informix Dynamic Server (IDS) version 11.70.FC2 running on MacOS X 10.7.2. I used the first of Andomar's two queries since IDS does not support the comparison notation in the second one. I loaded the data, updated statistics, and ran the queries both with mine followed by Andomar's and with Andomar's followed by mine. I also recorded the basic costs reported by the IDS optimizer. The result data from both queries were the same (so the queries are both accurate - or equally inaccurate).

Table unindexed:

Andomar's query                           Jonathan's query
Time: 22.074129                           Time: 0.085803
Estimated Cost: 2468070                   Estimated Cost: 22673
Estimated # of Rows Returned: 5808        Estimated # of Rows Returned: 132
Temporary Files Required For: Order By    Temporary Files Required For: Group By

Table with unique index on (name, major, minor, revision):

Andomar's query                           Jonathan's query
Time: 0.768309                            Time: 0.060380
Estimated Cost: 31754                     Estimated Cost: 2329
Estimated # of Rows Returned: 5808        Estimated # of Rows Returned: 139
                                          Temporary Files Required For: Group By

As you can see, the index dramatically improves the performance of Andomar's query, but it still seems to be more expensive on this system than my query. The index gives a time saving of about 30% for my query (0.086 down to 0.060). I'd be curious to see comparable figures for the two versions of Andomar's query on comparable volumes of data, with and without the index. (My test data can be supplied if you need it; there were 132 products - the 3 listed in the question and 129 new ones; each new product had (the same) 90 version entries.)

The reason for the discrepancy is that the sub-query in Andomar's query is a correlated sub-query, which is a relatively expensive process (dramatically so when the index is missing).
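
For anyone who wants to repeat the comparison on another engine, here is a scaled-down sketch of the same kind of harness, assuming SQLite through Python's standard sqlite3 module. The generated data (20 products with 90 versions each), the index name `ca_version` and the alias `other` are assumptions of this sketch, not the test data described above, and the timings it prints will not be comparable with the Informix figures.

```python
import sqlite3
import time
from itertools import product

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CA (Name TEXT, Major INT, Minor INT, Revision INT)")

# Hypothetical data: 20 products, each with the same 90 versions (3 x 5 x 6).
rows = [(f"p{i:02d}", major, minor, revision)
        for i in range(20)
        for major, minor, revision in product(range(3), range(5), range(6))]
conn.executemany("INSERT INTO CA VALUES (?, ?, ?, ?)", rows)

# Correlated NOT EXISTS form.
CORRELATED = """
SELECT * FROM CA yt
 WHERE NOT EXISTS (
       SELECT * FROM CA other
        WHERE yt.Name = other.Name AND (
              yt.Major < other.Major OR
              (yt.Major = other.Major AND yt.Minor < other.Minor) OR
              (yt.Major = other.Major AND yt.Minor = other.Minor
                                      AND yt.Revision < other.Revision)))
 ORDER BY yt.Name"""

# Nested GROUP BY form.
GROUPED = """
SELECT CA.Name, CA.Major, CA.Minor, MAX(CA.Revision) AS Revision
  FROM CA
  JOIN (SELECT CA.Name, CA.Major, MAX(CA.Minor) AS Minor
          FROM CA
          JOIN (SELECT Name, MAX(Major) AS Major FROM CA GROUP BY Name) AS CB
            ON CA.Name = CB.Name AND CA.Major = CB.Major
         GROUP BY CA.Name, CA.Major) AS CC
    ON CA.Name = CC.Name AND CA.Major = CC.Major AND CA.Minor = CC.Minor
 GROUP BY CA.Name, CA.Major, CA.Minor
 ORDER BY CA.Name"""

results = {}
for label in ("unindexed", "indexed"):
    if label == "indexed":
        conn.execute(
            "CREATE UNIQUE INDEX ca_version ON CA (Name, Major, Minor, Revision)")
    for query_name, sql in (("correlated", CORRELATED), ("grouped", GROUPED)):
        start = time.perf_counter()
        results[label, query_name] = conn.execute(sql).fetchall()
        elapsed = time.perf_counter() - start
        print(f"{label:9} {query_name:10} {elapsed:.6f} s")
```

The row counts are kept small on purpose so the unindexed correlated query finishes quickly; increase the ranges to approach the volumes used above.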

@ypercube: not readily, no. IDS doesn't have the support for 'implicit' rows of the sort you're joining - neither equality as in your query nor less than as in Andomar's second query. There are vaguely equivalent (but non-standard) notations; I'd have to work out how to get them into play (and I suspect it would be more verbose than the standard notation that isn't supported by IDS). OTOH, I believe my query should translate to MySQL without problem. – Jonathan Leffler, Jan 05 '12 at 02:59
Yes, I tested yours and works just fine. I guess mine would work anywhere (probably in Informix, too) if the table had an artificial Primary Key and the join was rewritten as: `ON cam.Pk = ( SELECT FIRST 1 Pk FROM ... )` — ypercubeᵀᴹ, Jan 05 '12 at 12:36

Florin Ghita · Answer 3 · 2012-01-06T12:20:24.547

2

Update 3: the variable group_concat_max_len has a minimum value of 4, so we can't use it. But you can:

select 
  name, 
  SUBSTRING_INDEX(group_concat(major order by major desc),',', 1) as major, 
  SUBSTRING_INDEX(group_concat(minor order by major desc, minor desc),',', 1) as minor, 
  SUBSTRING_INDEX(group_concat(revision order by major desc, minor desc, revision desc),',', 1) as revision
from your_table
group by name;

This was tested here, and no, the previous version did not give wrong results; its only problem was the number of concatenated values.

edited Jan 06 '12 at 12:20

answered Jan 04 '12 at 07:36

Florin Ghita

This would return non-existing versions, like `p1 1 4 4` – Andomar Jan 04 '12 at 07:40
@Andomar you are right, I try to revise my query – Florin Ghita Jan 04 '12 at 08:03
@Andomar do you like this new version? – Florin Ghita Jan 04 '12 at 12:04
New version still has the same problem-- [you can test it here](http://sqlize.com/), if you like – Andomar Jan 04 '12 at 15:28
I have tested my code and, with a small update (the substring trick), it is OK – Florin Ghita Jan 06 '12 at 13:06

ypercubeᵀᴹ · Answer 4 · 2012-01-04T08:53:24.007

2

SELECT cam.*
FROM 
      ( SELECT DISTINCT name
        FROM ca 
      ) AS cadistinct
  JOIN 
      ca AS cam
    ON ( cam.name, cam.major, cam.minor, cam.revision )
     = ( SELECT name, major, minor, revision
         FROM ca
         WHERE name = cadistinct.name
         ORDER BY major DESC
                , minor DESC
                , revision DESC
         LIMIT 1
       )
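
On engines without row comparison, the same idea can be sketched with a single-column key instead. The example below assumes SQLite through Python's standard sqlite3 module and an artificial `id` primary key that is not part of the question's table; the alias `c` is also just for this sketch.

```python
import sqlite3

# Sample rows from the question.
ROWS = [
    ("p1", 0, 4, 3), ("p1", 1, 0, 0), ("p1", 1, 1, 4),
    ("p2", 1, 1, 1), ("p2", 2, 5, 0), ("p3", 3, 4, 4),
]

conn = sqlite3.connect(":memory:")
# The id column is an assumption added for this sketch.
conn.execute("CREATE TABLE ca (id INTEGER PRIMARY KEY, name TEXT, "
             "major INT, minor INT, revision INT)")
conn.executemany(
    "INSERT INTO ca (name, major, minor, revision) VALUES (?, ?, ?, ?)", ROWS)

# For each distinct name, pick the id of its highest version and join on it.
latest = conn.execute("""
    SELECT cam.name, cam.major, cam.minor, cam.revision
    FROM (SELECT DISTINCT name FROM ca) AS cadistinct
    JOIN ca AS cam
      ON cam.id = (SELECT c.id
                   FROM ca AS c
                   WHERE c.name = cadistinct.name
                   ORDER BY c.major DESC, c.minor DESC, c.revision DESC
                   LIMIT 1)
    ORDER BY cam.name
""").fetchall()

print(latest)
```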

This will work in MySQL (current versions) but I wouldn't recommend it:

SELECT *
FROM 
    ( SELECT name, major, minor, revision
      FROM ca
      ORDER BY name
             , major DESC
             , minor DESC
             , revision DESC
    ) AS tmp
GROUP BY name

edited Jan 04 '12 at 08:53

answered Jan 04 '12 at 08:28

ypercubeᵀᴹ

+1 nice. It is the same idea as in my second query :) – Florin Ghita Jan 04 '12 at 08:34
+1 not sure if it'll work tho... does MySQL allow `limit 1` in a subquery? – Andomar Jan 04 '12 at 08:35
I don't understand your second query. Would it take the first row's major, minor, revision from the subquery??? MySQL is strange – Florin Ghita Jan 04 '12 at 08:41
@Florin: The 2nd query only works because the MySQL engine will first order the rows in the subquery and then use that order in the external `GROUP BY`, taking the first row it finds. It's not ANSI SQL. – ypercubeᵀᴹ Jan 04 '12 at 08:55
@Andomar: Yes. `LIMIT` is allowed but not in `IN/ALL/ANY/SOME` subqueries – ypercubeᵀᴹ Jan 04 '12 at 09:10

SWeko · Answer 5 · 2012-01-04T07:58:21.347

If there are numbers in those columns, you could come up with some kind of a formula that will be unique and well ordered for the major, minor, revision values. E.g. if the numbers are less than 10, you could just append them as strings, and compare them, like:

select name, major, minor, revision, 
       concat(major, minor, revision) as version
from versions

If they are numbers that will always be less than 100, you could do something like:

select name, major, minor, revision, 
       (major * 10000 + minor * 100 + revision) as version
from versions

You could then just get the max of version grouped by name, like this:

select name, major, minor, revision 
from (
    select name, major, minor, revision, 
           (major * 10000 + minor * 100 + revision) as version
    from versions) v1
where version = (select max(major * 10000 + minor * 100 + revision) 
                 from versions v2 
                 where v1.name = v2.name)
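
The limits on the numbers matter. A small Python sketch of the two encodings (the function name `encode` and its `base` parameter are made up for illustration) shows that distinct versions collide as soon as one part reaches the multiplier:

```python
def encode(major, minor, revision, base=100):
    """Weighted sum from the query above: major*base^2 + minor*base + revision."""
    return major * base * base + minor * base + revision


# While every part stays below the base, the encoded order matches the
# lexicographic order of (major, minor, revision).
in_range = [(0, 4, 3), (1, 0, 0), (1, 1, 4), (2, 5, 0), (1, 99, 99)]
order_preserved = sorted(in_range) == sorted(in_range, key=lambda v: encode(*v))

# A part that reaches the base makes two different versions collide.
collision = encode(1, 100, 0) == encode(2, 0, 0)

# String concatenation has the same limit at 10: 1.10.0 and 11.0.0 look alike.
concat_a = "".join(str(part) for part in (1, 10, 0))
concat_b = "".join(str(part) for part in (11, 0, 0))

print(order_preserved, collision, concat_a, concat_b)
```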

Sorry, this is just a partial query; the grouping/filtering is not shown, will edit – SWeko, Jan 04 '12 at 07:47

score 1 · Answer 6 · answered Jan 04 '12 at 07:59

It allows at most three digits per part of the version number. If you want to allow more digits, then for each extra digit add two zeros to the major multiplier and one zero to the minor multiplier (I hope it's clear).

select  t.* 
from yourTable t
join (
    select name, max(major * 1000000 + minor * 1000  + revision) as ver
    from yourTable 
    group by name
) t1 on t1.name = t.name and t1.ver = (t.major * 1000000 + t.minor * 1000  + t.revision)

Result:

name    major   minor   revision
p1      1       1       4
p2      2       5       0
p3      3       4       4
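
A sketch of the same join, assuming SQLite through Python's standard sqlite3 module, with the multipliers derived from the number of digits. The extra row p2 1.1.4 is not in the question; it is added here only to show why the join should match on name as well as on the encoded version.

```python
import sqlite3

# Sample rows from the question plus one hypothetical row: p2 1.1.4.
ROWS = [
    ("p1", 0, 4, 3), ("p1", 1, 0, 0), ("p1", 1, 1, 4),
    ("p2", 1, 1, 1), ("p2", 2, 5, 0), ("p3", 3, 4, 4),
    ("p2", 1, 1, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (name TEXT, major INT, minor INT, revision INT)")
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", ROWS)

DIGITS = 3                       # maximum digits per version part
MINOR_WEIGHT = 10 ** DIGITS      # one zero per digit
MAJOR_WEIGHT = MINOR_WEIGHT ** 2 # two zeros per digit
weights = {"mw": MAJOR_WEIGHT, "nw": MINOR_WEIGHT}

QUERY = """
    SELECT t.*
    FROM yourTable t
    JOIN (SELECT name, MAX(major * :mw + minor * :nw + revision) AS ver
          FROM yourTable
          GROUP BY name) t1
      ON t1.ver = t.major * :mw + t.minor * :nw + t.revision {extra}
    ORDER BY t.name, t.major
"""

# Matching on the encoded version only: p1's maximum also selects p2 1.1.4.
by_ver_only = conn.execute(QUERY.format(extra=""), weights).fetchall()

# Matching on name as well keeps one row per program.
by_name_and_ver = conn.execute(
    QUERY.format(extra="AND t1.name = t.name"), weights).fetchall()

print(by_ver_only)
print(by_name_and_ver)
```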

score 1 · Answer 7 · answered Jan 05 '12 at 17:41

Am I the only one thinking that the greatest version is the one with the highest revision?

So,

select a.name, a.major, a.minor, a.revision
from YourTable a
where a.revision = (select max(b.revision) from YourTable b where b.name = a.name)
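
Whether this gives the latest version depends on the versioning convention. As a rough check against the question's sample data, here is a sketch assuming SQLite through Python's standard sqlite3 module and a hypothetical table name `versions`:

```python
import sqlite3

# Sample rows and expected output from the question.
ROWS = [
    ("p1", 0, 4, 3), ("p1", 1, 0, 0), ("p1", 1, 1, 4),
    ("p2", 1, 1, 1), ("p2", 2, 5, 0), ("p3", 3, 4, 4),
]
EXPECTED = [("p1", 1, 1, 4), ("p2", 2, 5, 0), ("p3", 3, 4, 4)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE versions (name TEXT, major INT, minor INT, revision INT)")
conn.executemany("INSERT INTO versions VALUES (?, ?, ?, ?)", ROWS)

# Pick, per name, the row(s) holding the highest revision number only.
by_revision = conn.execute("""
    SELECT a.name, a.major, a.minor, a.revision
    FROM versions a
    WHERE a.revision = (SELECT MAX(b.revision)
                        FROM versions b
                        WHERE b.name = a.name)
    ORDER BY a.name
""").fetchall()

# Rows that differ from the output the question asks for.
differing = [row for row in by_revision if row not in EXPECTED]

print(by_revision)
print(differing)
```

On this data the query picks 1.1.1 for p2, where the question expects 2.5.0, so it only fits if the revision number alone identifies the newest release.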

answered Jan 05 '12 at 17:41

aF.

You're probably not the only one, but then again not everyone uses the same conventions in regard to versioning. In my situation (and I'm guessing the same for OP) I need the MAX of all four fields, not just one. – dctucker Feb 18 '14 at 16:58
Plus wouldn't this require a select query for every single record... which is terribly inefficient? – Chadwick Meyer Nov 20 '14 at 22:25
