1

I would like some help regarding a MySQL request to select only rows with Max Value on a Column.

I have read and implemented the solution given there : SQL Select only rows with Max Value on a Column

SELECT * FROM yourtable AS yt1
LEFT OUTER JOIN yourtable yt2 ON (yt1.id = yt2.id AND yt1.rev < yt2.rev)
WHERE yt2.id IS NULL;

and it works fine, but only as long as the field you want to filter on is actually the max or min.

What if you wanted to filter on a custom order?

id   rev   content..
.1.  1 
.2   1......---...
.1   2......---...
.1   3......---...

That is, instead of selecting two rows: [1, 3, ...] and [2, 1, ..] in the table above, you want your result set to be [1, 2, ...] and [2, 1, ..] because it has been determined that rev=2 has a higher priority than both rev=3 and rev=1.

Would you need to redefine the order relation on "<", or something?

I would rather avoid having to use a sub-select.

Thanks.

Community
  • 1
  • 1
boz74
  • 11
  • 2
  • 1
    Instead of using a join, use `HAVING`. as in : `SELECT * FROM yourtable AS yt1 HAVING yt1.rev = MAX(yt1.rev)` – Anthony Dec 10 '13 at 10:41
  • Possibly have another table that defines the priority of a value of rev, join against that table twice (once for yt1, once for yt2) and use the priorities for those rows rev values rather than just the values of rev. – Kickstart Dec 10 '13 at 10:43
  • @Kickstart: thanks, that would be a solution. But I'm really looking to use the existing database "as is". – boz74 Dec 10 '13 at 11:10
  • @Anthony: max would still give me [1, 3] instead of [1, 2]. If I'm looking for the max then the join works fine. – boz74 Dec 10 '13 at 11:12
  • @boz74 - how do you know the priority? Is this defined somewhere, or calculated? – Kickstart Dec 10 '13 at 11:19
  • @Kickstart: I know the priority because I overheard it. No, seriously it is not really defined anywhere in the database or calculated. Of course it should probably be, but it isn't. I'm really looking for a better solution than having to do a sub-select, which does work but gives an awfully long query that will be hard to maintain. – boz74 Dec 10 '13 at 11:26
  • Unless somehow the priority is defined or calculated I can't see a way of doing it. If a sub query works, how does that decide on the priority? – Kickstart Dec 10 '13 at 11:35
  • @Kickstart: it does not decide, I tell it the priorities using "CASE rev when 2 then 1 when 3 then 2 when 1 then 3 else 99 end". The sub-query simply returns the rev value i'm looking for. – boz74 Dec 10 '13 at 11:54
  • That could be done with a fixed sub query which should be reasonably quick – Kickstart Dec 10 '13 at 12:18

2 Answers2

0

Sample data:

CREATE TABLE yourtable
    (`id` int, `rev` int, content varchar(20))
;

INSERT INTO yourtable
    (`id`, `rev`, content)
VALUES
    (1, 1, 'a'),
    (2, 1, 'b'),
    (1, 2, 'c'),
    (1, 3, 'd')
;

Solution 1:

SELECT * FROM yourtable AS yt1
LEFT OUTER JOIN yourtable yt2 ON (yt1.id = yt2.id AND FIELD(yt1.rev, 1,3,2) < FIELD(yt2.rev, 1,3,2))
WHERE yt2.id IS NULL;

but I don't like having functions on columns on joins, especially when having a range comparison instead equality comparison.

Solution 2:

SELECT id, rev, content FROM (
  SELECT yt.*, @grpnum := IF(id != @prevgrp, 1, @grpnum + 1) as my_filter, @prevgrp := id 
  FROM yourtable yt, 
  (SELECT @grpnum:=0, @prevgrp := 0) v 
  ORDER BY id, FIELD(rev, 1, 3, 2) DESC
) sq
WHERE my_filter = 1;

Result:

| ID | REV | CONTENT |
|----|-----|---------|
|  1 |   2 |       c |
|  2 |   1 |       b |

Explanation Solution 2:

First I select everything from yourtable and order it by id and then by FIELD(rev, 1, 3, 2). What does this FIELD() function do?

Consider this data:

mysql> select id from value order by id;
+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
|  6 |
+----+

Now, when you do

mysql> select id, field(id, 3,2,5) from value order by id;
+----+------------------+
| id | field(id, 3,2,5) |
+----+------------------+
|  1 |                0 |
|  2 |                2 |
|  3 |                1 |
|  4 |                0 |
|  5 |                3 |
|  6 |                0 |
+----+------------------+

You see, that the function returns the index of the column id in the parameter list of the function. Everything that is not found is returned as 0.

Read more about it here in the manual.

Since we want to have the maximum, we flip the order in our parameter list of FIELD(). Instead of FIELD(rev, 2, 3, 1) we have FIELD(rev, 1, 3, 2) in assumption, that the priority order is 2 as highest, then 3, then 1. (EDIT: I just saw now, that you have another order and you can of course also use your CASE WHEN construct, it does the same, the FIELD() function is just less to write)

Next step is to initialize some variables on the fly in the subquery aliased with v. It's the same as writing SET @grpnum = 0; before the query.

When the id changes (note, that it's important we order by id, then by field()), we increment another variable. This works as filter in the outer query.

This way we return all of the row corresponding to your defined maximum value in the FIELD() function.

fancyPants
  • 50,732
  • 33
  • 89
  • 96
0

Maybe something like this, assigning the priority in the sub queries (that you can build up)

SELECT * 
FROM yourtable AS yt1
LEFT OUTER JOIN (SELECT 1 AS rev, 9 AS rev_priority UNION SELECT 2, 9 UNION SELECT 3, 9) yt1_priority ON yt1.rev = yt1_priority.rev
LEFT OUTER JOIN yourtable yt2 
ON yt1.id = yt2.id 
LEFT OUTER JOIN (SELECT 1 AS rev, 9 AS rev_priority UNION SELECT 2, 9 UNION SELECT 3, 9) yt1_priority ON yt1.rev = yt2_priority.rev
ON yt1_priority.rev_priority < yt2_priority.rev_priority
WHERE yt2.id IS NULL;

(not tested)

Kickstart
  • 21,403
  • 2
  • 21
  • 33