45

I have the following table:

    id       time      text      otheridentifier
    -------------------------------------------
    1        6         apple     4
    2        7         orange    4
    3        8         banana    3
    4        9         pear      3
    5        10        grape     2

What I want to do is select the 3 most recent records (by time desc), whose otheridentifiers are distinct. So in this case, the result would be id's: 5, 4, and 2.

id = 3 would be skipped because there's a more recent record with the same otheridentifier field.

Here's what I tried to do:

SELECT * FROM `table` GROUP BY (`otheridentifier`) ORDER BY `time` DESC LIMIT 3

However, I end up getting rows of id = 5, 3, and 1 instead of 5, 4, 2 as expected.

Can someone tell me why this query wouldn't return what I expected? I tried changing the ORDER BY to ASC but this simply rearranges the returned rows to 1, 3, 5.

Machavity
  • 30,841
  • 27
  • 92
  • 100
atp
  • 30,132
  • 47
  • 125
  • 187

8 Answers8

34

It doesn't return what you expect because grouping happens before ordering, as reflected by the position of the clauses in the SQL statement. You're unfortunately going to have to get fancier to get the rows you want. Try this:

SELECT *
FROM `table`
WHERE `id` = (
    SELECT `id`
    FROM `table` as `alt`
    WHERE `alt`.`otheridentifier` = `table`.`otheridentifier`
    ORDER BY `time` DESC
    LIMIT 1
)
ORDER BY `time` DESC
LIMIT 3
chaos
  • 122,029
  • 33
  • 303
  • 309
  • 1
    I remembered the time I spent hours to fix sql like this and turns out mysql 4.0 doesn't support nested queries ;p – Unreality May 29 '09 at 05:29
  • 1
    @Unreality: Fortunately most solutions involving subqueries can be expressed as joins if necessary. :) – Rytmis May 29 '09 at 05:32
  • 1
    As much as possible, avoid subqueries because they are slow. Use LEFT JOIN instead :) – marknt15 May 29 '09 at 05:48
  • 4
    @Jasie: http://dev.mysql.com/doc/ :) @marknt115: Yes, avoid them as much as possible, but only that much. – chaos May 29 '09 at 07:15
  • @marknt15, No subqueries are not slow just because they are subqueries. See http://www.percona.com/blog/2010/03/18/when-the-subselect-runs-faster/ – Pacerier Apr 09 '15 at 13:46
18

You could join the table on itself to filter the last entry per otheridentifier, and then take the top 3 rows of that:

SELECT last.*
FROM `table` last
LEFT JOIN `table` prev 
    ON prev.`otheridentifier` = last.`otheridentifier`
    AND prev.`time` < last.`time`
WHERE prev.`id` is null
ORDER BY last.`time` DESC 
LIMIT 3
Andomar
  • 232,371
  • 49
  • 380
  • 404
  • Andomar, can you explain how MySQL is using table `prev` in your query? I attempted to use this on a local query and am getting an error saying 'database.prev` doesn't exist. – a coder Jan 28 '13 at 23:15
  • Fiddle with OP's data and your query. Not sure why this answer was upvoted 6 times - it is not working here: http://sqlfiddle.com/#!2/ace0b/1 – a coder Jan 28 '13 at 23:23
  • 1
    @acoder: You're right, there was a small error in the query: the table name after `left join` was missing. I've updated the answer. – Andomar Jan 29 '13 at 08:40
  • Works fine now - http://sqlfiddle.com/#!2/ace0b/6. Thanks for the reply & update. – a coder Jan 29 '13 at 12:35
  • 1
    It seems you duplicated OP's unwanted results. Switching less than for greater than seems to work, but this along with most of the other answers seems to massively overcomplicate things. ORDER BY MAX(time) is AFAICT the correct solution. – sequentiallee Mar 31 '13 at 15:45
  • Disagree with @Lee. This solution is the most efficient one. – kayue Nov 08 '13 at 06:46
  • Thank you @Lee, this was the simplest solution by far (adding MAX())! – kfriend May 22 '15 at 16:45
4

I had a similar requirement, but I had more advanced selection criteria. Using some of the other answers I couldn't get exactly what I needed, but I found you can still do a GROUP BY after and ORDER BY like this:

SELECT t.* FROM (SELECT * FROM table ORDER BY time DESC) t 
GROUP BY t.otheridentifier
11101101b
  • 7,679
  • 2
  • 42
  • 52
2

You can use this query to get correct answer:

SELECT * FROM 
      (SELECT * FROM `table` order by time DESC)
          t group by otheridentifier
Brad Larson
  • 170,088
  • 45
  • 397
  • 571
php
  • 4,307
  • 1
  • 24
  • 13
2
SELECT * FROM table t1 
WHERE t1.time = 
    (SELECT MAX(time) FROM table t2 
     WHERE t2.otheridentifier = t1.otheridentifier)
Rytmis
  • 31,467
  • 8
  • 60
  • 69
  • How would this select the latest row per otheridentifier? – Andomar May 29 '09 at 05:32
  • @Andomar: I shouldn't try answering questions when I'm not fully awake. Changed the column names a bit -- see if it makes more sense now. :) – Rytmis May 29 '09 at 05:38
2

Andomar's answer is probably best in that it doesn't use a subquery.

An alternative approach:

select *
from   `table` t1
where  t1.`time` in (
                    select   max(s2.`time`)
                    from     `table` t2
                    group by t2.otheridentifier
                    )
Community
  • 1
  • 1
mechanical_meat
  • 163,903
  • 24
  • 228
  • 223
  • I think I see a problem here if the time values aren't unique -- this might return rows that it shouldn't. Suppose there's a time value that is the maximum for one otheridentifier, but is, say, second largest for another otheridentifier. Wouldn't this query then return both otheridentifiers? I may be altogether off though, I'm still a bit tired. :) – Rytmis May 29 '09 at 05:49
  • @Rytmis: Yeah, and so would my query, and yours:) hehe – Andomar May 29 '09 at 05:51
  • @Andomar: Hmm, are you sure about my query though? Because I just tested it by adding a row (6, 7, 'strawberry', 3) -- the time value 7 is the largest in the group that has otheridentifier 4, but the second largest in the one that has otheridentifier 3. My query still only returns the rows the OP wanted. Is my test case wrong? :) – Rytmis May 29 '09 at 06:08
  • @Andomar: Nope, the test case is correct -- this query returns the 'strawberry' row while mine doesn't. – Rytmis May 29 '09 at 06:09
  • @Rytmis: try it with (6,7,Strawberry,4) – Andomar May 29 '09 at 16:49
1

what about

SELECT *, max(time) FROM `table`  group by otheridentifier
mbo
  • 164
  • 8
  • Edit - this does work on the OP's data. I had some extra joins in my query. This works fine with the OP's data. http://sqlfiddle.com/#!2/ace0b/2 – a coder Jan 28 '13 at 23:26
  • 1
    It doesn't seem to work afaict. In your sqlfiddle, it should show orange and pear - their time value is higher. – mahemoff May 18 '13 at 23:00
0

This also:

SELECT * FROM
OrigTable T INNER JOIN 
( 
SELECT otheridentifier,max(time) AS duration
FROM T
GROUP BY otheridentifier) S
ON S.duration = T.time AND S.otheridentifier = T.otheridentifier.
user2450223
  • 235
  • 2
  • 10