213

I'm trying to list the latest destination (MAX departure time) for each train in a table, for example:

Train    Dest      Time
1        HK        10:00
1        SH        12:00
1        SZ        14:00
2        HK        13:00
2        SH        09:00
2        SZ        07:00

The desired result should be:

Train    Dest      Time
1        SZ        14:00
2        HK        13:00

I have tried using

SELECT Train, Dest, MAX(Time)
FROM TrainTable
GROUP BY Train

by I got a "ora-00979 not a GROUP BY expression" error saying that I must include 'Dest' in my group by statement. But surely that's not what I want...

Is it possible to do it in one line of SQL?

Pacerier
  • 86,231
  • 106
  • 366
  • 634
Aries
  • 2,191
  • 2
  • 15
  • 7
  • 2
    For those who are wondering, the cleanest "plain sql" solution is [the one by Joe](http://stackoverflow.com/a/14841015/632951). Second prize goes to [Claudio](http://stackoverflow.com/a/24370489/632951). – Pacerier Apr 30 '15 at 06:26

6 Answers6

224
SELECT train, dest, time FROM ( 
  SELECT train, dest, time, 
    RANK() OVER (PARTITION BY train ORDER BY time DESC) dest_rank
    FROM traintable
  ) where dest_rank = 1
Thilo
  • 257,207
  • 101
  • 511
  • 656
  • 1
    Thx Thilo :-) Indeed your answer is also correct. But as I can only accept 1 answer, I picked Oliver because I tried his answer first. – Aries Aug 16 '10 at 08:50
  • 16
    @Aries - Thilo's answer is superior to Oliver's, as Thilo's answer will tend to perform less I/O. The analytic function allows the SQL to process the the table in a single pass, whereas Oliver's solution requires multiple passes. – Adam Musch Aug 16 '10 at 14:33
  • 1
    Agreed, the GROUP BY causes an unnecessary performance hit. Using this method or even a Left Join will be far more efficient, especially with larger tables. – Joe Meyer Feb 13 '13 at 01:08
  • 1
    what's the difference of above code with the one below using row_number? can any one explain to me. SELECT train, dest, time FROM ( SELECT train, dest, time, ROW_NUMBER() OVER (PARTITION BY train ORDER BY time DESC) rn FROM traintable ) where rn = 1 – Bat_Programmer May 06 '13 at 06:11
  • @AdamMusch, Poor MySQL devs don't have luxury of `rank()` – Pacerier Apr 30 '15 at 04:17
  • @Pacerier, original question was regarding Oracle. – Tagar Jul 02 '15 at 14:19
  • 3
    @Ruslan, Yea I was pitying MySQL devs. – Pacerier Jul 04 '15 at 12:01
  • Help with similar question https://stackoverflow.com/questions/46062964/fetch-row-which-has-the-maximum-value-for-a-column-minimum-value-of-the-same-c – user104309 Sep 05 '17 at 20:53
  • I tried this with example employee database in PostgresQL and the original query performed better. Don't know why – iamareebjamal Jul 12 '18 at 09:29
  • This answer lead me in the correct direction, found my answer here [{coding} Sight](https://codingsight.com/methods-to-rank-rows-in-sql-server-rownumber-rank-denserank-and-ntile/) – zerodoc Aug 01 '19 at 14:46
  • 1
    This worked for me, but I had to give the table an alias and use it as a prefix to "dest_rank" before I could actually use it. Select From (blah) as t where t.dest_rank = 1 for example. Other than that, great answer! – Urk May 06 '20 at 13:16
  • Isn't that suboptimal to sort (aka ORDER BY) to get only the max value? Sorting is done in `O(n log (n))` while finding max is only `O(n)`. – hans Apr 27 '23 at 08:00
201

You cannot include non-aggregated columns in your result set which are not grouped. If a train has only one destination, then just add the destination column to your group by clause, otherwise you need to rethink your query.

Try:

SELECT t.Train, t.Dest, r.MaxTime
FROM (
      SELECT Train, MAX(Time) as MaxTime
      FROM TrainTable
      GROUP BY Train
) r
INNER JOIN TrainTable t
ON t.Train = r.Train AND t.Time = r.MaxTime
Pacerier
  • 86,231
  • 106
  • 366
  • 634
Oliver Hanappi
  • 12,046
  • 7
  • 51
  • 68
  • 16
    Careful, this will not work if there are "ties" for max(time) because you'll be getting multiple rows. You need to add `group by train,dest` right behind the whole query above. – Pacerier Apr 30 '15 at 06:13
  • Ok. But how does this work if the inner query is joined to another table? Let's pretend that Destination in the TrainTable has it's own table. So the inner query would look like: select t.train, d.dest, max(time) from TrainTable t join Destination d on t.destid = d.id group by t.train, d.dest – not_ur_avg_cookie Jul 12 '18 at 20:00
99

Here's an example that only uses a Left join and I believe is more efficient than any group by method out there: ExchangeCore Blog

SELECT t1.*
FROM TrainTable t1 LEFT JOIN TrainTable t2
ON (t1.Train = t2.Train AND t1.Time < t2.Time)
WHERE t2.Time IS NULL;
Community
  • 1
  • 1
Joe Meyer
  • 4,315
  • 20
  • 28
  • 9
    I like this approach because it uses just standard SQL and works really fine and fast. – GreenTurtle Apr 17 '13 at 13:05
  • Wonderful solution. Much better than the complicated group by clauses had in my mind. I just love cleverly elegent things like this, thanks! – Ran Sagy Jan 13 '15 at 18:41
  • 18
    It's eye-opening how many people keep saying this solution is "wonderful" and "the best" yet none had **tried** it. Simply put, **it doesn't work.** The correct query is: `select t1.* from TrainTable t1 left join TrainTable t2 on (t1.Train= t2.Train and t1.Time < t2.Time) where t2.Time is null`. – Pacerier Apr 30 '15 at 05:29
  • 1
    @Pacerier you're right, looking at the results the op was expecting that is the correct sql (although the concept was the same). I've updated my answer. – Joe Meyer Apr 30 '15 at 12:34
  • As was mentioned by others, there will be multiple records in the result if there are "ties" (not the case with the timetable probably, but possible in many other cases). – mp31415 Oct 16 '15 at 22:26
  • 7
    Interesting, but I've checked on my MS SQL Server server (21000 records) and this 3 times slower than MAX + GROUP BY – CoperNick Mar 18 '16 at 15:02
  • 5
    VERY slow! The currency converter table with fields date,from,to,rate. 203161 rows total. Joe Meyer method gives 362 rows in set (31,29 sec). Oliver Hanappi method gives 362 rows in set (0,04 sec) – TheRoSS Dec 05 '17 at 13:24
  • The method causes large rows should be examined. – John Huang Jan 30 '18 at 07:05
  • I was able to use this answer in Visual Studios and with a little tweaking... adding another left join for another table, and replacing the .* with the columns I really wanted, I was able to get the results I needed. I was unable to get either of the other 2 top rated answers to work at all in visual studios – user5419026 Sep 05 '19 at 03:02
  • 1
    This only works if there are exactly 2 rows with the same Train value. It selects the top N-1, where N is the number of rows with the same Train value. For example, if there are 3 rows with same Train value, it selects the top 2, if there are 4, it selects the top 3 – kane Sep 15 '20 at 04:28
16

Another solution:

select * from traintable
where (train, time) in (select train, max(time) from traintable group by train);
Claudio Negri
  • 179
  • 1
  • 4
  • 3
    Careful, this will not work if there are "ties" for max(time) because you'll be getting multiple rows. Use this instead: `select * from traintable where (train, time) in (select train, max(time) from traintable group by train) group by train,dest; ` – Pacerier Apr 30 '15 at 06:03
9

As long as there are no duplicates (and trains tend to only arrive at one station at a time)...

select Train, MAX(Time),
      max(Dest) keep (DENSE_RANK LAST ORDER BY Time) max_keep
from TrainTable
GROUP BY Train;
Gary Myers
  • 34,963
  • 3
  • 49
  • 74
5

I know I'm late to the party, but try this...

SELECT 
    `Train`, 
    `Dest`,
    SUBSTRING_INDEX(GROUP_CONCAT(`Time` ORDER BY `Time` DESC), ",", 1) AS `Time`
FROM TrainTable
GROUP BY Train;

Src: Group Concat Documentation

Edit: fixed sql syntax

CoKe
  • 639
  • 5
  • 13
Gravy
  • 12,264
  • 26
  • 124
  • 193