664

Table:

UserId, Value, Date.

I want to get the UserId, Value for the max(Date) for each UserId. That is, the Value for each UserId that has the latest date.

How do I do this in SQL? (Preferably Oracle.)

I need to get ALL the UserIds. But for each UserId, only that row where that user has the latest date.

philipxy
  • 14,867
  • 6
  • 39
  • 83
Umang
  • 6,675
  • 3
  • 18
  • 6

35 Answers

509

I see many people use subqueries or else window functions to do this, but I often do this kind of query without subqueries in the following way. It uses plain, standard SQL so it should work in any brand of RDBMS.

SELECT t1.*
FROM mytable t1
  LEFT OUTER JOIN mytable t2
    ON (t1.UserId = t2.UserId AND t1."Date" < t2."Date")
WHERE t2.UserId IS NULL;

In other words: fetch the row from t1 where no other row exists with the same UserId and a greater Date.

(I put the identifier "Date" in delimiters because it's an SQL reserved word.)

If t1."Date" = t2."Date", duplicate rows appear. Tables usually have an auto-increment (sequence) key, e.g. id, which can serve as a tie-breaker to avoid the duplicates:

SELECT t1.*
FROM mytable t1
  LEFT OUTER JOIN mytable t2
    ON t1.UserId = t2.UserId AND ((t1."Date" < t2."Date") 
         OR (t1."Date" = t2."Date" AND t1.id < t2.id))
WHERE t2.UserId IS NULL;

Re comment from @Farhan:

Here's a more detailed explanation:

An outer join attempts to join t1 with t2. By default, all results of t1 are returned, and if there is a match in t2, it is also returned. If there is no match in t2 for a given row of t1, then the query still returns the row of t1, and uses NULL as a placeholder for all of t2's columns. That's just how outer joins work in general.

The trick in this query is to design the join's matching condition such that t2 must match the same userid, and a greater date. The idea being if a row exists in t2 that has a greater date, then the row in t1 it's compared against can't be the greatest date for that userid. But if there is no match -- i.e. if no row exists in t2 with a greater date than the row in t1 -- we know that the row in t1 was the row with the greatest date for the given userid.

In those cases (when there's no match), the columns of t2 will be NULL -- even the columns specified in the join condition. So that's why we use WHERE t2.UserId IS NULL, because we're searching for the cases where no row was found with a greater date for the given userid.
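The exclusion join described above can be checked on a tiny dataset. Here is a runnable sketch using SQLite via Python's sqlite3 module; the table and data are illustrative, not from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (UserId INTEGER, Value TEXT, "Date" TEXT);
    INSERT INTO mytable VALUES
        (1, 'old', '2009-01-01'),
        (1, 'new', '2009-03-01'),
        (2, 'only', '2009-02-01');
""")

# For each UserId, keep the row for which no other row with the
# same UserId has a greater Date (t2.* is NULL only in that case).
rows = conn.execute("""
    SELECT t1.UserId, t1.Value
    FROM mytable t1
      LEFT OUTER JOIN mytable t2
        ON t1.UserId = t2.UserId AND t1."Date" < t2."Date"
    WHERE t2.UserId IS NULL
    ORDER BY t1.UserId
""").fetchall()

print(rows)  # [(1, 'new'), (2, 'only')]
```

The row (1, 'old') is eliminated because the join finds a later-dated row for user 1, leaving only each user's latest row.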

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 15
    Wow Bill. This is the most creative solution to this problem I've seen. It is pretty performant too on my fairly large data set. This sure beats many of the other solutions I've seen or my own attempts at solving this quandary. – Justin Noel Jan 13 '11 at 02:07
  • 44
    When applied to a table having 8.8 million rows, this query took almost twice as long as that in the accepted answer. – Derek Mahar Apr 15 '11 at 23:11
  • 20
    @Derek: Optimizations depend on the brand and version of RDBMS, as well as presence of appropriate indexes, data types, etc. – Bill Karwin Apr 19 '11 at 17:30
  • Bill, I ran my test on an Oracle 10 database server (tag on question assumes Oracle) with index on a column analogous to `UserId` and a compound index that includes a column analogous to `Date`. Perhaps the query would take less time with an index that includes only `Date`. – Derek Mahar Apr 19 '11 at 17:56
  • 12
    On MySQL, this kind of query appears to actually cause it to loop over the result of a Cartesian join between the tables, resulting in O(n^2) time. Using the subquery method instead reduced the query time from 2.0s to 0.003s. YMMV. – Jesse Feb 22 '12 at 06:22
  • 1
    @Jesse: on MySQL, *all* joins are nested-loop joins. If you have an index on (UserId,`Date`) in this case, you should be able to achieve an index-only join and speed it up a great deal. – Bill Karwin Feb 28 '12 at 17:36
  • 1
    Is there a way to adapt this to match rows where date is the greatest date less than or equal to a user given date? For example if the user gives the date "23-OCT-2011", and the table includes rows for "24-OCT-2011", "22-OCT-2011", "20-OCT-2011", then I want to get "22-OCT-2011". Been scratching my head and reading this snippet for a while now... – Cory Kendall Mar 17 '12 at 08:18
  • 1
    @CoryKendall, add conditions for *both* t1 and t2 to the join condition: `AND t1.Date <= '2011-10-23' AND t2.Date <= '2011-10-23'` in addition to the other join conditions I have shown above. – Bill Karwin Mar 17 '12 at 16:59
  • Replace table AS t1 by table t1 to make it work on all DBMSs, including Oracle (fails with AS). – Axel Fontaine Jan 15 '13 at 13:51
  • @BillKarwin *"add conditions for both t1 and t2 to the join condition"* -- This doesn't seem to work (incorrect results)! What I did instead was use the subquery modularization: `WITH subq AS (SELECT * FROM mytable WHERE "Date" <= '2011-10-23') SELECT t1.* FROM subq t1 LEFT OUTER JOIN subq t2 ON ( [...]` This works because only filtered data is provided as input to the *left outer join*. It also has the added advantage of providing the condition only once. – ADTC Jan 16 '14 at 06:46
  • @ADTC, good solution! I work with MySQL more frequently, and MySQL doesn't support `WITH` expressions yet. – Bill Karwin Jan 16 '14 at 16:44
  • That's really sad because the main problem with SQL is the lack of modularization, but the `WITH` construct somehow eases the pain by providing a basic layer of modularization. It should really be a standard SQL (if it's not already). Btw, your original proposal did not seem to give the correct results in Postgres. Does it give the correct results in MySQL? – ADTC Jan 16 '14 at 17:45
  • @ADTC, yes, the `WITH` construct is part of SQL:2003. MySQL development has focused for the last ~5 years on improving performance and scalability by changing code deep in the storage engines, but they have done less work adding SQL features. – Bill Karwin Jan 16 '14 at 17:50
  • 1
    @DavidMann, it's frequently called an [exclusion join](http://www.xaprb.com/blog/2005/09/23/how-to-write-a-sql-exclusion-join/). – Bill Karwin Jun 25 '14 at 17:48
  • 1
    @BillKarwin Ah sure, the outer join is an exclusion join. I guess I meant to ask if there was a name for the approach of using an exclusion join with some condition that lets one solve a 'greatest-n-per-group' problem – David Mann Jun 25 '14 at 18:57
  • @DavidMann, oh, I don't know if this has a particular pattern name. – Bill Karwin Jun 25 '14 at 18:57
  • I'm sorry but why doesn't this return NULL for cases where t1.date > t2.date? – danihodovic Feb 21 '15 at 18:02
  • @dani-h, if t1.date > t2.date, and there are only two rows, then yes of course t2.* would return NULL. But t2 could be any row with the same userid. If t2 matches even one row with a greater date, then t2.* will return non-NULL. Only if t1 has a greater date than *all* rows matched by t2, does t2.* return NULL. Does that help? – Bill Karwin Feb 21 '15 at 18:35
  • @BillKarwin Thanks for attempting to explain this, but I think you've confused me even more :]. A left join is similar to a cartesian join, yes? Meaning that all rows in t1 are mixed with all rows in t2, where the id matches. If t2.date > t1.date it returns the row in t1 joined by the row in t2. If t1.date > t2.date then there is no match on the right hand side, shouldn't it return NULL for these values as well? – danihodovic Feb 21 '15 at 21:14
  • @dani-h, Suppose you have three rows: January 1, February 1, and March 1. Suppose `t1` points to February 1. You join `t1` to the *set* of rows with a greater date, and call it `t2`. The first row (January 1) is not greater, so it is not in that set. Does the join therefore return NULL? No -- because the third row (March 1) is greater than `t1` and is in the set of `t2`. Therefore `t1` referencing February 1 is *not* the row with the greatest date. Only when `t1` references March 1, and no row is found that is greater, does `t2` return NULLs, and `t1` is the greatest. – Bill Karwin Feb 22 '15 at 09:29
  • 1
    @BillKarwin. I am newbie to SQL. Trying to understand the solution. I was wondering why do we need a **WHERE** clause. Can't we put the where condition directly in the `on` clause. i.e ON (t1.UserId = t2.UserId AND t1."Date" < t2."Date" **AND t2.UserId IS NULL**). can you please explain? – frank Sep 07 '15 at 16:06
  • 2
    @frank, because t2.UserId is not null until after the outer join has been evaluated. Please study about outer joins. – Bill Karwin Sep 07 '15 at 18:01
  • This performs terribly on some RDBMSs, but I upvoted it anyway because it's a fresh and awesome way to think about the problem! – Jon Marnock Jun 06 '16 at 01:53
  • @JonKloske since answering this question in 2008, I have found the performance has a lot to do with the data. I.e. how many rows per distinct UserId. Anyway, it's almost always a better solution than correlated subqueries. – Bill Karwin Jun 06 '16 at 03:37
  • 1
    yep, very much depends on how easy it is to join with an index, too. If for example you have datetime log data and you're grouping by date(datetime), in MySQL at least it's not indexable so it's O(n^2), which is worse than some subquery approaches, but as they're all terrible for large rowcounts anyway it doesn't matter much practically. And obviously that's not Oracle, though I haven't tested that, maybe that case is bad there too. – Jon Marnock Jun 07 '16 at 21:53
  • (I found a very quick O(n) solution for that case in mysql that I haven't seen anywhere on SO for those type of questions that also works generally for any type of 'select max or min row' query that also makes it easy to pluck out both in the same row at no extra cost, but er, to paraphrase Fermat, the details are too big to fit in this margin!!!) – Jon Marnock Jun 07 '16 at 21:57
  • "*It uses plain, standard SQL*" - window functions **are** standard SQL and are not "vendor specific". They have been part of the SQL standard since 2003 –  Aug 30 '16 at 08:43
  • @a_horse_with_no_name - perhaps the sentence should say *widely supported* standard SQL, since MySQL did not support window functions until 8.0.2 in 2018. (And sadly, some of us are stuck on legacy implementations that haven't upgraded to 8...) – ToolmakerSteve Jul 08 '20 at 00:42
  • I edited the answer to say "window functions" instead of "vendor specific features". – Bill Karwin Jul 08 '20 at 01:15
  • Yes, @BillKarwin, it works as expected. But how do I query it randomly? – Leang Socheat Aug 11 '20 at 08:31
  • @LeangSocheat That sounds like a new question. – Bill Karwin Aug 11 '20 at 13:44
  • This performs much faster than the accepted answer, provided indexes can be used. – Calin May 05 '21 at 14:45
  • the query is cool but how do you avoid doubling if the field Date is not unique and you don't have a secondary ID field in the table? – Andrea Mauro Mar 02 '23 at 16:58
  • @AndreaMauro Use a window function solution. – Bill Karwin Mar 02 '23 at 17:09
  • Another explanation: All rows of `t1` must be considered. For every `id` in `t1`, there is an `id` in `t2` with a greater date, except one. – Hritik May 15 '23 at 19:45
455

This will retrieve all rows for which the my_date column value is equal to the maximum value of my_date for that userid. This may retrieve multiple rows for the userid where the maximum date is on multiple rows.

select userid,
       my_date,
       ...
from
(
select userid,
       my_date,
       ...
       max(my_date) over (partition by userid) max_my_date
from   users
)
where my_date = max_my_date

"Analytic functions rock"

Edit: With regard to the first comment ...

"using analytic queries and a self-join defeats the purpose of analytic queries"

There is no self-join in this code. There is instead a predicate placed on the result of the inline view that contains the analytic function -- a very different matter, and completely standard practice.

"The default window in Oracle is from the first row in the partition to the current one"

The windowing clause is only applicable in the presence of the order by clause. With no order by clause, no windowing clause is applied by default and none can be explicitly specified.

The code works.
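The analytic-function approach above can also be verified on a small dataset. A sketch using SQLite (window functions require SQLite 3.25 or later) via Python's sqlite3 module, with illustrative data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
conn.executescript("""
    CREATE TABLE users (userid INTEGER, my_date TEXT);
    INSERT INTO users VALUES
        (1, '2009-01-01'),
        (1, '2009-03-01'),
        (2, '2009-02-01');
""")

# The inline view computes max(my_date) per userid alongside each row;
# the outer predicate keeps only rows that match that maximum.
rows = conn.execute("""
    SELECT userid, my_date
    FROM (
        SELECT userid, my_date,
               MAX(my_date) OVER (PARTITION BY userid) AS max_my_date
        FROM users
    )
    WHERE my_date = max_my_date
    ORDER BY userid
""").fetchall()

print(rows)  # [(1, '2009-03-01'), (2, '2009-02-01')]
```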

รยקคгรђשค
  • 1,919
  • 1
  • 10
  • 18
David Aldridge
  • 51,479
  • 8
  • 68
  • 96
  • 50
    When applied to a table having 8.8 million rows, this query took half the time of the queries in some of the other highly voted answers. – Derek Mahar Apr 15 '11 at 23:59
  • What indexes should I use to make this query (specifically this query) go faster? I'm a bit over my head and using Oracle right now. In a table with 5.5 million rows, this call isn't returning over 30 seconds, and I was hoping for ~100ms or less on this call. – Cory Kendall Mar 18 '12 at 00:38
  • I think you'd have to use a combined index over userid and my_date, so the database can completely use the index to get you the results fast and only read the relevant rows – Falco May 06 '14 at 14:14
  • 5
    Anyone care to post a link to the MySQL equivalent of this, if there is one? – redolent Jan 10 '15 at 02:35
  • 3
    Couldn't this return duplicates? Eg. if two rows have the same user_id and the same date (which happens to be the max). – jastr Jun 15 '16 at 19:30
  • 3
    @jastr I think that was acknowledged in the question – David Aldridge Jun 17 '16 at 15:47
  • @DavidAldridge Are you referring to "That column is likely unique"? – jastr Jun 20 '16 at 17:21
  • 9
    Instead of `MAX(...) OVER (...)` you can also use `ROW_NUMBER() OVER (...)` (for the top-n-per-group) or `RANK() OVER (...)` (for the greatest-n-per-group). – MT0 Jun 27 '16 at 08:13
  • Is there a way to run the inner query without having to display the max(value) ? I am in the case where I don't have a where clause (I want all matching rows and no duplicates can be), but I would prefer to not display the max value. – Mat M Feb 14 '18 at 13:56
  • @MT0 : isn't "top-n-per-group" or "greatest-n-per-group" the same as "max per group"?? And could you please provide a short example?? Thanks in advance. – mnemotronic Oct 26 '20 at 13:35
  • 2
    @mnemotronic Filtering on `ROW_NUMBER() OVER ( ... )` then for each partition you get exactly `n` rows. Filtering on `RANK() OVER ( ... )` then you get rows for the `n` top values and may be more than `n` rows if there are ties. Using `DENSE_RANK` gives the `n` top unique values which will be more than `n` rows if there are ties. Filtering on `MAX( ... ) OVER (...)` is the same as filtering on `RANK` (or `DENSE_RANK`) and restricting to the first rank only and it may or may not be the same as `ROW_NUMBER() OVER (...)` depending on whether the rows in the `ORDER BY` clause are unique or not. – MT0 Oct 26 '20 at 13:44
  • 2
    @mnemotronic An example of the differences is [db<>fiddle](https://dbfiddle.uk/?rdbms=oracle_18&fiddle=e887d1c0cb4885356c2e99d95a2617bb). – MT0 Oct 26 '20 at 14:14
  • The question mentions "... for each UserId". This will only return userId/value pairs for the users that have an entry where my_date = max_my_date, but not the rest. Right? – Chris Oct 30 '20 at 15:33
  • @MT0: Wow. Very enlightening! Thanks! It'll take me some time to wrap my head around it. – mnemotronic Feb 08 '21 at 16:06
  • @MT0 Thanks for this example, I changed to MySQL: https://dbfiddle.uk/Ig2x5eN0. – jdhao Sep 21 '22 at 07:14
170
SELECT userid, MAX(value) KEEP (DENSE_RANK FIRST ORDER BY date DESC)
  FROM table
  GROUP BY userid
Dave Costa
  • 47,262
  • 8
  • 56
  • 72
  • 5
    In my tests using a table having a large number of rows, this solution took about twice as long as that in the accepted answer. – Derek Mahar Apr 15 '11 at 23:16
  • I confirm it's much faster than other solutions – tamersalama Sep 12 '12 at 01:02
  • 5
    trouble is it does not return the full record – Paul Maxwell Aug 07 '14 at 07:03
  • @user2067753 No, it doesn't return the full record. You can use the same MAX()..KEEP.. expression on multiple columns, so you can select all the columns you need. But it is inconvenient if you want a large number of columns and would prefer to use SELECT *. – Dave Costa Aug 07 '14 at 19:54
59

I don't know your exact columns names, but it would be something like this:

SELECT userid, value
FROM users u1
WHERE date = (
    SELECT MAX(date)
    FROM users u2
    WHERE u1.userid = u2.userid
)
jdhao
  • 24,001
  • 18
  • 134
  • 273
Steve K
  • 19,408
  • 6
  • 52
  • 50
  • 5
    Probably not very efficient, Steve. – David Aldridge Sep 23 '08 at 14:43
  • 7
    You are probably underestimating the Oracle query optimizer. – Rafał Dowgird Sep 23 '08 at 14:57
  • 3
    Not at all. This will almost certainly be implemented as a full scan with a nested loop join to get the dates. You're talking about logical I/Os on the order of 4 times the number of rows in the table, and it will be dreadful for non-trivial amounts of data. – David Aldridge Sep 23 '08 at 15:02
  • Not efficient, but works. Mine as well: http://stackoverflow.com/questions/121387/sql-fetch-the-row-which-has-the-max-value-for-a-column#121556 – Zsolt Botykai Sep 23 '08 at 15:07
  • What about using analytical sql extensions for Oracle? – Mike McAllister Sep 23 '08 at 15:14
  • 2
    My analytic solution got voted down for some reason. No idea why -- it's almost the gold standard of how to tackle these problems in Oracle now. – David Aldridge Sep 23 '08 at 15:42
  • 6
    FYI, "Not efficient, but works" is the same as "Works, but is not efficient". When did we give up on efficient as a design goal? – David Aldridge Sep 23 '08 at 15:43
  • I voted your analytic query solution down because it was wrong. While efficient is a design goal, it comes after correctness. See my analytic solution instead. – user11318 Sep 23 '08 at 15:53
  • 2
    Your comments on my analytic method are incorrect. See my edit. – David Aldridge Sep 23 '08 at 17:58
  • 2
    +1 This performed the fastest for me, of all the solutions given. Bill's was second, but perhaps as he suggested this could change due to other factors. – LittleTreeX Sep 30 '11 at 19:48
  • 11
    +1 because when your datatables are not millions of rows in length anyways, this is the most easily understood solution. When you have multiple developers of all skill levels modifying the code, understandability is more important than a fraction of a second in performance that is unnoticeable. – n00b Apr 24 '13 at 17:59
  • 3
    Tested on Apache Derby with 6 million rows: this solution is the fastest if and only if you have the following index defined: "create unique index MYINDEX on USERS(USERID,DATE desc)", otherwise it's O(n^2) deadly slow. – Unai Vivi Nov 08 '13 at 17:01
  • 1
    This is called a 'correlated subquery' I believe. – David Mann Jun 25 '14 at 15:51
45

Not being at work, I don't have Oracle to hand, but I seem to recall that Oracle allows multiple columns to be matched in an IN clause, which should at least avoid the options that use a correlated subquery (seldom a good idea).

Something like this, perhaps (can't remember if the column list should be parenthesised or not):

SELECT * 
FROM MyTable
WHERE (User, Date) IN
  ( SELECT User, MAX(Date) FROM MyTable GROUP BY User)

EDIT: Just tried it for real:

SQL> create table MyTable (usr char(1), dt date);
SQL> insert into mytable values ('A','01-JAN-2009');
SQL> insert into mytable values ('B','01-JAN-2009');
SQL> insert into mytable values ('A', '31-DEC-2008');
SQL> insert into mytable values ('B', '31-DEC-2008');
SQL> select usr, dt from mytable
  2  where (usr, dt) in 
  3  ( select usr, max(dt) from mytable group by usr)
  4  /

U DT
- ---------
A 01-JAN-09
B 01-JAN-09

So it works, although some of the new-fangly stuff mentioned elsewhere may be more performant.
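The same row-value IN technique works outside Oracle as well; for instance, SQLite supports row values since 3.15. A runnable sketch via Python's sqlite3 module, mirroring the sample data above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # row values need SQLite >= 3.15
conn.executescript("""
    CREATE TABLE MyTable (usr TEXT, dt TEXT);
    INSERT INTO MyTable VALUES
        ('A', '2009-01-01'), ('B', '2009-01-01'),
        ('A', '2008-12-31'), ('B', '2008-12-31');
""")

# Match (user, date) pairs against the per-user maximum date.
rows = conn.execute("""
    SELECT usr, dt
    FROM MyTable
    WHERE (usr, dt) IN (SELECT usr, MAX(dt) FROM MyTable GROUP BY usr)
    ORDER BY usr
""").fetchall()

print(rows)  # [('A', '2009-01-01'), ('B', '2009-01-01')]
```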

Mike Woodhouse
  • 51,832
  • 12
  • 88
  • 127
  • 7
    This works nicely on PostgreSQL too. And I like the simplicity and generality of it -- the subquery says "Here's my criteria", the outer query says "And here's the details I want to see". +1. – j_random_hacker Jun 15 '10 at 06:00
15

I know you asked for Oracle, but in SQL 2005 we now use this:


-- Single Value
;WITH ByDate
AS (
SELECT UserId, Value, ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) RowNum
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE RowNum = 1

-- Multiple values where dates match
;WITH ByDate
AS (
SELECT UserId, Value, RANK() OVER (PARTITION BY UserId ORDER BY Date DESC) Rnk
FROM UserDates
)
SELECT UserId, Value
FROM ByDate
WHERE Rnk = 1
mancaus
  • 2,983
  • 1
  • 20
  • 18
8

I don't have Oracle to test it, but the most efficient solution is to use analytic queries. It should look something like this:

SELECT DISTINCT
    UserId
  , MaxValue
FROM (
    SELECT UserId
      , FIRST_VALUE (Value) Over (
          PARTITION BY UserId
          ORDER BY Date DESC
        ) MaxValue
    FROM SomeTable
  )

I suspect that you can get rid of the outer query and put distinct on the inner, but I'm not sure. In the meantime I know this one works.

If you want to learn about analytic queries, I'd suggest reading http://www.orafaq.com/node/55 and http://www.akadia.com/services/ora_analytic_functions.html. Here is the short summary.

Under the hood, analytic queries sort the whole dataset, then process it sequentially. As you process it you partition the dataset according to certain criteria, and then for each row you look at some window (defaulting to the span from the first row in the partition to the current row; that default is also the most efficient) and can compute values using a number of analytic functions (the list of which is very similar to the aggregate functions).

In this case here is what the inner query does. The whole dataset is sorted by UserId then Date DESC. Then it is processed in one pass. For each row you return the UserId and the Value of the first row seen for that UserId (since dates are sorted DESC, that's the value at the max date). This gives you your answer with duplicated rows. Then the outer DISTINCT squashes the duplicates.

This is not a particularly spectacular example of analytic queries. For a much bigger win consider taking a table of financial receipts and calculating for each user and receipt, a running total of what they paid. Analytic queries solve that efficiently. Other solutions are less efficient. Which is why they are part of the 2003 SQL standard. (Unfortunately Postgres doesn't have them yet. Grrr...)
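The running-total case mentioned above can be sketched the same way. The receipts table and amounts below are made up for illustration; with an ORDER BY inside the window, the default frame is exactly "first row of the partition through the current row", which is what a running total needs (SQLite 3.25+ via Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE receipts (userid INTEGER, paid_on TEXT, amount INTEGER);
    INSERT INTO receipts VALUES
        (1, '2009-01-01', 10),
        (1, '2009-02-01', 25),
        (2, '2009-01-15', 40);
""")

# One sequential pass per partition: each row's SUM covers the
# partition's first row through the current row, i.e. a running total.
rows = conn.execute("""
    SELECT userid, paid_on,
           SUM(amount) OVER (PARTITION BY userid ORDER BY paid_on) AS running_total
    FROM receipts
    ORDER BY userid, paid_on
""").fetchall()

print(rows)  # [(1, '2009-01-01', 10), (1, '2009-02-01', 35), (2, '2009-01-15', 40)]
```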

Andrew Barber
  • 39,603
  • 20
  • 94
  • 123
user11318
  • 9,273
  • 2
  • 25
  • 25
  • You also need to return the date value to answer the question completely. If that means another first_value clause then I'd suggest that the solution is more complex than it ought to be, and the analytic method based on max(date) reads better. – David Aldridge Sep 23 '08 at 18:01
  • The question statement says nothing about returning the date. You can do that either by adding another FIRST(Date) or else just by querying the Date and changing the outer query to a GROUP BY. I'd use the first and expect the optimizer to calculate both in one pass. – user11318 Sep 23 '08 at 18:11
  • "The question statement says nothing about returning the date" ... yes, you're right. Sorry. But adding more FIRST_VALUE clauses would become messy pretty quickly. It's a single window sort, but if you had 20 columns to return for that row then you've written a lot of code to wade through. – David Aldridge Sep 23 '08 at 18:18
  • It also occurs to me that this solution is non-deterministic for data where a single userid has multiple rows that have the maximum date and different VALUEs. More a fault in the question than the answer though. – David Aldridge Sep 23 '08 at 18:22
  • 1
    I agree it is painfully verbose. However isn't that generally the case with SQL? And you're right that the solution is non-deterministic. There are multiple ways to deal with ties, and sometimes each is what you want. – user11318 Sep 23 '08 at 19:51
7

In Oracle 12c+, you can use Top n queries along with analytic function rank to achieve this very concisely without subqueries:

select *
from your_table
order by rank() over (partition by user_id order by my_date desc)
fetch first 1 row with ties;

The above returns all the rows with max my_date per user.

If you want only one row with max date, then replace the rank with row_number:

select *
from your_table
order by row_number() over (partition by user_id order by my_date desc)
fetch first 1 row with ties; 
Gurwinder Singh
  • 38,557
  • 6
  • 51
  • 76
7

Wouldn't a QUALIFY clause be both simplest and best?

select userid, my_date, ...
from users
qualify rank() over (partition by userid order by my_date desc) = 1

For context, on Teradata a decent-sized test of this runs in 17s with this QUALIFY version and in 23s with the 'inline view'/Aldridge solution #1.

wcw
  • 71
  • 1
  • 1
  • 1
    This is the best answer in my opinion. However, be careful with the `rank()` function in situations where there are ties. You could end up with more than one `rank=1`. Better to use `row_number()` if you really do want just one record returned. – cartbeforehorse May 26 '12 at 13:18
  • 3
    Also, be aware that the `QUALIFY` clause is specific to Teradata. In Oracle (at least) you have to nest your query and filter using a `WHERE` clause on the wrapping select statement (which probably hits performance a touch, I'd imagine). – cartbeforehorse May 26 '12 at 13:40
7

With PostgreSQL 8.4 or later, you can use this:

select user_id, user_value_1, user_value_2
  from (select user_id, user_value_1, user_value_2, row_number()
          over (partition by user_id order by user_date desc) 
        from users) as r
  where r.row_number=1
Cito
  • 5,365
  • 28
  • 30
  • 2
    For PostgreSQL we can now use `DISTINCT ON` which perfectly addresses the use case: https://stackoverflow.com/questions/586781/postgresql-fetch-the-rows-which-have-the-max-value-for-a-column-in-each-group – Ciro Santilli OurBigBook.com Jun 29 '22 at 09:53
4

I'm quite late to the party but the following hack will outperform both correlated subqueries and any analytics function but has one restriction: values must convert to strings. So it works for dates, numbers and other strings. The code does not look good but the execution profile is great.

select
    userid,
    to_number(substr(max(to_char(date,'yyyymmdd') || to_char(value)), 9)) as value,
    max(date) as date
from 
    users
group by
    userid

The reason why this code works so well is that it only needs to scan the table once. It does not require any indexes and most importantly it does not need to sort the table, which most analytics functions do. Indexes will help though if you need to filter the result for a single userid.
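The single-scan concat trick translates to other databases too. Below is a sketch in SQLite via Python's sqlite3 module; to keep it short, the dates are stored directly as sortable yyyymmdd strings instead of using Oracle's to_char/to_number, and the data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER, my_date TEXT, value INTEGER);
    INSERT INTO users VALUES
        (1, '20090101', 7),
        (1, '20090301', 9),
        (2, '20090201', 5);
""")

# Prefix each value with its 8-char sortable date string; MAX over the
# group then picks the value belonging to the latest date, and SUBSTR
# strips the date prefix back off.
rows = conn.execute("""
    SELECT userid,
           CAST(SUBSTR(MAX(my_date || value), 9) AS INTEGER) AS value,
           MAX(my_date) AS my_date
    FROM users
    GROUP BY userid
    ORDER BY userid
""").fetchall()

print(rows)  # [(1, 9, '20090301'), (2, 5, '20090201')]
```

Note the restriction from the answer still applies: the concatenated values compare as strings, so values of varying length need padding to compare correctly.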

aLevelOfIndirection
  • 3,522
  • 14
  • 18
  • It is a good execution plan compared to most, but applying all those tricks to more than a few fields would be tedious and may work against it. But very interesting - thanks. See http://sqlfiddle.com/#!4/2749b5/23 – Paul Maxwell Aug 07 '14 at 07:11
  • You are right it can become tedious, which is why this should be done only when the performance of the query requires it. Such is often the case with ETL scripts. – aLevelOfIndirection Aug 13 '14 at 15:07
  • this is very nice. did something similar using LISTAGG but it looks ugly. Postgres has a better alternative using array_agg. see my answer :) – Bruno Calza Nov 13 '14 at 13:26
4

Just had to write a "live" example at work :)

This one supports multiple values for UserId on the same date.

Columns: UserId, Value, Date

SELECT
   DISTINCT UserId,
   MAX(Date) OVER (PARTITION BY UserId ORDER BY Date DESC),
   MAX(Values) OVER (PARTITION BY UserId ORDER BY Date DESC)
FROM
(
   SELECT UserId, Date, SUM(Value) As Values
   FROM <<table_name>>
   GROUP BY UserId, Date
)

You can use FIRST_VALUE instead of MAX and look it up in the explain plan. I didn't have the time to play with it.

Of course, if searching through huge tables, it's probably better if you use FULL hints in your query.

Truper
  • 41
  • 2
4

Use ROW_NUMBER() to assign a unique ranking on descending Date for each UserId, then filter to the first row for each UserId (i.e., ROW_NUMBER = 1).

SELECT UserId, Value, Date
FROM (SELECT UserId, Value, Date,
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date DESC) rn
      FROM users) u
WHERE rn = 1;
markusk
  • 6,477
  • 34
  • 39
3

If you're using Postgres, you can use array_agg like

SELECT userid,MAX(adate),(array_agg(value ORDER BY adate DESC))[1] as value
FROM YOURTABLE
GROUP BY userid

I'm not familiar with Oracle. This is what I came up with

SELECT 
  userid,
  MAX(adate),
  SUBSTR(
    (LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)),
    0,
    INSTR((LISTAGG(value, ',') WITHIN GROUP (ORDER BY adate DESC)), ',')-1
  ) as value 
FROM YOURTABLE
GROUP BY userid 

Both queries return the same results as the accepted answer. See SQLFiddles:

  1. Accepted answer
  2. My solution with Postgres
  3. My solution with Oracle
Bruno Calza
  • 2,732
  • 2
  • 23
  • 25
  • Thanks. Nice to know about the `array-agg` function. Hypothetically, `array-agg` may not work well for cases where there are too many rows per `userid` (the group by column) ? And, also when we need multiple select columns in the result; Then , we would need to apply `array_agg` to every other column, i.e do a `group by` with `adate` with every other select column ? Great answer for OP's question though! – Binita Bharati Sep 10 '21 at 04:44
2

I think something like this. (Forgive me for any syntax mistakes; I'm used to using HQL at this point!)

EDIT: Also misread the question! Corrected the query...

SELECT UserId, Value
FROM Users AS user
WHERE Date = (
    SELECT MAX(Date)
    FROM Users AS maxtest
    WHERE maxtest.UserId = user.UserId
)
jdmichal
  • 10,984
  • 4
  • 43
  • 42
2

I think you should use this variant of the previous query:

SELECT UserId, Value FROM Users U1 WHERE 
Date = ( SELECT MAX(Date)    FROM Users where UserId = U1.UserId)
stefano m
  • 4,094
  • 5
  • 28
  • 27
2
Select  
   UserID,  
   Value,  
   Date  
From  
   Table,  
   (  
      Select  
          UserID,  
          Max(Date) as MDate  
      From  
          Table  
      Group by  
          UserID  
    ) as subQuery  
Where  
   Table.UserID = subQuery.UserID and  
   Table.Date = subQuery.mDate  
Aheho
  • 12,622
  • 13
  • 54
  • 83
2
select VALUE from TABLE1 where TIME = 
   (select max(TIME) from TABLE1 where DATE= 
   (select max(DATE) from TABLE1 where CRITERIA=CRITERIA))
Justin
  • 84,773
  • 49
  • 224
  • 367
nouky
  • 65
  • 1
  • 7
1

(T-SQL) First get all the users and their maxdate. Join with the table to find the corresponding values for the users on the maxdates.

create table users (userid int , value int , date datetime)
insert into users values (1, 1, '20010101')
insert into users values (1, 2, '20020101')
insert into users values (2, 1, '20010101')
insert into users values (2, 3, '20030101')

select T1.userid, T1.value, T1.date 
    from users T1,
    (select max(date) as maxdate, userid from users group by userid) T2    
    where T1.userid= T2.userid and T1.date = T2.maxdate

results:

userid      value       date                                    
----------- ----------- -------------------------- 
2           3           2003-01-01 00:00:00.000
1           2           2002-01-01 00:00:00.000
boes
  • 2,835
  • 2
  • 23
  • 28
1

Assuming Date is unique for a given UserID, here's some TSQL:

SELECT 
    UserTest.UserID, UserTest.Value
FROM UserTest
INNER JOIN
(
    SELECT UserID, MAX(Date) MaxDate
    FROM UserTest
    GROUP BY UserID
) Dates
ON UserTest.UserID = Dates.UserID
AND UserTest.Date = Dates.MaxDate 
DarthJDG
  • 16,511
  • 11
  • 49
  • 56
marc
  • 3,248
  • 4
  • 28
  • 33
1

Solution for MySQL, which doesn't have the concepts of PARTITION, KEEP, or DENSE_RANK.

select userid,
       my_date,
       ...
from
(
select @sno := case when @pid <> userid then 0
                    else @sno + 1
       end as serialnumber,
       @pid := userid as userid,
       my_date,
       ...
from   users
-- initialise the variables (use values no real userid takes)
cross join (select @sno := -1, @pid := -1) init
-- descending, so serialnumber 0 is the latest row per user
order by userid, my_date desc
) a
where a.serialnumber = 0

Reference: http://benincampus.blogspot.com/2013/08/select-rows-which-have-maxmin-value-in.html
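As an aside, MySQL 8.0+ does support window functions, so on modern versions the session-variable trick can be replaced by ROW_NUMBER. A sketch of that equivalent, runnable against SQLite (3.25+) through Python's sqlite3 module; the table and column names here are made up for illustration:

```python
import sqlite3

# Sample data: two users, two dates each
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER, my_date TEXT, val TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "2001-01-01", "old"), (1, "2002-01-01", "new"),
    (2, "2001-01-01", "old"), (2, "2003-01-01", "new"),
])

# ROW_NUMBER() plays the role of the @sno serial number:
# rn = 1 marks the latest row per user
latest = conn.execute("""
    SELECT userid, my_date, val
    FROM (SELECT userid, my_date, val,
                 ROW_NUMBER() OVER (PARTITION BY userid
                                    ORDER BY my_date DESC) AS rn
          FROM users) AS x
    WHERE rn = 1
    ORDER BY userid
""").fetchall()

print(latest)  # [(1, '2002-01-01', 'new'), (2, '2003-01-01', 'new')]
```

Unlike the variable-based version, this does not depend on the ordering behaviour of derived tables, which MySQL does not guarantee.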

Ben Lin
  • 807
  • 10
  • 15
  • This does not work "*on other DBs too*". This only works on MySQL and possibly on SQL Server because it has a similar concept of variables. It will definitely not work on Oracle, Postgres, DB2, Derby, H2, HSQLDB, Vertica, Greenplum. Additionally the accepted answer is standard ANSI SQL (which by now only MySQL doesn't support) –  Aug 30 '13 at 18:55
  • horse, I guess you are right. I don't have knowledge about other DBs, or ANSI. My solution is able to solve the issue in MySQL, which doesn't have proper support for ANSI SQL to solve it in standard way. – Ben Lin Sep 05 '13 at 16:28
1

The accepted answer here is Oracle-only. Here's a slightly more sophisticated answer in generic SQL:

Who has the best overall homework result (maximum sum of homework points)?

SELECT FIRST, LAST, SUM(POINTS) AS TOTAL
FROM STUDENTS S, RESULTS R
WHERE S.SID = R.SID AND R.CAT = 'H'
GROUP BY S.SID, FIRST, LAST
HAVING SUM(POINTS) >= ALL (SELECT SUM (POINTS)
FROM RESULTS
WHERE CAT = 'H'
GROUP BY SID)

And a more difficult example, which needs some explanation, for which I don't have time atm:

Give the book (ISBN and title) that is most popular in 2008, i.e., which is borrowed most often in 2008.

SELECT X.ISBN, X.title, X.loans
FROM (SELECT Book.ISBN, Book.title, count(Loan.dateTimeOut) AS loans
FROM CatalogEntry Book
LEFT JOIN BookOnShelf Copy
ON Book.bookId = Copy.bookId
LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan 
ON Copy.copyId = Loan.copyId
GROUP BY Book.ISBN, Book.title) X
HAVING loans >= ALL (SELECT count(Loan.dateTimeOut) AS loans
FROM CatalogEntry Book
LEFT JOIN BookOnShelf Copy
ON Book.bookId = Copy.bookId
LEFT JOIN (SELECT * FROM Loan WHERE YEAR(Loan.dateTimeOut) = 2008) Loan 
ON Copy.copyId = Loan.copyId
GROUP BY Book.ISBN, Book.title);

Hope this helps (anyone).. :)

Regards, Guus

Guus
  • 19
  • 1
  • The accepted answer is not "Oracle only" - it's standard SQL (supported by many DBMS) –  Dec 07 '14 at 08:08
0
select userid, value, date
  from thetable t1 ,
       ( select t2.userid, max(t2.date) date2 
           from thetable t2 
          group by t2.userid ) t3
 where t3.userid = t1.userid and
       t3.date2 = t1.date

IMHO this works. HTH

Zsolt Botykai
  • 50,406
  • 14
  • 85
  • 110
0

I think this should work?

Select
T1.UserId,
(Select Top 1 T2.Value From Table T2 Where T2.UserId = T1.UserId Order By Date Desc) As 'Value'
From
Table T1
Group By
T1.UserId
Order By
T1.UserId
GateKiller
  • 74,180
  • 73
  • 171
  • 204
0

This should be as simple as:

SELECT UserId, Value
FROM Users u
WHERE Date = (SELECT MAX(Date) FROM Users WHERE UserID = u.UserID)
DarthJDG
  • 16,511
  • 11
  • 49
  • 56
Valerion
  • 823
  • 8
  • 15
0

On my first try I misread the question; following the top answer, here is a complete example with correct results:

CREATE TABLE table_name (id int, the_value varchar(2), the_date datetime);

INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'a','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(1 ,'b','2/2/2002');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'c','1/1/2000');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'d','3/3/2003');
INSERT INTO table_name (id,the_value,the_date) VALUES(2 ,'e','3/3/2003');

--

  select id, the_value
      from table_name u1
      where the_date = (select max(the_date)
                     from table_name u2
                     where u1.id = u2.id)

--

id          the_value
----------- ---------
2           d
2           e
1           b

(3 row(s) affected)
KyleLanser
  • 2,699
  • 2
  • 21
  • 21
0

This will also take care of duplicates (return one row for each user_id):

SELECT *
FROM (
  SELECT u.*, FIRST_VALUE(u.rowid) OVER(PARTITION BY u.user_id ORDER BY u.date DESC) AS last_rowid
  FROM users u
) u2
WHERE u2.rowid = u2.last_rowid
na43251
  • 66
  • 1
  • 6
0

Just tested this and it seems to work on a logging table

select ColumnNames, max(DateColumn) from log  group by ColumnNames order by 1 desc
Mauro
  • 4,531
  • 3
  • 30
  • 56
-1

If (UserID, Date) is unique, i.e. no date appears twice for the same user then:

select TheTable.UserID, TheTable.Value
from TheTable inner join (select UserID, max([Date]) MaxDate
                          from TheTable
                          group by UserID) UserMaxDate
     on TheTable.UserID = UserMaxDate.UserID
        and TheTable.[Date] = UserMaxDate.MaxDate;
finnw
  • 47,861
  • 24
  • 143
  • 221
-1
select   UserId,max(Date) over (partition by UserId) value from users;
王奕然
  • 3,891
  • 6
  • 41
  • 62
-1

Check this link. If your question seems similar to the one on that page, then I would suggest the following query, which will give the solution for it:

select distinct sno,item_name,max(start_date) over(partition by sno),max(end_date) over(partition by sno),max(creation_date) over(partition by sno), max(last_modified_date) over(partition by sno) from uniq_select_records order by sno,item_name asc;

It will give accurate results related to that link.

Community
  • 1
  • 1
Smart003
  • 1,119
  • 2
  • 16
  • 31
-1

Use the code:

select T.UserId, T.dt
from (select UserId, dt,
             max(dt) over (partition by UserId) as mdt
      from t_users) T
where T.dt = T.mdt;

This will retrieve the results irrespective of duplicate values for UserId. If your UserId is unique, it becomes simpler:

select UserId,max(dt) from t_users group by UserId;
Natty
  • 141
  • 1
  • 2
  • 14
-1
SELECT a.userid,a.values1,b.mm 
FROM table_name a,(SELECT userid,Max(date1)AS mm FROM table_name GROUP BY userid) b
WHERE a.userid=b.userid AND a.DATE1=b.mm;
praveen
  • 75
  • 1
  • 2
  • While this might answer the authors question, it lacks some explaining words and links to documentation. Raw code snippets are not very helpful without some phrases around it. You may also find [how to write a good answer](https://stackoverflow.com/help/how-to-answer) very helpful. Please edit your answer. – einverne Oct 16 '18 at 09:45
-1

The query below can work (note that the window alias rn cannot be referenced in the WHERE clause of the same query, so the filter must go in an outer query):

SELECT user_id, value, date
FROM (
    SELECT user_id, value, date,
           row_number() OVER (PARTITION BY user_id ORDER BY date DESC) AS rn
    FROM table_name
) t
WHERE rn = 1
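That derived-table wrapping requirement is worth seeing in action; a minimal sqlite3 sketch (table name `t` and sample data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, value TEXT, date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "old", "2001-01-01"), (1, "new", "2002-01-01"),
    (2, "old", "2001-01-01"), (2, "new", "2003-01-01"),
])

# The window alias rn is only visible outside the derived table,
# so the filter rn = 1 lives in the outer query
rows = conn.execute("""
    SELECT user_id, value, date
    FROM (SELECT user_id, value, date,
                 ROW_NUMBER() OVER (PARTITION BY user_id
                                    ORDER BY date DESC) AS rn
          FROM t) AS x
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(rows)  # [(1, 'new', '2002-01-01'), (2, 'new', '2003-01-01')]
```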
Piotr Labunski
  • 1,638
  • 4
  • 19
  • 26
kiruba
  • 129
  • 5
-2
SELECT a.* 
FROM user a INNER JOIN (SELECT userid,Max(date) AS date12 FROM user GROUP BY userid) b  
ON a.date=b.date12 AND a.userid=b.userid ORDER BY a.userid;
Fabrizio
  • 7,603
  • 6
  • 44
  • 104
praveen
  • 75
  • 1
  • 2