I'm quite new to SQL and I'm trying to filter the latest date record (DateTime column) for each unique ID present in the table.

Sample data: there are 2 unique IDs (16512) and (76513).

DateTime                     ID     Notes
2021-03-26T10:39:54.9770238  16512  Still a work in Progress
2021-04-29T12:46:12.8277807  16512  Still working on it
2021-03-21T10:39:54.9770238  76513  Still a work in Progress
2021-04-20T12:46:12.8277800  76513  Still working on project

Desired result (get last row of each ID based on the DateTime column):

DateTime                     ID     Notes
2021-04-29T12:46:12.8277807  16512  Still working on it
2021-04-20T12:46:12.8277800  76513  Still working on project

My query:

SELECT MAX(DateTime), ID
FROM Table1
GROUP BY DateTime, ID

Thanks in advance for your help.


4 Answers

SELECT max(DateTime), ID
FROM Table1
GROUP BY ID
0

You can use row_number here:

with d as (
  select *, row_number() over(partition by Id order by DateTime desc) rn
  from Table1
)
select DateTime, Id, Notes
from d
where rn = 1;
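To see the row_number approach in action, here is a minimal sketch that runs the same query against the question's sample data in an in-memory SQLite database (SQLite 3.25+ supports window functions; the table and column names follow the question):

```python
import sqlite3

# Load the question's sample rows into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (DateTime TEXT, ID INTEGER, Notes TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("2021-03-26T10:39:54.9770238", 16512, "Still a work in Progress"),
        ("2021-04-29T12:46:12.8277807", 16512, "Still working on it"),
        ("2021-03-21T10:39:54.9770238", 76513, "Still a work in Progress"),
        ("2021-04-20T12:46:12.8277800", 76513, "Still working on project"),
    ],
)

# Rank rows within each ID by DateTime descending, then keep rank 1.
rows = conn.execute("""
    WITH d AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DateTime DESC) AS rn
        FROM Table1
    )
    SELECT DateTime, ID, Notes
    FROM d
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

for r in rows:
    print(r)
```

This prints exactly the two desired rows, one per ID, each with its latest DateTime and matching Notes.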
0

You didn't state a particular database, but if you are using Postgres you can use its DISTINCT ON, which is often the fastest solution when your groups are not too large (in your case, the number of rows sharing the same ID).

Here's an example. Please note that I've excluded your Notes column for brevity, but the query works the same if you include it and will give you the output you want above.

create temporary table tasks (
  id int,
  created_at date
);

insert into tasks(id, created_at) values
(16512, '2021-03-26'), 
(16512, '2021-04-29'),
(76513, '2021-03-21'), 
(76513, '2021-04-20')
;

select distinct on (id)
       id,
       created_at
from tasks
order by id, created_at desc;
/*
  id   | created_at
-------+------------
 16512 | 2021-04-29
 76513 | 2021-04-20
*/
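DISTINCT ON is Postgres-specific, so as a sanity check here is a plain-Python sketch of the same semantics: after sorting by id ascending and created_at descending, keep only the first row of each id group (this uses the answer's sample values, not a real database):

```python
from itertools import groupby
from operator import itemgetter

# The answer's sample rows as (id, created_at) tuples.
tasks = [
    (16512, "2021-03-26"),
    (16512, "2021-04-29"),
    (76513, "2021-03-21"),
    (76513, "2021-04-20"),
]

# Emulate ORDER BY id, created_at DESC with two stable sorts:
# first by created_at descending, then by id ascending.
tasks.sort(key=itemgetter(1), reverse=True)
tasks.sort(key=itemgetter(0))

# DISTINCT ON (id) keeps the first row per id after that ordering.
latest = [next(rows) for _, rows in groupby(tasks, key=itemgetter(0))]
print(latest)  # [(16512, '2021-04-29'), (76513, '2021-04-20')]
```

The two stable sorts reproduce the composite ORDER BY, and taking the head of each group matches what DISTINCT ON returns.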

0

The row_number approach mentioned above is one way to solve your problem. Since you tagged databricks in your question, let me show you another option you can implement in Spark SQL using the last function from its pool of aggregate functions.

In reference to the Spark documentation: last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Note that:

The function is non-deterministic because its results depend on the order of the rows, which may be non-deterministic after a shuffle.

In your example:

%sql

WITH cte AS (
  SELECT *
  FROM my_table
  ORDER BY DateTime ASC
)
SELECT Id, last(DateTime) AS DateTime, last(Notes) AS Notes
FROM cte
GROUP BY Id

Similarly, you can use first function to obtain the first record in a sorted dataset.

Check if that works for you.
