I'm quite new to SQL and I'm trying to filter the latest date record (DateTime column) for each unique ID present in the table.

Sample data: there are 2 unique IDs (16512) and (76513).

DateTime                     ID     Notes
2021-03-26T10:39:54.9770238  16512  Still a work in Progress
2021-04-29T12:46:12.8277807  16512  Still working on it
2021-03-21T10:39:54.9770238  76513  Still a work in Progress
2021-04-20T12:46:12.8277800  76513  Still working on project

Desired result (get last row of each ID based on the DateTime column):

DateTime                     ID     Notes
2021-04-29T12:46:12.8277807  16512  Still working on it
2021-04-20T12:46:12.8277800  76513  Still working on project

My query:

SELECT MAX(DateTime), ID
FROM Table1
GROUP BY DateTime, ID

Thanks in advance for your help.


4 Answers

SELECT max(DateTime), ID
FROM Table1
GROUP BY ID
0

You can use row_number here:

with d as (
  select *, row_number() over(partition by Id order by DateTime desc) rn
  from Table1
)
select DateTime, Id, Notes
from d
where rn = 1;
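To see the row_number approach in action, here is a minimal sketch that runs the same query against the question's sample data in an in-memory SQLite database (SQLite 3.25+ supports window functions; the table and column names follow the question):

```python
import sqlite3

# Load the question's sample rows into an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (DateTime TEXT, ID INTEGER, Notes TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("2021-03-26T10:39:54.9770238", 16512, "Still a work in Progress"),
        ("2021-04-29T12:46:12.8277807", 16512, "Still working on it"),
        ("2021-03-21T10:39:54.9770238", 76513, "Still a work in Progress"),
        ("2021-04-20T12:46:12.8277800", 76513, "Still working on project"),
    ],
)

# Rank rows within each ID by DateTime descending, then keep rank 1.
rows = conn.execute("""
    WITH d AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DateTime DESC) AS rn
        FROM Table1
    )
    SELECT DateTime, ID, Notes
    FROM d
    WHERE rn = 1
    ORDER BY ID
""").fetchall()

for r in rows:
    print(r)
```

This prints exactly the two desired rows, one per ID, each with its latest DateTime and matching Notes.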
0

You didn't state a particular database, but if you are using Postgres you can use its DISTINCT ON, which is often the fastest solution when your groups are not too large (in your case, the number of rows sharing the same ID).

Here's an example. Please note that I've excluded your Notes column for brevity, but the query works the same if you include it and will give you the output you want above.

create temporary table tasks (
  id int,
  created_at date
);

insert into tasks(id, created_at) values
(16512, '2021-03-26'), 
(16512, '2021-04-29'),
(76513, '2021-03-21'), 
(76513, '2021-04-20')
;

select distinct on (id)
       id,
       created_at
from tasks
order by id, created_at desc;
/*
  id   | created_at
-------+------------
 16512 | 2021-04-29
 76513 | 2021-04-20
*/
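DISTINCT ON is Postgres-specific, so as a sanity check here is a plain-Python sketch of the same semantics: after sorting by id ascending and created_at descending, keep only the first row of each id group (this uses the answer's sample values, not a real database):

```python
from itertools import groupby
from operator import itemgetter

# The answer's sample rows as (id, created_at) tuples.
tasks = [
    (16512, "2021-03-26"),
    (16512, "2021-04-29"),
    (76513, "2021-03-21"),
    (76513, "2021-04-20"),
]

# Emulate ORDER BY id, created_at DESC with two stable sorts:
# first by created_at descending, then by id ascending.
tasks.sort(key=itemgetter(1), reverse=True)
tasks.sort(key=itemgetter(0))

# DISTINCT ON (id) keeps the first row per id after that ordering.
latest = [next(rows) for _, rows in groupby(tasks, key=itemgetter(0))]
print(latest)  # [(16512, '2021-04-29'), (76513, '2021-04-20')]
```

The two stable sorts reproduce the composite ORDER BY, and taking the head of each group matches what DISTINCT ON returns.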

0

The row_number approach mentioned above is one way to solve your problem. Since you tagged databricks in your question, let me show you another option you can implement in Spark SQL using the last function from its pool of aggregate functions.

In reference to the Spark documentation: last(expr[, isIgnoreNull]) - Returns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.

Note that:

The function is non-deterministic because its results depend on the order of the rows, which may be non-deterministic after a shuffle.

In your example:

%sql

WITH cte AS (
  SELECT *
  FROM my_table
  ORDER BY DateTime ASC
)
SELECT Id, last(DateTime) AS DateTime, last(Notes) AS Notes
FROM cte
GROUP BY Id

Similarly, you can use first function to obtain the first record in a sorted dataset.

Check if that works for you.
