36

I have a table where each ID is repeated 3 times. there is a date in front of each id in each row.
I want to select entire row for each ID where date is latest. There are total 370 columns in this table i want all columns to get selected when i select that row.

Sample -

ID   Name    Date        Marks    ..    ..  ..   
1     XY     4/3/2017     27
1     fv     4/3/2014     98
1     jk     4/3/2016     09
2     RF     4/12/2015    87
2     kk     4/3/2009     56
2     PP     4/3/2011     76
3     ee     4/3/2001     12
3     ppp    4/3/2003     09
3     lll    4/3/2011     23

The Answer should be

ID   Name    Date        Marks    ..    ..  ..   
1     XY     4/3/2017     27      
2     RF     4/12/2015    87
3     lll    4/3/2011     23  

I am attempting as below -

select distinct ID,*,max(date) as maxdate from table

Also i am trying this in Hive . so not sure if some sql functions dont work in Hive

Thanks

Earthshaker
  • 549
  • 1
  • 7
  • 12
  • similar question has been answered here- https://stackoverflow.com/questions/13523049/hive-sql-find-the-latest-record/43528852#43528852 – Rahul Sharma Jul 29 '17 at 01:34

6 Answers6

87

This question has been asked before. Please see this question.

Using the accepted answer and adapting it to your problem you get:

SELECT tt.*
FROM myTable tt
INNER JOIN
    (SELECT ID, MAX(Date) AS MaxDateTime
    FROM myTable
    GROUP BY ID) groupedtt 
ON tt.ID = groupedtt.ID 
AND tt.Date = groupedtt.MaxDateTime
ecarlin
  • 1,163
  • 8
  • 8
8

One way is:

select table.* 
from table
join 
(
    select ID, max(Date) as max_dt 
    from table
    group by ID
) t
on table.ID= t.ID and table.Date = t.max_dt 

Note that if you have multiple equally higher dates for same ID, then you will get all those rows in result

Oto Shavadze
  • 40,603
  • 55
  • 152
  • 236
  • yes ..exactly thats' how it is happening ... then i think i can just remove the duplicate rows from entire database ... then those multiple rows will go away and for each id there will be only one row. – Earthshaker Jul 28 '17 at 20:35
5

You can do this with a Correlated Subquery (That is a subquery wherein you reference a field in the main query). In this case:

SELECT * 
FROM yourtable t1
WHERE date = (SELECT max(date) from yourtable WHERE id = t1.id)

Here we give the yourtable table an alias of t1 and then use that alias in the subquery grabbing the max(date) from the same table yourtable for that id.

Strawberry
  • 33,750
  • 13
  • 40
  • 57
JNevill
  • 46,980
  • 4
  • 38
  • 63
4

You can use a join to do this

SELECT t1.* from myTable t1
LEFT OUTER JOIN myTable t2 on t2.ID=t1.ID AND t2.`Date` > t1.`Date`
WHERE t2.`Date` IS NULL;

Only rows which have the latest date for each ID with have a NULL join to t2.

RichGoldMD
  • 1,252
  • 1
  • 10
  • 18
1

Here's one way. The inner query gets the max date for each id. Then you can join that back to your main table to get the rows that match.

select
*
from
<your table>
inner join 
(select id, max(<date col> as max_date) m
where yourtable.id = m.id
and yourtable.datecolumn = m.max_date)
Andrew
  • 8,445
  • 3
  • 28
  • 46
-2

Have you tried the following:

SELECT ID, COUNT(*), max(date)
FROM table 
GROUP BY ID;
milton
  • 57
  • 1
  • 2
    This will not return all of the fields in the table for the record with that id that has the max date. – JNevill Jul 28 '17 at 20:25