I have a dataframe with ID and Date column, and need to filter the most recent dates for each id. However in this dataframe I have more than one row with the same date, and i need to keep it because the data in the other columns. The date is as object type.
When filtering using sort or keep='first' I can't make this separation work
df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2,2,2],
'info':['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'],
'date': ['2023-01-09', '2023-01-09', '2023-01-01', '2023-01-01',
'2023-01-06', '2023-02-02','2023-01-10', '2023-02-02',
'2023-01-10', '2022-12-20']})
df.sort_values(by=['date']).drop_duplicates(subset='id', keep='first')
id info date
1 a 2023-01-09
1 b 2023-01-09
1 c 2023-01-01
1 d 2023-01-01
1 e 2023-01-06
2 f 2023-02-02
2 g 2023-01-10
2 h 2023-02-02
2 i 2023-01-10
2 j 2022-12-20
id int64
info object
date object
dtype: object
the result we need:
id info date
1 a 2023-01-09
1 b 2023-01-09
2 f 2023-02-02
2 h 2023-02-02