pandas groupby latest observation for each group

Question

I have a panel dataframe (ID and time) and want to collect the recent (latest) rows for each ID. Here is the table:

df = pd.DataFrame({'ID': [1,1,2,3] , 'Year': [2018,2019,2019,2020] , 'Var1':list("abcd") , 'Var2': list("efgh")})

and the end result would be:

Does this answer your question? [Pandas dataframe get first row of each group](https://stackoverflow.com/questions/20067636/pandas-dataframe-get-first-row-of-each-group) — Roy2012, Jul 15 '20 at 18:29

score 1 · Answer 1 · answered Jul 15 '20 at 18:26

1

Use tail:

df.groupby("ID").tail(1)

The output is:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h

Another alternative is to use last:

df.groupby("ID").last()

answered Jul 15 '20 at 18:26

Roy2012

score 1 · Accepted Answer · answered Jul 15 '20 at 18:27

1

Use drop_duplicates:

df.sort_values('Year').drop_duplicates('ID', keep='last')

Output:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h

answered Jul 15 '20 at 18:27

Scott Boston

Yes, you can use the groupby method like @Roy2012 is doing. – Scott Boston Jul 16 '20 at 01:05
I am unsure if the `last()` method in @ROy2012 solution is based on time sorting. I wanted to make sure I am getting latest date/year. – Roo Jul 16 '20 at 20:28
@Roo `df.sort_values('Year').groupby("ID").last()` will ensure this. Sorting the dataframe appropriate then groupby will do this. – Scott Boston Jul 16 '20 at 20:29

2 Answers2