0

I have a panel dataframe (ID and time) and want to collect the recent (latest) rows for each ID. Here is the table:

df = pd.DataFrame({'ID': [1,1,2,3] , 'Year': [2018,2019,2019,2020] , 'Var1':list("abcd") , 'Var2': list("efgh")})

enter image description here

and the end result would be:

enter image description here

Roo
  • 862
  • 1
  • 10
  • 22
  • 1
    Does this answer your question? [Pandas dataframe get first row of each group](https://stackoverflow.com/questions/20067636/pandas-dataframe-get-first-row-of-each-group) – Roy2012 Jul 15 '20 at 18:29

2 Answers2

1

Use tail:

df.groupby("ID").tail(1)

The output is:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h

Another alternative is to use last:

df.groupby("ID").last()
Roy2012
  • 11,755
  • 2
  • 22
  • 35
1

Use drop_duplicates:

df.sort_values('Year').drop_duplicates('ID', keep='last')

Output:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Yes, you can use the groupby method like @Roy2012 is doing. – Scott Boston Jul 16 '20 at 01:05
  • I am unsure if the `last()` method in @ROy2012 solution is based on time sorting. I wanted to make sure I am getting latest date/year. – Roo Jul 16 '20 at 20:28
  • @Roo `df.sort_values('Year').groupby("ID").last()` will ensure this. Sorting the dataframe appropriate then groupby will do this. – Scott Boston Jul 16 '20 at 20:29