How to get the last occurrance of all items on a column (pandas)

Question

Let's suppose I have a dataset like this:

item_id |    date    | cat |
----------------------------
   0    | 2020-01-01 |  A  |
   0    | 2020-02-01 |  B  |
   1    | 2020-04-01 |  A  |
   2    | 2020-02-01 |  C  |
   2    | 2021-01-01 |  B  |

So, I need to get the last category (column cat), that means that the result dataframe would be the following:

item_id | cat |
---------------
   0    |  B  |
   1    |  A  |
   2    |  B  |

I know I could sort the values by the date and then iterate over the itens, but that would be too much consuming. Is there another method on pandas to achieve that?

Corralien · Accepted Answer · 2023-02-02T17:29:01.450

3

Use drop_duplicates after sort_values:

>>> df.sort_values('date').drop_duplicates('item_id', keep='last')
   item_id        date cat
1        0  2020-02-01   B
2        1  2020-04-01   A
4        2  2021-01-01   B

Comment by @mozway:

Sorting is O(n*logn)

>>> df.loc[pd.to_datetime(df['date']).groupby(df['item_id'], sort=False).idxmax()]

   item_id        date cat
1        0  2020-02-01   B
2        1  2020-04-01   A
4        2  2021-01-01   B

edited Feb 02 '23 at 17:29

answered Feb 02 '23 at 17:13

Corralien

109,409
8
28
52

Sorting is `O(n*logn)`, use `df.loc[pd.to_datetime(df['date']).groupby(df['item_id'], sort=False).idxmax()]` ;) – mozway Feb 02 '23 at 17:18
`idxmax` doesn't work on string, `sort_values` works on both :-P I'm kidding :-) – Corralien Feb 02 '23 at 17:30
Yes it does! `df.loc[df['date'].astype('string').groupby(df['item_id'], sort=False).idxmax()]` or `df.loc[df.convert_dtypes().groupby('item_id', sort=False)['date'].idxmax()]` :p – mozway Feb 02 '23 at 17:54

How to get the last occurrance of all items on a column (pandas)

1 Answers1