I have the following dataframe, `my_df`:

user_id |  spend |  transaction_id |
--------+--------+-----------------|
1       |   45   |        12       |
2       |   33   |        45       |
3       |   12   |        33       |
1       |   22   |        56       |
1       |   77   |        99       |
2       |   44   |        68       |

My goal is to get all rows with the greatest transaction_id for each user_id.

So the final result should look like this:

user_id |  spend |  transaction_id |
--------+--------+-----------------|
1       |   77   |        99       |
2       |   44   |        68       |
3       |   12   |        33       |

How do I do this?

kev
  • `df.sort_values(['user_id','transaction_id']).drop_duplicates('user_id', keep='last')` – Quang Hoang Mar 04 '21 at 19:25
  • What have you tried so far based on your own research? For example, groupby() with `.max()` might work for you – G. Anderson Mar 04 '21 at 19:25
  • @G.Anderson groupby() with `.max()` would drop the `spend` column, but I want to retain it without aggregating it. @Quang Hoang's suggestion works perfectly – kev Mar 04 '21 at 19:46
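
For anyone landing here, the suggestion from the comments can be written out as a runnable sketch. The dataframe is rebuilt from the table in the question; the result names (`latest_by_sort`, `latest_by_idxmax`, `latest_with_ties`) are made up for illustration. The `idxmax` and `transform` variants are alternatives that were not part of the original comment, shown here on the assumption that `user_id` and `transaction_id` are plain integer columns.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
my_df = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 1, 2],
    "spend": [45, 33, 12, 22, 77, 44],
    "transaction_id": [12, 45, 33, 56, 99, 68],
})

# Approach from the comment: sort so the largest transaction_id is last
# within each user_id, then keep only that last row per user.
latest_by_sort = (
    my_df.sort_values(["user_id", "transaction_id"])
         .drop_duplicates("user_id", keep="last")
         .reset_index(drop=True)
)

# Alternative: look up the index label of the max transaction_id per user
# and select those rows, which keeps every column without aggregating.
latest_by_idxmax = (
    my_df.loc[my_df.groupby("user_id")["transaction_id"].idxmax()]
         .reset_index(drop=True)
)

# Alternative that keeps ties: compare each row against its group's max.
group_max = my_df.groupby("user_id")["transaction_id"].transform("max")
latest_with_ties = (
    my_df[my_df["transaction_id"] == group_max]
         .sort_values("user_id")
         .reset_index(drop=True)
)

print(latest_by_sort)
```

The first two variants return exactly one row per `user_id`. The third would return several rows for a user if the maximum `transaction_id` were repeated, which does not happen in this sample data.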
