Group By Customer Id and Also Take Date Column With Most Recent Value In Pandas

Question

I am new to working with Python and Pandas, and I have a question about grouping a dataframe.

I am grouping the dataframe by id but if there are two rows for one id, I only want to take the row that has the most recent value in the category_timestamp column.

This is what the results look like in the dataframe:

id          date_cancelled       owner_id   reason                  category_timestamp
610040      2020-06-23 15:26:32  345198     No Longer Qualifies     2020-06-23 15:26:15       
122672      2020-06-23 15:30:35  28950      Billing Cancellation    2020-06-23 15:30:35
122672      2020-06-23 15:30:35  28950      No Contact              2018-04-26 8:45:17
862708      2020-06-23 17:31:03  327378     Changed Mind/Persuaded  2020-06-23 17:30:50
436932      2020-06-25 1:07:02   28950      No Contact              2019-08-09 8:02:05

For the id that appears twice (122672), I only want to display the row with the most recent category_timestamp.

How do I add this to this line of code?

merged_df.groupby(['contact_id'])

Thanks!

You can do it by sorting and dropping duplicates: `df.sort_values(by="category_timestamp", ascending=False).drop_duplicates(subset="id", keep="first")` - sushanth, Jun 25 '20 at 14:23
The second code block in [this answer](https://stackoverflow.com/a/15705958/13386979) also works. - Tom, Jun 25 '20 at 14:41

score 0 · Accepted Answer · answered Jun 25 '20 at 14:23

I think it would be easier to just sort them by date and then drop the duplicates.

# Sort so the newest category_timestamp comes first
df = df.sort_values('category_timestamp', ascending=False)
# Keep only the first (newest) row for each id
df = df.drop_duplicates(subset='id', keep='first')
print(df)
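
To see how this behaves on the sample rows from the question, here is a minimal sketch. It makes a few assumptions: the frame is rebuilt by hand under the name `merged_df`, the grouping key is the `id` column shown in the table (the question's snippet uses `contact_id`), and `category_timestamp` is parsed with `pd.to_datetime` because the question does not say whether it is already a datetime. Parsing matters since values such as `8:45:17` are not zero-padded and may not sort correctly as plain strings. The second variant uses `groupby(...).idxmax()`, which is one way to keep the logic inside a groupby, as the question asks.

```python
import pandas as pd

# Sample rows copied from the question; "merged_df" is a stand-in name.
merged_df = pd.DataFrame({
    "id": [610040, 122672, 122672, 862708, 436932],
    "date_cancelled": [
        "2020-06-23 15:26:32", "2020-06-23 15:30:35", "2020-06-23 15:30:35",
        "2020-06-23 17:31:03", "2020-06-25 1:07:02",
    ],
    "owner_id": [345198, 28950, 28950, 327378, 28950],
    "reason": [
        "No Longer Qualifies", "Billing Cancellation", "No Contact",
        "Changed Mind/Persuaded", "No Contact",
    ],
    "category_timestamp": [
        "2020-06-23 15:26:15", "2020-06-23 15:30:35", "2018-04-26 8:45:17",
        "2020-06-23 17:30:50", "2019-08-09 8:02:05",
    ],
})

# Assumption: the column may be stored as text, so parse it before comparing.
merged_df["category_timestamp"] = pd.to_datetime(merged_df["category_timestamp"])

# Option 1: sort newest first, then keep the first row seen for each id.
latest_sorted = (
    merged_df.sort_values("category_timestamp", ascending=False)
    .drop_duplicates(subset="id", keep="first")
)

# Option 2: per id, find the index label of the newest timestamp and select it.
latest_idx = merged_df.groupby("id")["category_timestamp"].idxmax()
latest_grouped = merged_df.loc[latest_idx]

print(latest_sorted)
print(latest_grouped)
```

Both variants keep one row per id. They differ in row order: the first is ordered by timestamp, the second by id.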

Finn