I am new to working with Python and pandas and I have a question about grouping a dataframe I have.
I am grouping the dataframe by id but if there are two rows for one id, I only want to take the row that has the most recent value in the category_timestamp column.
This is what the results look like in the dataframe:
id      date_cancelled       owner_id  reason                  category_timestamp
610040  2020-06-23 15:26:32  345198    No Longer Qualifies     2020-06-23 15:26:15
122672  2020-06-23 15:30:35  28950     Billing Cancellation    2020-06-23 15:30:35
122672  2020-06-23 15:30:35  28950     No Contact              2018-04-26 8:45:17
862708  2020-06-23 17:31:03  327378    Changed Mind/Persuaded  2020-06-23 17:30:50
436932  2020-06-25 1:07:02   28950     No Contact              2019-08-09 8:02:05
So for the id that shows up twice (122672), I only want to keep the row with the most recent category_timestamp, which here is the "Billing Cancellation" row.
How do I add this to this line of code?
merged_df.groupby(['contact_id'])
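
To make the goal concrete, here is a minimal sketch of one approach I have seen suggested; I am not sure it is the best way. It rebuilds only a few columns of the sample rows above, and it assumes the grouping column is named id as in the table (my actual line uses contact_id) and that category_timestamp has been converted to a datetime column.

```python
import pandas as pd

# Sample rows taken from the table above (subset of columns).
# Assumption: the real merged_df has the same column names.
merged_df = pd.DataFrame({
    "id": [610040, 122672, 122672, 862708, 436932],
    "reason": [
        "No Longer Qualifies",
        "Billing Cancellation",
        "No Contact",
        "Changed Mind/Persuaded",
        "No Contact",
    ],
    # Hours are zero-padded here only so the strings parse uniformly.
    "category_timestamp": pd.to_datetime([
        "2020-06-23 15:26:15",
        "2020-06-23 15:30:35",
        "2018-04-26 08:45:17",
        "2020-06-23 17:30:50",
        "2019-08-09 08:02:05",
    ]),
})

# For each id, find the index label of the row with the latest timestamp.
latest_idx = merged_df.groupby("id")["category_timestamp"].idxmax()

# Select those rows and restore the original row order.
latest = merged_df.loc[latest_idx].sort_index()

print(latest)
```

If this is the right idea, the result should have one row per id, with 122672 keeping only the "Billing Cancellation" row. Sorting by category_timestamp and then calling drop_duplicates("id", keep="last") looks like it would give the same rows.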
Thanks!