How do I delete duplicates based on a condition? Python/Pandas

Question

So I have a df that looks something like this:

Person_ID    Procedure_ID   Date(d/m/y)
34           30             03/03/2011
34           30             02/03/2011
32           19             01/01/2020
34           32             01/04/2012

If a Person_ID has the same procedure twice or more, like the 34-30 case above, the code needs to keep only the newest row and delete all the others. In this example, the expected result would be:

Person_ID    Procedure_ID   Date(d/m/y)
34           30             03/03/2011
32           19             01/01/2020
34           32             01/04/2012

Thank you in advance!

`df.sort_values(by='Date(d/m/y)',ascending=False).drop_duplicates(subset=['Person_ID','Procedure_ID'])` — rhug123, May 06 '21 at 12:09
Never thought of using ascending=False to drop duplicates, thank you, it worked! – Suetam016, May 06 '21 at 14:06
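The comment's drop_duplicates idea can be sketched end to end as follows. This is a minimal sketch, assuming the sample data from the question and a hypothetical helper column `_date` that holds the parsed dates; it sorts the parsed datetimes rather than the raw d/m/y strings so that the order is chronological.

```python
import pandas as pd

# Sample data copied from the question; dates are day/month/year strings
df = pd.DataFrame({
    'Person_ID': [34, 34, 32, 34],
    'Procedure_ID': [30, 30, 19, 32],
    'Date(d/m/y)': ['03/03/2011', '02/03/2011', '01/01/2020', '01/04/2012'],
})

result = (
    # Hypothetical helper column with real datetimes, so sorting is chronological
    df.assign(_date=pd.to_datetime(df['Date(d/m/y)'], format='%d/%m/%Y'))
      # Newest first
      .sort_values('_date', ascending=False)
      # Keep the first (newest) row of each Person_ID/Procedure_ID pair
      .drop_duplicates(subset=['Person_ID', 'Procedure_ID'], keep='first')
      # Drop the helper column and restore the original row order
      .drop(columns='_date')
      .sort_index()
)
print(result)
```

With keep='first' after a descending sort, the newest row of each pair survives; the final sort_index() only restores the original row order and can be dropped if order does not matter.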

Utsav · Accepted Answer · 2021-05-06T15:09:14.953


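Sort by date, then group by 'Person_ID' and 'Procedure_ID' and take the last (newest) row from each group. The date column holds d/m/y strings, which do not sort chronologically as plain text, so parse them with pd.to_datetime in the sort key (the key argument of sort_values requires pandas 1.1 or later).

Code

df.sort_values(by='Date(d/m/y)', key=lambda s: pd.to_datetime(s, format='%d/%m/%Y')).groupby(['Person_ID', 'Procedure_ID'], as_index=False).last()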

Output

   Person_ID   Procedure_ID    Date(d/m/y)
0  32          19              01/01/2020
1  34          30              03/03/2011
2  34          32              01/04/2012
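An alternative, not part of the original answer, is to find the index of the newest date in each group with idxmax and select those complete rows. GroupBy.last() returns the last non-null value of each column separately, so if some cells are missing it can combine values from different rows; selecting rows by index avoids that. A sketch, assuming the question's sample data, no missing dates, and a hypothetical helper column `_date`:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Person_ID': [34, 34, 32, 34],
    'Procedure_ID': [30, 30, 19, 32],
    'Date(d/m/y)': ['03/03/2011', '02/03/2011', '01/01/2020', '01/04/2012'],
})

# Hypothetical helper column: the day-first strings parsed into datetimes
with_dates = df.assign(_date=pd.to_datetime(df['Date(d/m/y)'], format='%d/%m/%Y'))

# For each Person_ID/Procedure_ID pair, the index label of the newest date
newest_idx = with_dates.groupby(['Person_ID', 'Procedure_ID'])['_date'].idxmax()

# Select those complete rows from the original frame
result = df.loc[newest_idx.to_numpy()].reset_index(drop=True)
print(result)
```

The rows come out in group-key order, matching the layout of the groupby output above.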

edited May 06 '21 at 15:09

answered May 06 '21 at 12:15

Utsav

Thank you for the answer, I didn't know about this last() method. It kinda worked, but it doesn't consistently return the newest date though. – Suetam016 May 06 '21 at 14:06
you would have to sort first for that – rhug123 May 06 '21 at 14:16
updated the code. – Utsav May 06 '21 at 23:40
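Sorting first only helps if the sort is chronological. A small sketch, using the question's date strings, of how plain text sorting orders d/m/y values compared with sorting the parsed dates (the key argument of Series.sort_values requires pandas 1.1 or later). As text, 01/01/2020 comes first and the 2011 dates come last. In the question's 34-30 pair the two strings differ only in the day, so text sorting happens to pick the right row there, but that does not hold in general.

```python
import pandas as pd

dates = pd.Series(['03/03/2011', '02/03/2011', '01/01/2020', '01/04/2012'])

# Plain string sort compares character by character, so the day digits dominate
string_order = dates.sort_values().tolist()

# Sorting on the parsed datetimes gives true chronological order
chrono_order = dates.sort_values(
    key=lambda s: pd.to_datetime(s, format='%d/%m/%Y')
).tolist()

print(string_order)  # ['01/01/2020', '01/04/2012', '02/03/2011', '03/03/2011']
print(chrono_order)  # ['02/03/2011', '03/03/2011', '01/04/2012', '01/01/2020']
```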