0

So I have a df that looks something like this:

Person_ID    Procedure_ID   Date(d/m/y)
34           30             03/03/2011
34           30             02/03/2011
32           19             01/01/2020
34           32             01/04/2012

If a Person_ID has the same procedure twice or more, like that 34 - 30 case above, the code needs to keep only the newest row, deleting all the others. In the example I gave the expected result would be:

Person_ID    Procedure_ID   Date(d/m/y)
34           30             02/03/2011
32           19             01/01/2020
34           32             01/04/2012

Thank you in advance!

Suetam016
  • 101
  • 7

1 Answers1

1

Groupby 'Person_ID', 'Procedure_ID' and get the last element from each group.

Code

df.sort_values(by='Date(d/m/y)').groupby(['Person_ID', 'Procedure_ID'], as_index=False).last()

Output

Person_ID   Procedure_ID    Date(d/m/y)
0   32      19              01/01/2020
1   34      30              02/03/2011
2   34      32              01/04/2012
Utsav
  • 5,572
  • 2
  • 29
  • 43