0

I have dataframe

df = {'id':['A','A','A','A','B','B'], 'type':['PO','PO','PO','MB','MB','FX'], 'desc':['01234','66666','77890','qwer','asdf','123abc'], 'maint_date':["2018-10-03","2018-10-03","2005-08-23", "2017-09-19","2008-08-13","2016-08-20"]}
id type desc maint_date
A PO 01234 2018-10-03
A PO 66666 2018-10-03
A PO 77890 2005-08-23
A MB QWER 2017-09-19
B MB ASDF 2008-08-13
B FX 123ABC 2016-08-20

I would like to sort based on latest 'id', 'type' and keeping the latest desc based on latest'maint_date'. I would like to keep multiple rows with the latest maintenance date even though having same 'id','type','maint_date'

To simplify, I would like to obtain result shown below:

id type desc maint_date
A PO 01234 2018-10-03
A PO 66666 2018-10-03
A MB QWER 2017-09-19
B FX 123ABC 2016-08-20
Emma She
  • 3
  • 3
  • @mkrieger1 df.sort_values(['id','type','maint_date'],ascending = [True,False,False]).drop_duplicates(['id','type','maint_date'],keep='first'), I tried to run this line of code, but it did not remain the second row with desc '6666' eventhough the maintenance date is the latest – Emma She Mar 14 '22 at 11:19
  • Check [this](https://stackoverflow.com/a/31185210/2901002) solution – jezrael Mar 14 '22 at 11:28
  • thanks @jezrael, u led me straight to the right source! It finally solved my problem – Emma She Mar 14 '22 at 11:34

0 Answers0