I have a DataFrame that looks like this:

   TaskID       Status   Time
0     123     Progress  12.00
1     234     Progress  12.10
2     123  Almost Done  12.20
3     234    Completed  12.40

For each TaskID, I need to replace the Status of the earliest record with the Status of the latest record, leave the other columns unchanged, and then drop the later duplicate rows.

Expected result:

   TaskID       Status   Time
0     123  Almost Done  12.00
1     234    Completed  12.10

How can I achieve this? Thanks.

Rakesh

1 Answer

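I believe you need to group by TaskID and aggregate with agg, taking the last Status and the first Time (this relies on the rows already being in chronological order):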

df = df.groupby('TaskID', as_index=False).agg({'Status':'last','Time':'first'})
print(df)
   TaskID       Status  Time
0     123  Almost Done  12.0
1     234    Completed  12.1
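
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and adds a sort_values("Time") step, which is my own assumption rather than part of the question: it is only needed if the rows might not already be in time order. The variable name result is also just for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "TaskID": [123, 234, 123, 234],
    "Status": ["Progress", "Progress", "Almost Done", "Completed"],
    "Time": [12.00, 12.10, 12.20, 12.40],
})

# Assumption: sort by Time first so that "first" and "last" inside each
# group correspond to the earliest and latest record.
result = (
    df.sort_values("Time")
      .groupby("TaskID", as_index=False)
      .agg({"Status": "last", "Time": "first"})
)

print(result)
```

Each column gets its own aggregation here, so Status comes from the latest row and Time from the earliest row of the same TaskID.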
jezrael