
I have a dataset like this:

mail id          score
xyz@yahoo.com     10
abc@gmail.com     13
xyz@yahoo.com     16
pqr@gmail.com     20
abc@gmail.com     19
mno@gmail.com     24

From the above data, I have to remove duplicate mail ids by comparing the score column.

E.g.: the mail id column contains xyz@yahoo.com and abc@gmail.com twice each. For each duplicated mail id, only one row should be kept, chosen by comparing their scores.

For example, xyz@yahoo.com has scores 10 and 16, so only the row with the greater value (16) should be returned.

Expected output:

mail id          score
xyz@yahoo.com     16
pqr@gmail.com     20
abc@gmail.com     19
mno@gmail.com     24
manoj kumar

1 Answer

Sort by score in descending order with sort_values(), then keep only the first row for each mail id with drop_duplicates():

resultdf = df.sort_values('score', ascending=False).drop_duplicates('mail id')

OR

You can also do this with the groupby() method, taking the largest score in each group:

resultdf = df.groupby('mail id')['score'].nlargest(1).droplevel(1).reset_index()
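
As a quick check, here is a minimal self-contained sketch of both approaches run on the sample data from the question. The names by_sort and by_group are only for illustration. The sort_index() and reset_index(drop=True) calls are optional additions, not part of the one-liners above; they restore the original row order so the result lines up with the expected output in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "mail id": ["xyz@yahoo.com", "abc@gmail.com", "xyz@yahoo.com",
                "pqr@gmail.com", "abc@gmail.com", "mno@gmail.com"],
    "score": [10, 13, 16, 20, 19, 24],
})

# Approach 1: sort so the highest score comes first, then keep the
# first occurrence of each mail id (drop_duplicates keeps the first by default).
by_sort = (
    df.sort_values("score", ascending=False)
      .drop_duplicates("mail id")
      .sort_index()               # optional: back to original row order
      .reset_index(drop=True)     # optional: clean 0..n-1 index
)

# Approach 2: take the single largest score within each mail id group.
# nlargest adds the original row index as a second index level, so drop it.
by_group = (
    df.groupby("mail id")["score"]
      .nlargest(1)
      .droplevel(1)
      .reset_index()
)

print(by_sort)
print(by_group)
```

Note that the groupby() version returns only the mail id and score columns, ordered by mail id, while the sort_values() version keeps every column of the original row. If the DataFrame has more columns that you need to keep, the first approach is probably the more convenient one.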
Anurag Dabas