How to remove duplicate values from a column with pandas, keeping the row with the highest score

Question

I have a dataset like this:

mail id          score
xyz@yahoo.com     10
abc@gmail.com     13
xyz@yahoo.com     16
pqr@gmail.com     20
abc@gmail.com     19
mno@gmail.com     24

From the above data, I need to remove duplicate mail ids by comparing the score column.

E.g.: the mail id column contains xyz@yahoo.com and abc@gmail.com twice each. For each duplicated mail id, only the row with the highest score should be kept.

For xyz@yahoo.com, which has scores 10 and 16, the row with the greater value (16) should be returned.

Expected output:

mail id          score
xyz@yahoo.com     16
pqr@gmail.com     20
abc@gmail.com     19
mno@gmail.com     24

Anurag Dabas · Answer 1 · 2021-04-20T12:26:36.283

1

Sort by score in descending order with sort_values(), then keep the first row for each mail id with drop_duplicates():

resultdf=df.sort_values('score',ascending=False).drop_duplicates('mail id')
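As a runnable sketch of the line above, here is the same chain applied to the data from the question. The DataFrame construction and the trailing sort_index() are my additions, not part of the original one-liner; sort_index() is only there to restore the original row order, which happens to match the order in the expected output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "mail id": ["xyz@yahoo.com", "abc@gmail.com", "xyz@yahoo.com",
                "pqr@gmail.com", "abc@gmail.com", "mno@gmail.com"],
    "score": [10, 13, 16, 20, 19, 24],
})

resultdf = (
    df.sort_values("score", ascending=False)  # highest score first
      .drop_duplicates("mail id")             # keeps the first (highest) row per mail id
      .sort_index()                           # optional: back to original row order
)
print(resultdf)
```

Without sort_index(), the rows come back ordered by score in descending order instead.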

OR

You can also do this with groupby(), taking the largest score in each mail id group:

resultdf=df.groupby('mail id')['score'].nlargest(1).droplevel(1).reset_index()
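A minimal sketch of the groupby() variant on the question's data follows. The DataFrame construction is my addition. The max() line at the end is also not from the answer; it is a simpler alternative I am assuming is acceptable here because only the mail id and score columns are needed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "mail id": ["xyz@yahoo.com", "abc@gmail.com", "xyz@yahoo.com",
                "pqr@gmail.com", "abc@gmail.com", "mno@gmail.com"],
    "score": [10, 13, 16, 20, 19, 24],
})

resultdf = (
    df.groupby("mail id")["score"]
      .nlargest(1)      # top score per group; index is (mail id, original row label)
      .droplevel(1)     # drop the original row label level
      .reset_index()    # turn "mail id" back into a column
)
print(resultdf)

# Alternative (not from the answer): per-group maximum gives the same pairs.
simpler = df.groupby("mail id", as_index=False)["score"].max()
print(simpler)
```

Note that groupby() sorts the groups by key by default, so the rows come out ordered by mail id rather than in the original order.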

edited Apr 20 '21 at 12:26

answered Apr 20 '21 at 12:20


yop, like https://stackoverflow.com/a/40629420/2901002 – jezrael Apr 20 '21 at 12:21

only difference is drop_duplicates by 2 columns – jezrael Apr 20 '21 at 12:22
ohh... didn't notice that. Thanks again @jezrael :) – Anurag Dabas Apr 20 '21 at 12:30
ya, best remove, but it is up to you – jezrael Apr 20 '21 at 12:30
