Please help me drop duplicate rows based on the combination of two columns. My code:

import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Blaze', 'Tessi', 'Marshal', 'Tessi', 'Blaze', 'Tessi'],
        'Age': [7, 6, 8, 6, 8, 9],
        'Class': ['CP-II', 'CP1', '2nd', 'CP1', 'CP2', 'CP3'],
        'Marks': [9.0, 10.0, 8.5, 11, 9.0, 10.0]
    }
)
df

Output:

    Name      Age   Class   Marks
0   Blaze     7     CP-II   9
1   Tessi     6     CP1     10
2   Marshal   8     2nd     8.5
3   Tessi     6     CP1     11
4   Blaze     8     CP2     9
5   Tessi     9     CP3     10

When I try to drop duplicates from this DataFrame using the combination of Name and Age, row 4 is dropped as well, which is unexpected. Can someone please explain?

df.drop_duplicates('Name' and 'Age')
Output:

    Name      Age   Class   Marks
0   Blaze     7     CP-II   9
1   Tessi     6     CP1     10
2   Marshal   8     2nd     8.5
5   Tessi     9     CP3     10

I was expecting index 4 to remain in the output. I also tried the same call with `or` instead of `and`, but that does not work either. Can you please help? Thank you.

  • Hi, maybe this Q&A can be useful to you: https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-across-multiple-columns-in-python-pandas – Juan Rey Hernández May 14 '21 at 19:23
  • df.drop_duplicates(['Name','Age'],keep= 'last') – ZahraRezaei May 14 '21 at 19:35
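  • `'Name' and 'Age'` evaluates to `'Age'`, because `and` returns its second operand when the first is truthy, so it drops the rows with the same age only. That is why row 4 (age 8, same as row 2) disappears. – Mustafa Aydın May 14 '21 at 19:38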
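To make the behaviour concrete, here is a minimal sketch using the DataFrame from the question. It evaluates the `and` expression on its own and then compares the original call with one that passes both column names as a list. The variable names `subset_expr`, `by_age_only` and `by_name_and_age` are only illustrative and do not come from the original post.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    {
        'Name': ['Blaze', 'Tessi', 'Marshal', 'Tessi', 'Blaze', 'Tessi'],
        'Age': [7, 6, 8, 6, 8, 9],
        'Class': ['CP-II', 'CP1', '2nd', 'CP1', 'CP2', 'CP3'],
        'Marks': [9.0, 10.0, 8.5, 11, 9.0, 10.0],
    }
)

# Python evaluates the `and` expression before pandas sees it.
# A non-empty string is truthy, so the result is the second operand.
subset_expr = 'Name' and 'Age'
print(subset_expr)  # Age

# Equivalent to df.drop_duplicates('Age'): rows 3 and 4 repeat ages 6 and 8.
by_age_only = df.drop_duplicates(subset_expr)
print(by_age_only.index.tolist())  # [0, 1, 2, 5]

# Passing a list deduplicates on the (Name, Age) pair; only row 3 repeats one.
by_name_and_age = df.drop_duplicates(subset=['Name', 'Age'])
print(by_name_and_age.index.tolist())  # [0, 1, 2, 4, 5]
```

With the list form, row 4 (Blaze, 8) stays because no earlier row has that exact pair. By the same logic `'Name' or 'Age'` evaluates to `'Name'`, which would deduplicate on the name alone.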
