Remove any duplicate value in a specific row by comparing specific column in two dataframes

Question

I have two Pandas dataframes (Python 3). They look like this:

df1
name, score
Tom, 130
Jane, 98
Anny, 81
Chuck, 92

df2
name
Amy
Chuck
Dave
Danny
Emma
Jack
Tom
Taro

What I want to do is look at df1 and, if a name is also found in df2, remove that whole row (name and score) from df1.

I searched for the best way to do this, but nothing I found worked for me. (Or I am probably not using the functions the right way.) For example,

output= (df1!=df2)

This returns,

ValueError: Can only compare identically-labeled DataFrame objects

So, it does not take the score column into account.

What I expect is to get,

name, score
Jane, 98
Anny, 81

Jane and Anny are not in df2.

How can I do this?

Thanks, Andy. But I got an error: "AttributeError: 'DataFrame' object has no attribute 'name'" — K.K., May 22 '19 at 19:13
@Andy beat me to the answer. If you get an AttributeError, your column names are not what you described in the question. — maow, May 22 '19 at 19:16

score 0 · Answer 1 · answered May 22 '19 at 19:14

First, to reproduce your example:

import pandas as pd

df1 = pd.DataFrame({'name' : ['Tom', 'Jane', 'Anny', 'Chuck'], 'score' : [130, 98, 81, 92]})
df2 = pd.DataFrame({'name' : ['Amy', 'Chuck', 'Dave', 'Danny', 'Emma', 'Jack', 'Tom', 'Taro']})

You can select certain rows from df1 based on a condition with df1[condition]. In your case you want the rows where df1.name is not in df2.name. df1.name accesses the name column as a Series, and df1.name.isin(df2.name) returns a boolean Series that is True for every name in df1 that also appears in df2. To invert this mask, use the unary ~ operator (element-wise NOT), because boolean indexing needs an element-wise operation rather than Python's not.

In [23]: df1[~df1.name.isin(df2.name)]
Out[23]: 
   name  score
1  Jane     98
2  Anny     81
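
The same steps can be written out as a self-contained sketch. It uses bracket access (df1['name']) instead of attribute access, which is an equivalent way to select the column, and it stores the mask in a variable so it can be inspected. The variable names in_df2 and result, and the reset_index call, are additions for illustration and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['Tom', 'Jane', 'Anny', 'Chuck'],
                    'score': [130, 98, 81, 92]})
df2 = pd.DataFrame({'name': ['Amy', 'Chuck', 'Dave', 'Danny',
                             'Emma', 'Jack', 'Tom', 'Taro']})

# Boolean mask: True where the name in df1 also appears in df2
in_df2 = df1['name'].isin(df2['name'])

# Keep only the rows where the mask is False, then renumber the index
result = df1[~in_df2].reset_index(drop=True)
print(result)
```

Dropping reset_index(drop=True) keeps the original row labels (1 and 2), as in the output above.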

score 0 · Answer 2 · answered May 22 '19 at 19:19

Since you get the error "AttributeError: 'DataFrame' object has no attribute 'name'", your column names either contain spaces or are completely different from what you described.
Try this simple fix:

df1.columns = ['name', 'score']
df2.columns = ['name']

After that, it will work:

df1[~df1.name.isin(df2.name)]

Note: I assume your df1 has 2 columns and df2 has 1 column as you describe.
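
If you would rather check what the column names actually are before overwriting them, something like the following may help. It is only a sketch: the stray spaces in the headers below are a guess at what your real data looks like, not something shown in the question.

```python
import pandas as pd

# Hypothetical frames whose headers carry stray whitespace (an assumption)
df1 = pd.DataFrame({' name': ['Tom', 'Jane', 'Anny', 'Chuck'],
                    'score ': [130, 98, 81, 92]})
df2 = pd.DataFrame({'name ': ['Amy', 'Chuck', 'Dave', 'Danny',
                              'Emma', 'Jack', 'Tom', 'Taro']})

# Inspect the real column labels; extra spaces show up in the list
print(df1.columns.tolist())
print(df2.columns.tolist())

# Strip leading and trailing whitespace from every column label
df1.columns = df1.columns.str.strip()
df2.columns = df2.columns.str.strip()

# Bracket access works for any column label
result = df1[~df1['name'].isin(df2['name'])]
print(result)
```

If the printed labels turn out to be completely different (for example 0 and 1 because the data was read without a header row), then assigning the names directly as shown above is the simpler fix.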
