0

I have two dataframes that contains names. What I am need to do is to check which of the names in second dataframe are not present in the first dataframe. For this example

list1 = ['Mark','Sofi','Joh','Leo','Jason']
df1 = pd.DataFrame(list1, columns =['Names'])

and

list2 = ['Mark','Sofi','David','Matt','Jason']
df2 = pd.DataFrame(list2, columns =['Names'])

So basically I in this simple example we can see that David and Matt from second dataframe do not exist in the first dataframe.

I need programmatically to come up with 3rd dataframe that will have results like this:

Names
David
Matt

My first thought was to try using pandas merge function but I am unable to get the unique set of names from df2 that are not in df1.

Any thoughts on how to do this?

Corralien
  • 109,409
  • 8
  • 28
  • 52
Slavisha84
  • 747
  • 3
  • 7
  • 22

3 Answers3

1

You can create the 3rd dataframe filtering the 2nd with a condition like this..

df3 = df2[~df2['Names'].isin(df1['Names'])]
Pedro Rocha
  • 1,373
  • 1
  • 3
  • 14
  • hi, i tried that and it returns empty dataframe. – Slavisha84 Feb 20 '23 at 20:37
  • I've tested it and works ok for me. What's the output of doing this: `[name for name in df2["Names"] if name not in df1["Names"].values]` ? – Pedro Rocha Feb 20 '23 at 20:40
  • In both cases i just get output with column "Names" and there is no value under it. – Slavisha84 Feb 20 '23 at 20:43
  • @Slavisha84 did you remove the `df2 = df1` from the second part of code? The solution works otherwise. – Uchiha012 Feb 20 '23 at 20:43
  • @PedroRocha - I did exactly what you pasted brother and for some reason in both cases i am getting empty dataframe. – Slavisha84 Feb 20 '23 at 20:46
  • There are two possibilities, 1. you have a previous error in your code like the `df2 = df1 = pd.Dataframe(list2, columns =['Names'])`when you are creating your df2. or 2. you have in your original data some names with spaces for example... then you can try this: `df3 = df2[~df2['Names'].apply(lambda x: x.lower().strip()).isin(df1['Names'].apply(lambda x: x.lower().strip()))]` – Pedro Rocha Feb 20 '23 at 20:50
  • You were right. In my post i had an error with df2 = df1 =... When i fixed that on my computer the solution you provided df3 = df2[~df2['Names'].isin(df1['Names'])] worked like charm. Thank you So much – Slavisha84 Feb 20 '23 at 20:54
1

You can also use merge with indicator:

>>> df1.merge(df2, on='Names', how='outer', indicator='exist')
   Names       exist
0   Mark        both
1   Sofi        both
2    Joh   left_only
3    Leo   left_only
4  Jason        both
5  David  right_only
6   Matt  right_only

>>> (df1.merge(df2, on='Names', how='outer', indicator='exist')
        .loc[lambda x: x.pop('exist') == 'right_only'])
   Names
5  David
6   Matt

Input dataframes:

list1 = ['Mark','Sofi','Joh','Leo','Jason']
df1 = pd.DataFrame(list1, columns =['Names'])

list2 = ['Mark','Sofi','David','Matt','Jason']
df2 = pd.DataFrame(list2, columns =['Names'])
Corralien
  • 109,409
  • 8
  • 28
  • 52
1

Here is another approach,

key_diff = set(df2.Names).difference(df1.Names)
where_diff = df2.Names.isin(key_diff)
df3 = df2[where_diff]

Refer this link for more

Uchiha012
  • 821
  • 5
  • 9