3

I have 2 sets of data:

df1  
  'Y'  
A a  
B b  
C c  
D d  
E e  

df2  
  'Y'  
D d  
E e  
F f  

I need to transform the first df1 to remove any data appearing in df2 I can easily do it using SQL, but that's not an option for me The logic is 'select * from df1 where df1.Y not in (select Y from df2)' I can't seem to get the implementation right using merge. Any ideas? Desired result:

df3  
  'Y'  
A a  
B b  
C c  

with df2 unchanged

Psidom
  • 209,562
  • 33
  • 339
  • 356
  • related: http://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe and dupe: http://stackoverflow.com/questions/32651860/python-pandas-how-to-find-rows-in-one-dataframe-but-not-in-another – EdChum Dec 24 '16 at 22:09

2 Answers2

4

You can subset df1 with the .isin() method:

df1[~df1.Y.isin(df2.Y)]

#   Y
#A  a
#B  b
#C  c
Psidom
  • 209,562
  • 33
  • 339
  • 356
2

I do like @Psidom's Pandas solution.

Here is a NumPy alternative:

In [26]: d1[~np.in1d(d1.Y, d2.Y)]
Out[26]:
   Y
A  a
B  b
C  c

and yet another Pandas solution, which uses SQL-like query() method:

In [30]: d1.query('Y not in @d2.Y')
Out[30]:
   Y
A  a
B  b
C  c
Community
  • 1
  • 1
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419