How to use isin (without merge) for multiple dataframe columns

Question

I have two dataframes:

df1

   Company  Symbol     ID         Date    Value
0  AAA Inc       A    123   2019-12-31       10
1  AAA Inc       A    123   2020-03-30       11
2  BBB Inc       B    456   2019-03-31       12
3  FFF Inc       F    653   2019-06-31       22
4  ZZZ Inc       Z    999   2019-03-30       13

df2

   Company  Symbol     ID         Date   Price
0  AAA Inc       A    123   2019-12-31      10
1  AAA Inc       A    123   2019-09-30      20
2  AAA Inc       A    123   2019-06-31      30
3  AAA Inc       A    123   2019-03-30      40
4  BBB Inc       B    456   2019-12-31      50
5  BBB Inc       B    456   2019-09-30      50
6  BBB Inc       B    456   2019-06-31      90
7  BBB Inc       B    456   2019-03-31      10
8  CCC Inc       C    789   2019-12-31      79
9  CCC Inc       C    789   2019-09-31      43

And I want to create a 3rd dataframe which is all the ['Symbol', 'ID', 'Date'] rows that are in df1, but not in df2. So the output would look like this:

   Company  Symbol     ID         Date
0  AAA Inc       A    123   2020-03-30
1  FFF Inc       F    653   2019-06-31
2  ZZZ Inc       Z    999   2019-03-30

I know I can do something like df3 = df1[~df1['Symbol'].isin(df2['Symbol'])], but after doing some research it doesn't seem like there's a good way to use isin with multiple columns.

I have also checked similar questions but haven't found anything that only addresses specific columns.

I would also like to avoid merging the two dataframes if possible.

So how would I achieve this?

score 2 · Answer 1 · answered May 03 '20 at 10:48

2

Because you want avoid merge here is one posible solution is compare MultiIndex by Index.isin:

cols = ['Symbol', 'ID', 'Date']
df3 = df1[~df1.set_index(cols).index.isin(df2.set_index(cols).index)]
print (df3)
  Company Symbol   ID        Date  Value
1     AAA  Inc A  123  2020-03-30     11
3     FFF  Inc F  653  2019-06-31     22
4     ZZZ  Inc Z  999  2019-03-30     13

Or convert columns to tuples:

cols = ['Symbol', 'ID', 'Date']
df3 = df1[~df1[cols].apply(tuple, 1).isin(df2[cols].apply(tuple, 1))]

For compare solution with merge and indicator parameter:

cols = ['Symbol', 'ID', 'Date']
df3 = (df1.merge(df2, on=cols, indicator=True, how='left', suffixes=('','_'))
          .query("_merge == 'left_only'")[df1.columns])
print (df3)
  Company Symbol   ID        Date  Value
1     AAA  Inc A  123  2020-03-30     11
3     FFF  Inc F  653  2019-06-31     22
4     ZZZ  Inc Z  999  2019-03-30     13

answered May 03 '20 at 10:48

jezrael

822,522
95
1,334
1,252

Am I making this more difficult for myself by not using merge? – user53526356 May 03 '20 at 11:17
@MSD - I think it is simplier ;) But if lage both DataFrames not sure if more performant, the best test. – jezrael May 03 '20 at 11:19
It's not a very large dataframe so performance isn't a huge concern. So maybe I'll just look at merge then. – user53526356 May 03 '20 at 11:20
this is a bit longer, and performant????? not sure : ```ind = ['Symbol','ID','Date']; df1.loc[(~df1.filter(ind).isin(df2.to_dict('list'))).any(axis=1)]``` – sammywemmy May 03 '20 at 11:25

How to use isin (without merge) for multiple dataframe columns

1 Answers1