49

I have two dataframes, A and B, and I want to get those in A but not in B, just like the one right below the top left corner.

The one below the top left

Dataframe A has columns ['a','b' + others] and B has columns ['a','b' + others]. There are no NaN values. I tried the following:

1.

dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a']) | (~dfA['b'].isin(dfm['b'])

2.

dfm = dfA.merge(dfB, on=['a','b'])
dfe = dfA[(~dfA['a'].isin(dfm['a']) & (~dfA['b'].isin(dfm['b'])

3.

dfe = dfA[(~dfA['a'].isin(dfB['a']) | (~dfA['b'].isin(dfB['b'])

4.

dfe = dfA[(~dfA['a'].isin(dfB['a']) & (~dfA['b'].isin(dfB['b'])

but when I get len(dfm) and len(dfe), they don't sum up to dfA (it's off by a few numbers). I've tried doing this on dummy cases and #1 works, so maybe my dataset may have some peculiarities I am unable to reproduce.

What's the right way to do this?

irene
  • 2,085
  • 1
  • 22
  • 36
  • The problem is in the `(~dfA['a'].isin(dfm['a']) | (~dfB['b'].isin(dfm['b']))` expression. For example, if in `dfm` you have `(a=1, b=1), (a=2, b=2)` and in dfA `(a=1, b=2)` then this expression return `False`. – phi May 26 '18 at 13:44
  • @user3063243 Got it. I've also posted other attempts I made, none of which are correct. Would you know the correct way to do this? – irene May 26 '18 at 13:46
  • nice picture... – liang Oct 03 '18 at 15:33
  • @liang picture not mine though...I got it from somewhere in Google Images – irene Oct 05 '18 at 06:15
  • 1
    Related: If you are interested in learning how to correctly merge on columns or index, you can refer to this canonical I've written recently: [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101). – cs95 Dec 08 '18 at 06:15
  • Was struggling with this until I realized that how='left' is NOT 'left-outer', only 'left'.Thanks for asking. – wbg Jan 05 '21 at 22:44

2 Answers2

72

Check out this link

df = pd.merge(dfA, dfB, on=['a','b'], how="outer", indicator=True)
df = df[df['_merge'] == 'left_only']

One liner :

df = pd.merge(dfA, dfB, on=['a','b'], how="outer", indicator=True
              ).query('_merge=="left_only"')
phi
  • 10,572
  • 3
  • 21
  • 30
5

I think it would go something like the examples in: Pandas left outer join multiple dataframes on multiple columns

dfe = pd.merge(dFA, dFB, how='left', on=['a','b'], indicator=True)
dfe[dfe['_merge'] == 'left_only']
Gaurang Tandon
  • 6,504
  • 11
  • 47
  • 84
Justin
  • 191
  • 3