Joining pandas dataframes of identical columns and getting ONLY the items in table B not present in A

Question

I have two dataframes, dfA and dfB, with identical columns. I'm looking to get only the records from dfB that are not present in dfA.

To be clear, I do not want to get the records in dfA that are not in dfB.

I managed to hack something together that works, but the code is not easy to understand and, by extension, not very Pythonic.

I'm looking for a more elegant solution, perhaps using pandas join/merge/append, but I wasn't able to make it work.

Example of what I want:

  dfA: 
  Date       Category  Price 
1 2013-11-24 Coat      22.1
2 2013-11-24 Shirt     8.7
3 2013-11-01 Socks     9     <<< Only present in this df

  dfB:
  Date       Category  Price 
1 2013-11-24 Coat      22.1
2 2013-11-24 Shirt     8.7
3 2013-11-24 Helmet    2.1   <<< Only present in this df
4 2013-11-24 Pants     10.7  <<< Only present in this df

  Result:
  Date       Category  Price 
1 2013-11-24 Helmet    2.1
2 2013-11-24 Pants     10.7

score 6 · Accepted Answer · answered May 08 '18 at 07:36

One idiomatic approach is to use merge(..., how='outer', indicator=True) and filter the resulting DataFrame by the generated _merge column:

In [18]: (A.merge(B, how='outer', indicator=True)
           .query("_merge == 'right_only'")
           .drop('_merge', axis=1))
Out[18]:
         Date Category  Price
3  2013-11-24   Helmet    2.1
4  2013-11-24    Pants   10.7
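
For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. It assumes the frames are named dfA and dfB as in the question and that Date is stored as plain strings; the intermediate names merged and only_in_b are only illustrative. It also assumes a reasonably recent pandas, where drop accepts the columns keyword.

```python
import pandas as pd

# Rebuild the example frames from the question (Date kept as strings here).
dfA = pd.DataFrame({
    "Date": ["2013-11-24", "2013-11-24", "2013-11-01"],
    "Category": ["Coat", "Shirt", "Socks"],
    "Price": [22.1, 8.7, 9.0],
})
dfB = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Category": ["Coat", "Shirt", "Helmet", "Pants"],
    "Price": [22.1, 8.7, 2.1, 10.7],
})

# With no `on` argument, merge joins on every column the two frames share,
# so a row counts as a match only if Date, Category and Price all agree.
merged = dfA.merge(dfB, how="outer", indicator=True)

# Keep the rows that came from dfB alone, then drop the helper column.
only_in_b = (
    merged[merged["_merge"] == "right_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)
print(only_in_b)
```

The result should contain just the Helmet and Pants rows. The index labels you see before reset_index can differ between pandas versions, since the row order of an outer merge is not the same in all releases.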

MaxU - stand with Ukraine

205,989
36
386
419

Precisely what I was looking for. Learned a lot. Thank you. – Idodo May 08 '18 at 08:34
@Idodo, glad I could help :) – MaxU - stand with Ukraine May 08 '18 at 08:49

score 2 · Answer 2 · answered May 08 '18 at 07:38

Use merge with outer join and filter by indicator column with boolean indexing:

df = pd.merge(dfA, dfB, indicator=True, how='outer')
print(df)
         Date Category  Price      _merge
0  2013-11-24     Coat   22.1        both
1  2013-11-24    Shirt    8.7        both
2  2013-11-01    Socks    9.0   left_only
3  2013-11-24   Helmet    2.1  right_only
4  2013-11-24    Pants   10.7  right_only

df1 = df[df['_merge'] == 'right_only'].drop('_merge', axis=1)
print(df1)
         Date Category  Price
3  2013-11-24   Helmet    2.1
4  2013-11-24    Pants   10.7
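
A possible variation, offered as a sketch rather than as part of the answer above: since only the right_only rows are wanted, a right join gives the same rows without carrying the left_only ones through the merge. The helper name rows_only_in_right is hypothetical, and the example data is rebuilt from the question with Date as strings.

```python
import pandas as pd


def rows_only_in_right(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `right` with no exact match in `left` (hypothetical helper)."""
    # A right join keeps every row of `right`; the indicator column marks each
    # one as 'both' (also found in `left`) or 'right_only' (not found).
    merged = left.merge(right, how="right", indicator=True)
    return merged.loc[merged["_merge"] == "right_only"].drop(columns="_merge")


dfA = pd.DataFrame({
    "Date": ["2013-11-24", "2013-11-24", "2013-11-01"],
    "Category": ["Coat", "Shirt", "Socks"],
    "Price": [22.1, 8.7, 9.0],
})
dfB = pd.DataFrame({
    "Date": ["2013-11-24"] * 4,
    "Category": ["Coat", "Shirt", "Helmet", "Pants"],
    "Price": [22.1, 8.7, 2.1, 10.7],
})

df1 = rows_only_in_right(dfA, dfB)
print(df1)
```

Swapping the arguments, rows_only_in_right(dfB, dfA), would give the opposite direction (the Socks row in this example).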

score 1 · Answer 3 · answered May 08 '18 at 08:05

If your table has a column of unique values that you can filter by (perhaps Category in your case), you could use:

dfB[~dfB['Category'].isin(dfA['Category'])]
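
A small sketch of how this behaves, with one caveat made visible. The frames are rebuilt from the question, plus one extra row in dfB (a second Coat at a different date and price) that is purely hypothetical and added only for illustration. Because isin compares the Category column alone, that row is filtered out even though the full record does not appear in dfA.

```python
import pandas as pd

dfA = pd.DataFrame({
    "Date": ["2013-11-24", "2013-11-24", "2013-11-01"],
    "Category": ["Coat", "Shirt", "Socks"],
    "Price": [22.1, 8.7, 9.0],
})
dfB = pd.DataFrame({
    # The last row is a hypothetical addition, not part of the question's data.
    "Date": ["2013-11-24", "2013-11-24", "2013-11-24", "2013-11-24", "2013-12-01"],
    "Category": ["Coat", "Shirt", "Helmet", "Pants", "Coat"],
    "Price": [22.1, 8.7, 2.1, 10.7, 30.0],
})

# Keep the rows of dfB whose Category never appears in dfA.
by_category = dfB[~dfB["Category"].isin(dfA["Category"])]
print(by_category)
```

This keeps Helmet and Pants but drops the extra Coat row, so it matches the merge-based answers only when the chosen column really does identify a record on its own.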

Ajit

667
2
14
27

Not quite what I was looking for, but great nonetheless. – Idodo May 08 '18 at 08:33
