
I have two dataframes, let's call them A and B. They have exactly the same 7 columns (let's call them col1, col2, col3, col4, col5, col6 and col7). Some of the columns include client_id, client_first_name, client_last_name, telephone number etc. (I can't reveal the exact names for confidentiality purposes).

DataFrame A is much bigger than DataFrame B and some of the entries from DataFrame B are included in DataFrame A (i.e. DataFrame B is a subset of DataFrame A).

The problem is, I want to keep only the records in DataFrame A that are NOT in DataFrame B, i.e. 'subtract' DataFrame B from DataFrame A. How do I do it?

So far, I've been adding an extra column entitled 'group' to both DataFrames, merging them using pd.merge(A, B, how='left', on='col'), and then pulling out the ones that ended up with two different values for 'group_x' and 'group_y' (the merge created these two columns).

Is there an easier way to do it? I tried a bunch of things but none of them worked.

Kasia R

1 Answer


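Yes, your way is OK. If you don't need the merged dataframe, you could also do something like dfA.loc[~dfA.col.isin(dfB.col)], which keeps the rows of dfA whose value in col does not appear in dfB.col. Note that pandas negates a boolean mask with ~ (not !), and that the old .ix indexer has been removed from current pandas, so .loc is used here.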
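Here is a minimal sketch of both approaches on made-up data. The frames, the column names client_id and name, and the values are placeholders, since the real columns aren't given in the question. The first option filters on a single key column with isin; the second is a variant of your merge approach that, as an assumption about what you want, compares on all shared columns and uses indicator=True instead of a hand-made 'group' column.

```python
import pandas as pd

# Hypothetical stand-ins for A and B; the real column names are confidential.
dfA = pd.DataFrame({"client_id": [1, 2, 3, 4],
                    "name": ["Ann", "Bob", "Cy", "Di"]})
dfB = pd.DataFrame({"client_id": [2, 4],
                    "name": ["Bob", "Di"]})

# Option 1: one key column. Keep rows of A whose key does not occur in B.
only_in_A = dfA.loc[~dfA["client_id"].isin(dfB["client_id"])]

# Option 2: match on all shared columns. indicator=True adds a "_merge"
# column whose value is "left_only" for rows found in A but not in B.
# drop_duplicates() on B avoids duplicating rows of A during the merge.
merged = dfA.merge(dfB.drop_duplicates(), how="left", indicator=True)
only_in_A_all_cols = (
    merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
)

print(only_in_A)
print(only_in_A_all_cols)
```

Option 1 is enough when one column (e.g. a client id) uniquely identifies a record. Option 2 is safer if a record only counts as "the same" when every column matches.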

maxymoo