I have two dataframes, let's call them A and B. They have exactly the same 7 columns (let's call them col1, col2, col3, col4, col5, col6 and col7). Some of the columns include client_id, client_first_name, client_last_name, telephone number etc. (I can't reveal the exact names for confidentiality purposes).
DataFrame A is much bigger than DataFrame B, and the entries in DataFrame B are also included in DataFrame A (i.e. DataFrame B is a subset of DataFrame A).
The problem is, I want to keep only the records in DataFrame A that are NOT in DataFrame B, i.e. 'subtract' DataFrame B from DataFrame A. How do I do it?
So far, I've been adding an extra column called 'group' to both DataFrames, merging them using pd.merge(A, B, how='left', on='col'), and then pulling out the rows that ended up with different values for 'group_x' and 'group_y' (the merge created these two columns).
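For reference, here is a minimal sketch of that workaround. The data and the two-column layout are made up, and col1 is only assumed to be the key column. Filtering on a missing 'group_y' is one reading of "different values", since rows of A with no match in B get NaN on the B side of a left merge.

```python
import pandas as pd

# Made-up stand-ins for the real frames (which have 7 columns).
A = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
B = pd.DataFrame({"col1": [2, 4], "col2": ["b", "d"]})

# Tag each frame so the origin of every row is visible after the merge.
A["group"] = "A"
B["group"] = "B"

# Left merge on the assumed key column. Non-key columns that exist in both
# frames (col2, group) get the default _x / _y suffixes.
merged = pd.merge(A, B, how="left", on="col1")

# Rows of A without a match in B have NaN in group_y; keep only those.
only_in_A = merged[merged["group_y"].isna()]

print(only_in_A[["col1", "col2_x"]])
```

This works, but it needs the helper column and leaves the suffixed duplicate columns behind, which is why I'm looking for something simpler.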
Is there an easier way to do it? I tried a bunch of things but none of them worked.