Note: This question is inspired by the ideas discussed in this other post: DataFrame algebra in Pandas
Say I have two dataframes A and B, and that for some column col_name their values are:
A[col_name] | B[col_name]
--------------| ------------
1 | 3
2 | 4
3 | 5
4 | 6
I want to compute the set difference between A and B based on col_name. The result of this operation should be the rows of A where A[col_name] doesn't match any entry in B[col_name].
Below is the result for the above example (showing other columns of A as well):
A[col_name] | A[other_column_1] | A[other_column_2]
------------|-------------------|------------------
1           | 'foo'             | 'xyz' ....
2           | 'bar'             | 'abc'
Keep in mind that some entries in A[col_name] and B[col_name] could hold the value np.nan. I would like to treat those entries as undefined but distinct from one another, i.e. a NaN in A never matches a NaN in B, so the set difference should return those rows.
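To make the expected behaviour concrete, here is a minimal sketch of the single-column case. The sample data, the extra NaN rows, and the helper name set_difference are assumptions made up for illustration; the approach simply filters A with Series.isin after dropping NaN from B, which is one possible way to express the semantics described above, not necessarily the best one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the example above, plus one NaN row in each frame.
A = pd.DataFrame({
    "col_name": [1, 2, 3, 4, np.nan],
    "other_column_1": ["foo", "bar", "baz", "qux", "quux"],
})
B = pd.DataFrame({"col_name": [3, 4, 5, 6, np.nan]})


def set_difference(left, right, col):
    """Return the rows of `left` whose `col` value matches no non-NaN value in `right[col]`."""
    # Drop NaN from the right-hand side first: isin() would otherwise
    # treat a NaN in `left` as matching a NaN in `right`.
    right_values = right[col].dropna()
    return left[~left[col].isin(right_values)]


result = set_difference(A, B, "col_name")
print(result)
```

With this data the rows of A holding 1, 2 and NaN are kept, and their original index labels are preserved.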
How can I do this in Pandas? (generalizing to a difference on multiple columns would be great as well)
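For the multi-column generalization, the following is a rough sketch of the behaviour I have in mind, assuming a left merge with indicator=True is acceptable. The column names k1, k2, payload and the helper name set_difference_multi are hypothetical. Rows of B with a NaN in any key column are dropped up front so that they can never match a row of A.

```python
import numpy as np
import pandas as pd


def set_difference_multi(left, right, cols):
    """Return the rows of `left` whose values in `cols` match no fully defined row of `right`."""
    # Keep only right-hand keys without NaN, de-duplicated so that the
    # left merge cannot multiply rows of `left`.
    keys = right[cols].dropna().drop_duplicates()
    merged = left.merge(keys, on=cols, how="left", indicator=True)
    # Assumption: with unique right-hand keys, the left merge yields one
    # output row per row of `left`, in the same order, so the mask can be
    # applied positionally to `left` (keeping its original index).
    keep = (merged["_merge"] == "left_only").to_numpy()
    return left[keep]


# Hypothetical sample data.
A = pd.DataFrame({
    "k1": [1, 1, 2, np.nan],
    "k2": ["a", "b", "a", "a"],
    "payload": ["foo", "bar", "baz", "qux"],
})
B = pd.DataFrame({
    "k1": [1, 2, np.nan],
    "k2": ["a", "z", "a"],
})

result = set_difference_multi(A, B, ["k1", "k2"])
print(result)
```

Here only the row (1, "a") of A has a match in B; the row (NaN, "a") is returned even though B contains the same pair, because NaN keys are treated as never equal.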