I have two dataframes, say df1 and df2, with the same column names.
Example:
df1
C1 | C2 | C3 | C4
A  | 1  | 2  | AA
B  | 1  | 3  | A
A  | 3  | 2  | B
df2
C1 | C2 | C3 | C4
A  | 1  | 3  | E
B  | 1  | 2  | C
Q  | 4  | 1  | Z
I would like to filter out rows in df1 based on common values in a fixed subset of columns between df1 and df2. In the above example, if the columns are C1 and C2, I would like the first two rows to be filtered out, as their values in both df1 and df2 for these columns are identical.
What would be a clean way to do this in Pandas?
So far, based on this answer, I have been able to find the common rows.
common_df = pandas.merge(df1, df2, how='inner', on=['C1','C2'])
This gives me a new dataframe with only those rows that have common values in the specified columns, i.e., the intersection.
I have also seen this thread, but the answers all seem to assume a difference on all the columns.
The expected result for the above example (rows common on specified columns removed):
C1 | C2 | C3 | C4
A  | 3  | 2  | B
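For reference, here is a minimal sketch of one way the filtering described above might be done: a left merge with `indicator=True` restricted to the key columns, keeping only the rows marked `left_only`. The frames are rebuilt from the example above; the names `keys`, `marked`, and `result` are placeholders chosen for illustration, and this is only one possible approach, not necessarily the cleanest.

```python
import pandas as pd

# Example frames from the question.
df1 = pd.DataFrame({
    "C1": ["A", "B", "A"],
    "C2": [1, 1, 3],
    "C3": [2, 3, 2],
    "C4": ["AA", "A", "B"],
})
df2 = pd.DataFrame({
    "C1": ["A", "B", "Q"],
    "C2": [1, 1, 4],
    "C3": [3, 2, 1],
    "C4": ["E", "C", "Z"],
})

# Columns on which rows are compared (illustrative name).
keys = ["C1", "C2"]

# Left-merge against only the de-duplicated key columns of df2, so no extra
# columns are pulled in and no rows of df1 are multiplied.
marked = df1.merge(df2[keys].drop_duplicates(), on=keys, how="left", indicator=True)

# "left_only" marks rows of df1 whose key combination does not occur in df2.
result = marked.loc[marked["_merge"] == "left_only", df1.columns].reset_index(drop=True)
print(result)
```

Note that the merge returns a fresh integer index, so if the original index of df1 matters it would have to be carried along separately (for example via `reset_index()` before merging).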