I have 2 large dataframes as follows:
df1
C1 C2 C3 V1
a e k 1
b f j 2
d g h 3
df2
C1 C2 C3 V1
a e m 21
d g p 1
e f q 2
f g r 3
I want to get the following:
C1 C2 C3 V1
a e k 1
b f j 2
d g h 3
e f q 2
f g r 3
Where I have only included rows from df2
where C1 and C2 are not present in df1
. i.e. I dropped first row of df2
as a, e
was already present in df1
.
I am trying:
df1['id'] = df1[['C1', 'C2']].astype(str).apply('_'.join, axis=1)
df2['id'] = df2[['C1', 'C2']].astype(str).apply('_'.join, axis=1)
df3 = df2[~df2['id'].isin(df1['id'])]
pd.concat([df1, df3])
Is there a better way to do this using some inbuilt function as I run into this kind of situation a lot with varying number of columns.