Given two DataFrames, `df1` and `df2`, each with a unique ID column, `ID`, how do I get the rows in `df2` that are not in `df1`?
Right now, my solutions are:

1. `df2 - df1` = `pd.concat([df2, df1, df1]).drop_duplicates(subset='ID', keep=False)`
2. `df1 - df2` = `pd.concat([df2, df2, df1]).drop_duplicates(subset='ID', keep=False)`
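A minimal reproducible example of the two expressions, using small hypothetical frames (the `val` column and the specific `ID` values are invented for illustration, with IDs unique within each frame as in the question):

```python
import pandas as pd

# Hypothetical sample data: IDs 1-3 in df1, IDs 2-5 in df2.
# IDs 2 and 3 are shared; 1 is only in df1; 4 and 5 are only in df2.
df1 = pd.DataFrame({"ID": [1, 2, 3], "val": ["a", "b", "c"]})
df2 = pd.DataFrame({"ID": [2, 3, 4, 5], "val": ["b", "c", "d", "e"]})

# 1. intended as df2 - df1
result_1 = pd.concat([df2, df1, df1]).drop_duplicates(subset="ID", keep=False)

# 2. intended as df1 - df2
result_2 = pd.concat([df2, df2, df1]).drop_duplicates(subset="ID", keep=False)

print(result_1)
print(result_2)
```

Comparing the printed `ID`s with the IDs you expect in each difference shows which expression returns which set on this data.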
However, my results are the opposite of what I expect: the right-hand side of 1. gives me `df1 - df2` (the left-hand side of 2.), and vice versa.
Referring to 1., my logic is that the records in `df1` are removed because `df1` is included twice. Records in `df2` whose `ID`s match those in `df1` are also knocked out. Therefore, the records left are the records in `df2` that don't share `ID`s with those in `df1`; said differently, the only records that remain are those found exclusively in `df2`.
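The reasoning for 1. can be checked mechanically by counting how often each `ID` occurs in the concatenated frame, since `drop_duplicates(..., keep=False)` removes every row whose `ID` occurs more than once. A minimal sketch, assuming small hypothetical frames; the `isin` mask at the end is a different technique, included only as an independent cross-check and not as the method from the question:

```python
import pandas as pd

# Hypothetical sample data with unique IDs per frame
df1 = pd.DataFrame({"ID": [1, 2, 3]})
df2 = pd.DataFrame({"ID": [2, 3, 4, 5]})

stacked = pd.concat([df2, df1, df1])

# How often each ID occurs in the stacked frame:
#   IDs only in df1    -> 2 times (df1 is included twice)
#   IDs in both frames -> 3 times
#   IDs only in df2    -> 1 time
counts = stacked["ID"].value_counts()

# keep=False drops all rows whose ID is duplicated,
# so only IDs with a count of exactly 1 survive
survivors = sorted(counts[counts == 1].index.tolist())

# The actual drop_duplicates result, for comparison with the counts
dedup = stacked.drop_duplicates(subset="ID", keep=False)

# Independent cross-check: rows of df2 whose ID does not appear in df1
mask_result = df2[~df2["ID"].isin(df1["ID"])]

print(survivors)
print(sorted(dedup["ID"].tolist()))
print(mask_result)
```

If the three printed results agree on a given pair of frames, the counting argument in 1. holds for that data.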
Pointers would be greatly appreciated. Thanks!