
Given two dataframes, df1 and df2, each with a unique identifier column named ID, how do I get the rows in df2 whose ID is not in df1?

Right now, my solutions are:

  1. df2 - df1 = pd.concat([df2, df1, df1]).drop_duplicates(subset = 'ID', keep=False)

  2. df1 - df2 = pd.concat([df2, df2, df1]).drop_duplicates(subset = 'ID', keep=False)

However, my results are the opposite of what's expected, i.e., RHS(1.) = LHS(2.) and vice versa.

Referring to 1., my logic is that the records in df1 are removed as df1 is included twice. Records in df2 with matching IDs to those in df1 are also knocked out. Therefore, the records left are the records in df2 that don't share IDs with those in df1; said differently, the only records that remain are those found exclusively in df2.
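To make that reasoning concrete, here is a minimal sketch of expression 1 on toy data. The two frames and their values are made up for illustration and are not the real data.

```python
import pandas as pd

# Hypothetical toy frames; only the "ID" column name comes from the question.
df1 = pd.DataFrame({"ID": [1, 2, 3], "val": ["a", "b", "c"]})
df2 = pd.DataFrame({"ID": [2, 3, 4, 5], "val": ["b", "c", "d", "e"]})

# df1 is concatenated twice, so every ID from df1 occurs at least twice
# and keep=False drops all of its occurrences, including any match in df2.
df2_minus_df1 = pd.concat([df2, df1, df1]).drop_duplicates(subset="ID", keep=False)

print(df2_minus_df1["ID"].tolist())
```

On this toy input the rows that survive are the ones whose ID occurs only in df2, which matches the logic described above.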

Pointers would be greatly appreciated. Thanks!

Dylan Smith
rml2018

1 Answer


This should do what you want. It keeps only the rows of df2 whose ID does not appear in df1:

result = df2[~df2['ID'].isin(df1['ID'])]
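As a small worked example of the isin approach in both directions, assuming the column is named ID as in the question (the toy frames below are hypothetical):

```python
import pandas as pd

# Hypothetical toy frames for illustration only.
df1 = pd.DataFrame({"ID": [1, 2, 3], "val": ["a", "b", "c"]})
df2 = pd.DataFrame({"ID": [2, 3, 4, 5], "val": ["b", "c", "d", "e"]})

# Rows of df2 whose ID is absent from df1 (df2 - df1).
only_in_df2 = df2[~df2["ID"].isin(df1["ID"])]

# Rows of df1 whose ID is absent from df2 (df1 - df2).
only_in_df1 = df1[~df1["ID"].isin(df2["ID"])]

print(only_in_df2["ID"].tolist())
print(only_in_df1["ID"].tolist())
```

Bracket access (df2["ID"]) is used rather than attribute access because column names are case-sensitive, and it also works for names that are not valid Python identifiers.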