
I have two dataframes:

[Image: the two dataframes (not reproduced)]

I am trying to compare these dataframes and find the delta (the rows that are not in both of them). I have tried `compare`, but that needs identical indexes and columns, which is not the case here. How do I go about this? I am new to pandas and trying to learn.

Utkarsh Singh
  • Is column b always `value {a}`? – Sam Mason Aug 28 '23 at 19:24
  • Just noticed that the space between "value" and the number changes, but this isn't reflected in the result. Is that deliberate? – Sam Mason Aug 28 '23 at 19:26
  • As a one-liner with `pd.merge`: `pd.merge(df, df1, how="outer", indicator="compare").query("compare != 'both'").drop("compare", axis=1)` Alternatively: `comp = pd.merge(df, df1, how="outer", indicator="compare"); comp.loc[comp["compare"].ne("both")]` – Rawson Aug 28 '23 at 19:28
  • Column b is a value attached to a that does not change. a and b can be viewed as a key-value pair. – Utkarsh Singh Aug 28 '23 at 19:42
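
The `pd.merge` suggestion from the comments can be sketched as a small runnable example. The sample data below is hypothetical (the original frames were only posted as an image) and assumes both frames share the columns `a` and `b`:

```python
import pandas as pd

# Hypothetical sample data; column names "a" and "b" are taken from the comments.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["value 1", "value 2", "value 3"]})
df1 = pd.DataFrame({"a": [2, 3, 4], "b": ["value 2", "value 3", "value 4"]})

# Outer merge on all shared columns; the indicator column "compare"
# records whether a row came from the left frame, the right frame, or both.
comp = pd.merge(df, df1, how="outer", indicator="compare")

# Keep only the rows that are missing from one of the two frames.
delta = comp.loc[comp["compare"] != "both"].reset_index(drop=True)
print(delta)
```

Unlike a plain concatenation, this keeps the `compare` column, so each remaining row says which frame it came from (`left_only` or `right_only`).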

1 Answer


Solution:

The quickest way I have found to do this is to use `.drop_duplicates()`. I have found it to be very efficient on large datasets; for smaller datasets there is likely not much of a speed difference compared with other methods. For example:

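import pandas as pd
# Remove repeated rows inside each frame first
df1 = df1.drop_duplicates(keep="first")
df2 = df2.drop_duplicates(keep="first")
# Rows present in both frames now occur twice; keep=False drops every copy
diff = pd.concat([df1, df2]).drop_duplicates(keep=False)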

Based on the answer found here

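Note: `keep=False` drops every row that has a duplicate, rather than keeping one copy. After the concatenation, rows that exist in both dataframes appear twice and are removed entirely, which leaves only the rows that differ, which is what you are asking for.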
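
As a minimal end-to-end sketch of this approach, assuming both frames have the same column names (the sample data is made up, since the original frames were only posted as an image):

```python
import pandas as pd

# Hypothetical sample data; note that the row a=1 is repeated inside df1.
df1 = pd.DataFrame({"a": [1, 1, 2, 3],
                    "b": ["value 1", "value 1", "value 2", "value 3"]})
df2 = pd.DataFrame({"a": [2, 3, 4],
                    "b": ["value 2", "value 3", "value 4"]})

# Step 1: remove repeats inside each frame so that a row can only be
# duplicated after concatenation if it exists in both frames.
left = df1.drop_duplicates(keep="first")
right = df2.drop_duplicates(keep="first")

# Step 2: stack the frames and drop every row that occurs more than once.
delta = pd.concat([left, right]).drop_duplicates(keep=False).reset_index(drop=True)
print(delta)
```

Here `delta` holds the rows with `a` equal to 1 and 4. Without the first step, the repeated row `a=1` in `df1` would be treated as a duplicate and dropped as well, even though it never appears in `df2`.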

Jesse Sealand