1

I have 2 pandas dataframes which have exactly same columns. So they look something like this:

Dataframe1:
C1 C2 C3
1  A  X
2  B  Y

Dataframe2:
C1 C2 C3
1  A  X
3  C  Z

I want to find difference between these 2 dataframes. Basically i need following 3 output:

  1. No of same rows in 2 dataframes - "1" in this case
  2. Rows present in dataframe1, but missing in dataframe2

    2  B  Y
    
  3. Rows present in dataframe2, but missing in dataframe1

    3  C  Z
    

I found no of same rows as:

same_line_count = (pd.merge(df1, df2, on=['C1', 'C2', 'C3'], how='inner')).shape[0]

But I am unable to find other 2 nos.

Shweta
  • 1,111
  • 3
  • 15
  • 30

1 Answers1

2

I think you need merge with outer join and parameter indicator, for filtering use loc with boolean indexing and for count same rows sum boolean mask:

print (Dataframe1)
   C1 C2 C3
0   1  A  X
1   2  B  Y
2   2  C  Y

print (Dataframe2)
   C1 C2 C3
0   1  A  X
1   3  C  Z

df = pd.merge(Dataframe1, Dataframe2, indicator=True, how='outer')
print (df)
   C1 C2 C3      _merge
0   1  A  X        both
1   2  B  Y   left_only
2   2  C  Y   left_only
3   3  C  Z  right_only

both = (df['_merge'] == 'both').sum()
print (both)
1

left_only = df.loc[df['_merge'] == 'left_only', Dataframe1.columns]
print (left_only)
   C1 C2 C3
1   2  B  Y
2   2  C  Y

right_only = df.loc[df['_merge'] == 'right_only', Dataframe1.columns]
print (right_only)
   C1 C2 C3
3   3  C  Z
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252