Finding the difference between two dataframes in Python

Question

Suppose I have two dataframes:

A:

column1 column2 
  abc      2
  def      2

B:

column1 column2 
  abc      2
  def      1

I want to compare these two dataframes, find the rows where they differ, and get the value of column1 for those rows.

So the output should be 'def' in this case.

Grayrigel · Accepted Answer · 2020-11-05T11:15:57.160

Based on this answer, you can try the pd.concat method:

pd.concat([A,B]).drop_duplicates(keep=False)['column1'].unique().tolist()

Output:

# if you just want to see the rows that differ between the dataframes
>>> pd.concat([A,B]).drop_duplicates(keep=False)
  column1  column2
1     def        2
1     def        1

# if you want the differing rows, restricted to 'column1'
>>> pd.concat([A,B]).drop_duplicates(keep=False)['column1']
1    def
1    def
Name: column1, dtype: object

# if you want unique values in the column1 as a numpy array after taking the differences
>>> pd.concat([A,B]).drop_duplicates(keep=False)['column1'].unique()
array(['def'], dtype=object)

# if you want unique values in the column1 as a list after taking the differences
>>> pd.concat([A,B]).drop_duplicates(keep=False)['column1'].unique().tolist() 
['def']
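
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes A and B are built exactly as shown in the question; the variable names diff and changed are only illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the two example frames from the question.
A = pd.DataFrame({"column1": ["abc", "def"], "column2": [2, 2]})
B = pd.DataFrame({"column1": ["abc", "def"], "column2": [2, 1]})

# Stack both frames, then drop every row that occurs more than once.
# Rows present in both A and B disappear; rows found in only one remain.
diff = pd.concat([A, B]).drop_duplicates(keep=False)

# Collect the distinct column1 values of the rows that differ.
changed = diff["column1"].unique().tolist()
print(changed)  # ['def']
```

One caveat with this approach: keep=False drops all copies of any repeated row, so a row that appears twice inside A alone would also vanish from the result. It is probably safest when each frame has no internal duplicates.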

Added an answer. Let me know if it works for you. It will give a list as an output. If it does please accept/check-mark the answer. — Grayrigel, Nov 05 '20 at 11:00

score 0 · Answer 2 · answered Nov 05 '20 at 11:01

pd.concat([A,B]).drop_duplicates(keep=False)

MUK

