Find rows differences between two dataframes

Question

I have two dataframes that have the same structure/indexes.

df1 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'column_a': [5, 4, 3, 2, 1],
    'column_b': [5, 4, 3, 2, 1],
    'column_c': [5, 4, 3, 2, 1]
})
df1.set_index('id', drop=False, inplace=True)

and

df2 = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'column_a': [5, 4, 3, 2, 1],
    'column_b': [5, 4, 3, 2, 1],
    'column_c': [5, 4, 10, 2, 1]
})
df2.set_index('id', drop=False, inplace=True)

And I would like to get this result:

expected = pd.DataFrame({'id': [3], 'column_a': [3], 'column_b': [3], 'column_c': [10]})

I tried using for-loop, but I need to deal with a large data load, and it didn't become so performant...

score 3 · Accepted Answer · answered Mar 08 '22 at 18:55

Try with merge, filtering on the indicator:

>>> (df2.reset_index(drop=True).merge(df1.reset_index(drop=True), 
                                      indicator="Exist",
                                      how="left")
                               .query("Exist=='left_only'")
                               .drop("Exist", axis=1)
     )

   id  column_a  column_b  column_c
2   3         3         3        10

score 0 · Answer 2 · answered Mar 08 '22 at 18:59

What you're asking for could be possibly be answered here.

Using the drop_duplicates example from that thread,

pd.concat([df1,df2]).drop_duplicates(keep=False)

you can end up with the following DataFrame.

    id  column_a  column_b  column_c
id                                  
3    3         3         3         3
3    3         3         3        10

Albeit this approach will retrieve rows from both DataFrames.

Find rows differences between two dataframes

2 Answers2