Finding non-matching rows between two dataframes

Question

I have a scenario where I want to find non-matching rows between two dataframes. Both dataframes will have around 30 columns and an id column that uniquely identifies each record/row. I want to check whether a row in df1 differs from its counterpart in df2. df1 is the updated dataframe and df2 is the previous version.

I have tried `pd.concat([df1, df2]).drop_duplicates(keep=False)`, but it just combines both dataframes. Is there a way to do this? I would really appreciate the help.

The sample data looks like this for both dfs.

id user_id type status

There will be 39 columns in total, some of which may contain NULL values.

Thanks.

P.S. df2 will always be a subset of df1.

So the first solution from [this](https://stackoverflow.com/q/48647534) is not working; did you try another one too? Also, why is it not working? Could you add sample data and a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) to show your problem? — jezrael, Sep 28 '20 at 08:00
Check this: https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe — IoaTzimas, Sep 28 '20 at 08:00
@jezrael I have the same problem if you follow the link posted by archer. I have even tried that approach too, but it returns me all of the rows from df1. — Uzair, Sep 28 '20 at 08:25
Yes, so please add some sample data to show your problem, but [please don't post images of code/data (or links to them)](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question) — jezrael, Sep 28 '20 at 08:26
and, by the way, there can be columns with null values in them. — Uzair, Sep 28 '20 at 08:27
No, the sample data only needs 2-3 columns; the most important thing here is to explain why the solution `pd.concat([df1, df2]).drop_duplicates(keep=False)` is not working. — jezrael, Sep 28 '20 at 08:32

jitvimol · Answer 1 · 2020-09-28T09:15:18.940

1

If your df1 and df2 have the same shape, you can easily compare them with this code.

df3 = pd.DataFrame(np.where(df1==df2,True,False), columns=df1.columns)

The output contains the boolean value False for every cell that does not match.
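For illustration, here is a minimal sketch of how that comparison could be applied when df2 has fewer rows than df1. The sample values are made up (only the column names come from the question), and aligning on id first is my assumption about how to get both frames into the same shape.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "user_id": [10, 20, 30, 40],
    "type": ["a", "b", "c", "d"],
    "status": ["new", "open", "closed", "open"],
})
df2 = pd.DataFrame({
    "id": [1, 2, 3],
    "user_id": [10, 20, 30],
    "type": ["a", "b", "x"],
    "status": ["new", "done", "closed"],
})

# Align both frames on id so they have the same shape and labels.
left = df1.set_index("id")
right = df2.set_index("id")
common = left.index.intersection(right.index)
left, right = left.loc[common], right.loc[common]

# Same comparison as in the answer: True where cells match, False otherwise.
df3 = pd.DataFrame(np.where(left == right, True, False),
                   index=left.index, columns=left.columns)

# Rows of df1 that have at least one non-matching cell.
changed = left[~df3.all(axis=1)]
print(changed)
```

With this data, `changed` holds the rows with id 2 and 3. Rows that exist only in df1 (id 4 here) are dropped by the alignment step and would need to be handled separately.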

edited Sep 28 '20 at 09:15

answered Sep 28 '20 at 08:37


what do you mean by same len size? – Uzair Sep 28 '20 at 09:05
sorry, I mean same shape (number of row & column) – jitvimol Sep 28 '20 at 09:14
Upvoted for good answer, but comes with the caveat that checking columns with `NaN`s will still return False regardless. `np.nan == np.nan` => `False` – kevin_theinfinityfund Sep 13 '22 at 23:43
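A rough sketch of one way around that caveat, assuming both frames are already aligned on id: treat a cell as matching when the values are equal or when both sides are missing. The data and the names `plain`, `nan_safe`, and `changed_ids` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, already aligned on id, with missing values.
a = pd.DataFrame({"id": [1, 2, 3],
                  "user_id": [10.0, np.nan, np.nan]}).set_index("id")
b = pd.DataFrame({"id": [1, 2, 3],
                  "user_id": [10.0, np.nan, 30.0]}).set_index("id")

# Plain equality reports NaN vs NaN as a mismatch.
plain = a == b

# Count a cell as matching if the values are equal OR both are missing.
nan_safe = plain | (a.isna() & b.isna())

# ids of rows that still differ after the NaN-aware comparison.
changed_ids = nan_safe.index[~nan_safe.all(axis=1)].tolist()
print(changed_ids)
```

Here the row with id 2 (NaN on both sides) is no longer reported as changed, while id 3 (NaN vs 30.0) still is.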
