0

I have two dataframe below:

df1>>

enter image description here

df2>>

enter image description here

I need to compare the two dataframe and find the id's in col1 where col2 value is a mismatch i.e. for the sample dataset 3 should be the answer.

Please note in the actual problem I have 10^6 number of unique id's in col 1 hence if-else ladder or running loop at each unique col1 key level is taking a very long execution time. I am looking for an efficient method to do this. I really appreciate any help you can provide.

Joy Ghosh
  • 11
  • 2
  • 1
    Hi there! It's greatly appreciated if you don't post data as image, but as proper text, and especially for dataframes, there are some [well established conventions](https://stackoverflow.com/a/20159305/15873043). – fsimonjetz Jul 29 '22 at 17:47

2 Answers2

0
df1 = pd.DataFrame({"col1": [1, 1,2,2,3,3] ,"col2": ['A','B','C','D','E','F']})
df2 = pd.DataFrame({"col1": [1, 1,2,2,3,3] ,"col2": ['A','B','D','C','E','K']})
df3 = df1.merge(df2, indicator=True, how='outer').query('_merge != "both"').drop('_merge', 1)
df3

enter image description here

Jui Sen
  • 377
  • 3
  • 12
0

You can compare dataframes with pd.DataFrame.compare.

>>> df1 = pd.DataFrame({"col1": [1, 1, 2, 2, 3, 3], "col2": list("ABCDEF")})
>>> df2 = pd.DataFrame({"col1": [1, 1, 2, 2, 3, 3], "col2": list("ABDCEK")})
>>> df1.compare(df2)
  col2      
  self other
2    C     D
3    D     C
5    F     K
fsimonjetz
  • 5,644
  • 3
  • 5
  • 21