Output rows that contain different values from 2 dataframes

Question

Code adapted from "Compare two DataFrames and output their differences side-by-side":

df1
----
#     id   score isEnrolled               Comment
# 0  111   2.17       True  He was late to class
# 1  112   1.11      False             Graduated
# 2  113   4.12       True                   NaN

df2
----
#     id   score isEnrolled               Comment
# 0  111   2.17       True  He was late to class
# 1  112   1.21      False             Graduated
# 2  113   4.12      False           On vacation
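If you want to run the snippets, you can rebuild the two sample frames from the printouts above. This assumes the NaN in df1's Comment column is a real missing value (np.nan), not the string 'NaN'.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printouts above; the missing Comment is assumed
# to be np.nan rather than the literal string 'NaN'
df1 = pd.DataFrame({
    'id': [111, 112, 113],
    'score': [2.17, 1.11, 4.12],
    'isEnrolled': [True, False, True],
    'Comment': ['He was late to class', 'Graduated', np.nan],
})
df2 = pd.DataFrame({
    'id': [111, 112, 113],
    'score': [2.17, 1.21, 4.12],
    'isEnrolled': [True, False, False],
    'Comment': ['He was late to class', 'Graduated', 'On vacation'],
})
print(df1)
print(df2)
```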

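import pandas as pd

# df: df1 and df2 placed side by side (how it was built is not shown here)
df.set_index('id', inplace=True)

def report_diff(x):
    # x holds the two versions of one cell; show both only when they differ
    return x.iloc[0] if x.iloc[0] == x.iloc[1] else '{} | {}'.format(*x)

changes = df.groupby(level=0, axis=1).apply(lambda x: x.apply(report_diff, axis=1))
print(changes)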
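Here is a sketch of one way to get only the differing rows. It assumes both frames contain the same ids and the same columns, so they can be compared cell by cell after aligning on 'id'. The approach builds a boolean mask of changed cells, keeps the ids where any cell changed, and then formats those cells as 'old | new'. This swaps the groupby/apply step for vectorised masking, and the variable names (a, b, changed, as_text) are my own.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [111, 112, 113],
                    'score': [2.17, 1.11, 4.12],
                    'isEnrolled': [True, False, True],
                    'Comment': ['He was late to class', 'Graduated', np.nan]})
df2 = pd.DataFrame({'id': [111, 112, 113],
                    'score': [2.17, 1.21, 4.12],
                    'isEnrolled': [True, False, False],
                    'Comment': ['He was late to class', 'Graduated', 'On vacation']})

# Align both frames on 'id' so cells are compared by label, not position
a = df1.set_index('id')
b = df2.set_index('id')

# A cell is unchanged when the values are equal or both are missing
# (plain NaN == NaN is False, so missing values need the extra check)
changed = ~((a == b) | (a.isna() & b.isna()))

# Keep only ids with at least one changed cell (drops id 111 here)
rows = changed.any(axis=1)
a, b, changed = a[rows], b[rows], changed[rows]

def as_text(df):
    # Convert every cell with str(), so a missing value becomes 'nan'
    return df.apply(lambda col: col.map(str))

# 'old | new' for changed cells; the original value everywhere else
result = a.astype(object).where(~changed, as_text(a) + ' | ' + as_text(b))
print(result)
```

Because unchanged rows are filtered out before formatting, id 111 never has to be named explicitly.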

My expected output is only the rows that have different values (i.e. excluding id 111):

           score    isEnrolled            Comment
id
112  1.11 | 1.21         False          Graduated
113         4.12  True | False  nan | On vacation

score 1 · Answer 1 · answered Sep 01 '20 at 03:33

Concatenate the two data frames and group the result by 'id', so that each column holds a list of the two values for that id. Next, run your duplicate-check function (report_diff) on each column. Finally, drop the row for id 111.

import pandas as pd

# Stack df1 and df2, then collect each id's two values per column into a list
df3 = pd.concat([df1, df2], axis=0)
df3 = df3.groupby('id').agg(list)  # keep 'id' as the index so it can be filtered below

def report_diff(x):
    # x is [value in df1, value in df2]
    return x[0] if x[0] == x[1] else '{} | {}'.format(*x)

for col in ['score', 'isEnrolled', 'Comment']:
    df3[col] = df3[col].apply(report_diff)

df3 = df3[df3.index != 111]
df3
           score    isEnrolled            Comment
id
112  1.11 | 1.21         False          Graduated
113         4.12  True | False  nan | On vacation
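The answer above hard-codes id 111. Its report_diff would also print 'nan | nan' if both frames had a missing value in the same cell, because NaN == NaN is False. Below is a possible generalisation that treats two missing values as equal and keeps only the ids where some column differs. It assumes each id appears exactly once in each frame, and the helper name `same` is hypothetical.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question (missing Comment assumed to be np.nan)
df1 = pd.DataFrame({'id': [111, 112, 113],
                    'score': [2.17, 1.11, 4.12],
                    'isEnrolled': [True, False, True],
                    'Comment': ['He was late to class', 'Graduated', np.nan]})
df2 = pd.DataFrame({'id': [111, 112, 113],
                    'score': [2.17, 1.21, 4.12],
                    'isEnrolled': [True, False, False],
                    'Comment': ['He was late to class', 'Graduated', 'On vacation']})

def same(x, y):
    # Two missing values count as equal (plain NaN == NaN is False)
    return bool(x == y) or (pd.isna(x) and pd.isna(y))

def report_diff(values):
    # values is [value from df1, value from df2], collected by agg(list)
    first, second = values
    return first if same(first, second) else '{} | {}'.format(first, second)

# Stack both frames and gather each id's two values per column into a list;
# this assumes every id appears exactly once in each frame
grouped = pd.concat([df1, df2], axis=0).groupby('id').agg(list)

# Keep ids where at least one column's pair differs, instead of hard-coding 111
has_diff = grouped.apply(lambda row: any(not same(*pair) for pair in row), axis=1)

# Format every remaining cell as 'old | new' or the unchanged value
df3 = grouped[has_diff].apply(lambda col: col.map(report_diff))
print(df3)
```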
