Compare two DataFrames and output their differences side-by-side
df1
----
#     id  score  isEnrolled               Comment
# 0  111   2.17        True  He was late to class
# 1  112   1.11       False             Graduated
# 2  113   4.12        True                   NaN
df2
----
#     id  score  isEnrolled               Comment
# 0  111   2.17        True  He was late to class
# 1  112   1.21       False             Graduated
# 2  113   4.12       False           On vacation
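import pandas as pd
# Assumption: df holds df1 and df2 side by side, both indexed by id.
df = pd.concat([df1.set_index('id'), df2.set_index('id')], axis=1)

def report_diff(x):
    # x holds the df1 and df2 values of a single cell
    return x.iloc[0] if x.iloc[0] == x.iloc[1] else '{} | {}'.format(*x)

changes = df.groupby(level=0, axis=1).apply(lambda x: x.apply(report_diff, axis=1))
print(changes)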
My expected output contains only the rows in which at least one value differs (so id 111 is excluded):
           score    isEnrolled            Comment
id
112  1.11 | 1.21         False          Graduated
113         4.12  True | False  nan | On vacation
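
One possible way to get this output, as a minimal sketch rather than a definitive answer: build a boolean mask of the cells that differ, render those cells as "old | new", and keep only the rows where the mask has at least one True. The frames are rebuilt here from the data shown above; the names a, b, differs and combined are only illustrative, and the sketch assumes both frames contain the same ids and the same columns.

```python
import numpy as np
import pandas as pd

# Sample data copied from the df1 / df2 listings in the question.
df1 = pd.DataFrame({
    "id": [111, 112, 113],
    "score": [2.17, 1.11, 4.12],
    "isEnrolled": [True, False, True],
    "Comment": ["He was late to class", "Graduated", np.nan],
})
df2 = pd.DataFrame({
    "id": [111, 112, 113],
    "score": [2.17, 1.21, 4.12],
    "isEnrolled": [True, False, False],
    "Comment": ["He was late to class", "Graduated", "On vacation"],
})

# Assumption: both frames share the same ids and columns.
a = df1.set_index("id")
b = df2.set_index("id")

# A cell differs when the values are unequal and not both missing.
differs = (a != b) & ~(a.isna() & b.isna())

# Render differing cells as "old | new"; keep the df1 value elsewhere.
combined = a.astype(str) + " | " + b.astype(str)
changes = a.astype(object).where(~differs, combined)

# Keep only the rows with at least one differing cell.
changes = changes[differs.any(axis=1)]
print(changes)
```

Row 111 drops out because none of its cells differ. If separate columns per frame are acceptable instead of the "old | new" strings, pandas 1.1 and later also provide DataFrame.compare, which may be worth a look.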