Thanks to Stack Overflow comments, I created a small function that compares two DataFrames using pandas.
import numpy as np
import pandas as pd

# sample data frames
a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': '71', 'datum': np.nan, 'width': 'wide'}])
a2 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wide'}])
a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])
# compare function
def dfCompare(a, b):
    # identical frames: nothing to report
    if a.equals(b):
        print("no differences detected")
        return None
    # rows that occur only once in the concatenation differ between a and b
    diff = pd.concat([a, b]).drop_duplicates(keep=False)
    if diff.empty:
        print("no differences detected")
        return None
    return diff

dfCompare(a1, a2)
How can I modify the result so that either a) only the cells that differ are shown, e.g. for
dfCompare(a1,a2)
or b) the cells with differing values are "marked" (e.g. highlighted, or formatted in bold, ...), for example in
dfCompare(a1,a3)
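To make the two desired outputs concrete, here is a rough sketch of what I have in mind. It assumes both frames have identical index and column labels, and it treats two NaN values in the same cell as equal. The helper names diff_mask, diff_cells and diff_styles are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def diff_mask(a, b):
    """Boolean frame: True where a and b hold different values.

    Assumption: a and b share the same index and column labels.
    NaN in the same cell of both frames counts as "no difference".
    """
    both_nan = a.isna() & b.isna()
    return a.ne(b) & ~both_nan


def diff_cells(a, b):
    """Variant a): keep only the differing cells of b, blank out the rest."""
    return b.where(diff_mask(a, b))


def diff_styles(a, b):
    """Variant b): one CSS string per cell, bold where the values differ."""
    mask = diff_mask(a, b).to_numpy()
    css = np.where(mask, "font-weight: bold", "")
    return pd.DataFrame(css, index=b.index, columns=b.columns)


a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': '71', 'datum': np.nan, 'width': 'wide'}])
a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])

only_diffs = diff_cells(a1, a3)   # NaN everywhere except the changed cells
styles = diff_styles(a1, a3)      # CSS strings with the same shape as a3
print(only_diffs)
```

For variant b), the frame of CSS strings could presumably be passed to the Styler, along the lines of a3.style.apply(lambda _: diff_styles(a1, a3), axis=None), which needs jinja2 installed and only renders in an HTML-capable frontend such as a notebook. Newer pandas versions (1.1 and later) also provide DataFrame.compare, which might cover variant a) for identically labelled frames.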
Thanks for any help and thoughts!