
Thanks to Stack Overflow comments, I created a little function that compares two DataFrames using pandas.

    import numpy as np
    import pandas as pd

    # sample data frames
    a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                       {'_id': '71', 'datum': np.nan, 'width': 'wide'}])

    a2 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                       {'_id': 'A', 'datum': np.nan, 'width': 'wide'}])

    a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                       {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])

    # compare function: returns the rows that occur in only one of the two frames
    def dfCompare(a, b):
        if a.equals(b):
            print("no differences detected")
            return None
        # rows present in both frames are duplicates and get dropped entirely
        diff = pd.concat([a, b]).drop_duplicates(keep=False)
        if diff.empty:
            print("no differences detected")
            return None
        return diff

    dfCompare(a1, a2)

How can I modify the result so that either

a) only the cells that differ are shown, e.g. for

    dfCompare(a1, a2)

or b) the cells with differing values are "marked" (e.g. highlighted or formatted bold), e.g. for

    dfCompare(a1, a3)

Thanks for any help and thoughts!

smci
user2006697

1 Answer


This is relatively straightforward, but you have inadvertently (or perhaps advertently) included a comparison that makes it a little tricky: the comparison of NaNs, where you want NaN == NaN to evaluate to True. But as this question and its answers show, NaN == NaN evaluates to False.
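To make the NaN behaviour concrete, here is a minimal sketch. The helper name `equal_or_both_nan` is made up for illustration and is not part of pandas; it assumes both inputs are identically labelled.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, not even to itself.
nan_equals_itself = (np.nan == np.nan)

# The same rule applies element-wise inside pandas objects.
s = pd.Series(['2009-11-30', np.nan])
elementwise_eq = (s == s).tolist()

def equal_or_both_nan(x, y):
    """Hypothetical helper: equal values, or missing in both inputs."""
    return (x == y) | (x.isna() & y.isna())

nan_aware_eq = equal_or_both_nan(s, s).tolist()
```

Here `elementwise_eq` marks the NaN position as unequal, while `nan_aware_eq` treats it as a match. `DataFrame.equals` already treats NaNs in the same location as equal, which is why the whole-frame check in the question is not affected.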

So, knowing that, and not providing a highlighted indication (because my terminal only prints in black and white, and you don't specify what you're using to view color formatting), here's the best I can provide (simply adding an " - X" to those that do not match):

    a1[(a1 != a3) & ((a1 == a1) & (a3 == a3))] += ' - X'
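If the in-place `+=` on a boolean selection raises an error on your pandas version, one possible workaround is to build the mask once and let `DataFrame.mask` or `DataFrame.where` do the replacement. This is only a sketch: the helper names `diff_mask`, `mark_diffs` and `only_diffs` are made up for illustration, and it assumes both frames share the same index and columns and hold string values.

```python
import numpy as np
import pandas as pd

a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': '71', 'datum': np.nan, 'width': 'wide'}])
a2 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wide'}])
a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])

def diff_mask(a, b):
    """Same condition as the one-liner above: values differ and neither is NaN."""
    return (a != b) & a.notna() & b.notna()

def mark_diffs(a, b, suffix=' - X'):
    """Return a copy of `a` with `suffix` appended to every differing cell."""
    # astype(str) makes the concatenation safe; cells outside the mask keep
    # their original value, so missing values stay missing in the result.
    return a.mask(diff_mask(a, b), a.astype(str) + suffix)

def only_diffs(a, b):
    """Return only the cells of `a` that differ, dropping all-equal rows and columns."""
    shown = a.where(diff_mask(a, b))
    return shown.dropna(how='all').dropna(axis=1, how='all')

marked = mark_diffs(a1, a3)    # variant b): mark the differing cells
changed = only_diffs(a1, a2)   # variant a): show only the differing cells
```

Both helpers return new frames and leave `a1` untouched. For identically labelled frames, pandas 1.1 and later also offer `DataFrame.compare`, which reports differing cells side by side.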
Community
elPastor
  • Ah, thanks for the explanation. As a beginner I did not know this about NaN == NaN. Thanks for your code, but it gives me the error "Could not compare [' - X'] with block values". Concerning highlighting: the idea was, if always the complete row is shown, somehow visibly enhance the non-matching cells. E.g. with colouring the cell, or whatever is easiest. – user2006697 May 15 '17 at 07:13
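For the highlighting idea in the comment, a possible sketch uses the pandas `Styler`, assuming the frames are viewed somewhere that renders HTML (e.g. a Jupyter notebook) and that `jinja2` is installed. The helper names `diff_css` and `highlight_diffs` and the yellow background are illustrative choices, not something from the original posts. Unlike the one-liner in the answer, this version also flags a cell that is NaN in only one of the frames.

```python
import numpy as np
import pandas as pd

a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': '71', 'datum': np.nan, 'width': 'wide'}])
a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])

def diff_css(a, b, css='background-color: yellow'):
    """Return a frame shaped like `a` holding `css` where the cells differ, '' elsewhere."""
    # Cells that are NaN in both frames count as equal.
    differs = (a != b) & ~(a.isna() & b.isna())
    return pd.DataFrame(np.where(differs, css, ''),
                        index=a.index, columns=a.columns)

def highlight_diffs(a, b):
    """Return a Styler showing all of `a` with the differing cells highlighted."""
    # axis=None passes the whole frame to the function in a single call.
    return a.style.apply(lambda _: diff_css(a, b), axis=None)

css = diff_css(a1, a3)
# In a notebook, display the result of: highlight_diffs(a1, a3)
```

The complete rows stay visible, and only the non-matching cells get the background colour.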