Compare 2 dataframes, return only differing cells, treating NaNs as equal

Question

Thanks to Stack Overflow comments, I created a small function that compares two DataFrames using pandas.

    import numpy as np
    import pandas as pd

    # sample data frames
    a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                       {'_id': '71', 'datum': np.nan, 'width': 'wide'}])

    a2 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                       {'_id': 'A', 'datum': np.nan, 'width': 'wide'}])

    a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                       {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])


    # compare function: returns the rows that occur in only one of the two frames
    def dfCompare(a, b):
        if a.equals(b):
            print("no differences detected")
        else:
            df = pd.concat([a, b])
            diff = df.drop_duplicates(keep=False)
            if not diff.empty:
                return diff
            print("no differences detected")

    dfCompare(a1, a2)

How can I modify the result so that either

a) only the cells that differ are shown, e.g. for

    dfCompare(a1, a2)

b) or the cells with differing values are "marked" (e.g. highlighted, or formatted in bold), e.g. for

    dfCompare(a1, a3)

Thanks for any help and thoughts!

score 2 · Answer 1 · edited May 23 '17 at 12:03

This is relatively straightforward, but you have inadvertently (or perhaps advertently) included a comparison that makes it a little tricky: the comparison of NaNs, where you want NaN == NaN to evaluate as True. But, as this question and its answers show, NaN == NaN evaluates as False.
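To make the NaN point concrete, here is a minimal sketch (not the answerer's code) of a comparison that treats two NaNs in the same position as equal and returns only the differing cells, which is option a) of the question. It assumes both frames have identical index and columns, as in the sample data; the helper `diff_cells` and its output column names are hypothetical.

```python
import numpy as np
import pandas as pd


def diff_cells(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    """Return one row per cell that differs between `a` and `b`.

    Hypothetical helper: assumes `a` and `b` have identical index and columns.
    Two NaNs in the same position are treated as equal.
    """
    # Plain `!=` reports NaN vs. NaN as a difference, because NaN != NaN.
    both_nan = a.isna() & b.isna()
    differs = (a != b) & ~both_nan
    # (row, column) positions of the True entries, in row-major order.
    rows, cols = np.nonzero(differs.to_numpy())
    return pd.DataFrame({
        "row": a.index[rows],
        "column": a.columns[cols],
        "self": a.to_numpy()[rows, cols],
        "other": b.to_numpy()[rows, cols],
    })


a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': '71', 'datum': np.nan, 'width': 'wide'}])
a2 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wide'}])
a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])

result = diff_cells(a1, a2)
print(result)
```

For `a1` and `a2` this should give a single row (row 1, column `_id`, `71` vs `A`); the NaNs in `datum` are not reported. On pandas 1.1 or later, the built-in `DataFrame.compare` is an alternative with a wide rather than long output layout.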

So, knowing that, and without providing a highlighted indication (my terminal only prints in black and white, and you don't specify what you're using to view color formatting), here's the best I can offer, which simply appends " - X" to the cells that do not match:

    a1[(a1 != a3) & ((a1 == a1) & (a3 == a3))] += ' - X'
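The comment under this answer reports an error from that line. One hedged alternative avoids in-place augmented assignment through a boolean mask and instead returns a marked copy via `DataFrame.mask`. Note that the mask above, `(a1 == a1) & (a3 == a3)`, also skips cells where only one side is NaN; the sketch below flags those as differences while still treating NaN vs. NaN as equal. The helper name `mark_diffs` is hypothetical, and it assumes identically labelled frames.

```python
import numpy as np
import pandas as pd


def mark_diffs(a: pd.DataFrame, b: pd.DataFrame, marker: str = " - X") -> pd.DataFrame:
    """Return a copy of `a` with `marker` appended to cells that differ from `b`.

    Hypothetical helper; assumes identically labelled frames. NaN vs. NaN is
    treated as equal, NaN vs. a value is treated as a difference.
    """
    differs = (a != b) & ~(a.isna() & b.isna())
    # astype(str) lets the concatenation work on non-string cells too
    # (a NaN in `a` that differs from `b` would be shown as "nan - X").
    return a.mask(differs, a.astype(str) + marker)


a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': '71', 'datum': np.nan, 'width': 'wide'}])
a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])

marked = mark_diffs(a1, a3)
print(marked)
```

Returning a new frame rather than modifying `a1` in place also keeps the original values available for further comparisons.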

answered May 13 '17 at 04:22 by elPastor

Ah, thanks for the explanation. As a beginner I did not know that NaN == NaN is False. Thanks for your code, but it gives me the error "Could not compare [' - X'] with block values". Concerning highlighting: the idea was that, if the complete row is always shown, the non-matching cells would be visibly emphasised, e.g. by colouring the cell, or whatever is easiest. – user2006697 May 15 '17 at 07:13
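For the colouring idea, pandas' `Styler` can highlight cells when a frame is rendered as HTML (for example in a Jupyter notebook); it does not colour plain terminal output. A minimal sketch, assuming identically labelled frames: the hypothetical helper `diff_styles` returns a DataFrame of CSS strings with the same shape as its input, which is the form `Styler.apply(func, axis=None)` expects.

```python
import numpy as np
import pandas as pd


def diff_styles(a: pd.DataFrame, b: pd.DataFrame,
                css: str = "background-color: yellow; font-weight: bold") -> pd.DataFrame:
    """CSS for each cell of `a`: `css` where it differs from `b`, else ''.

    Hypothetical helper; assumes identically labelled frames.
    NaN vs. NaN counts as equal.
    """
    differs = (a != b) & ~(a.isna() & b.isna())
    return pd.DataFrame(np.where(differs, css, ""), index=a.index, columns=a.columns)


a1 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': '71', 'datum': np.nan, 'width': 'wide'}])
a3 = pd.DataFrame([{'_id': '71', 'datum': '2009-11-30', 'width': 'wide'},
                   {'_id': 'A', 'datum': np.nan, 'width': 'wider'}])

styles = diff_styles(a1, a3)
print(styles)
```

In a notebook, `a1.style.apply(lambda df: diff_styles(df, a3), axis=None)` should then display the full rows of `a1` with the differing cells highlighted; rendering requires the optional Jinja2 dependency.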