Compare two pandas dataframes for differences

Question

I have two dataframes and I want to compare them, then display the differences side by side. I had been using the accepted solution from this question, but am now getting an error with ne_stacked = (current_df != new_df).stack().

This used to work fine, but the error I'm getting now is "The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." After looking at the documentation for all of these options, I'm not sure how to implement any of them and keep the same functionality in my code.

How would I go about replacing ne_stacked = (current_df != new_df).stack() so I don't get the ambiguity error?

EDIT

Basic code example as requested:

d = {'a':[1,2,3],'b':[1,2,3],'c':[1,2,3]}
d2 = {'a':[4,2,3],'b':[1,4,3],'c':[1,2,4]}
df1 = pd.DataFrame(d)
df2 = pd.DataFrame(d2)
print (df1 != df2)  # True where the value in df1 is not equal to the value in df2

       a      b      c
0   True  False  False
1  False   True  False
2  False  False   True

So the != expression works just fine for this simple dataframe, but not the more complex ones I'm using (below).

df1 = {'CORE': [{'satellite': '2B',
   'windowEnd': '2015-218 04:00:00',
   'windowStart': '2015-217 20:00:00'}],
 'DURATION': [500.0],
 'PRIORITY': [5],
 'RATE': [u'HIGH_RATE'],
 'STATUS': [u'ACTIVE'],
 'TASK_ID': [1],
 'TYPE': [u'NOMINAL'],
 'WINDOW_END': ['2015-218 04:00:00'],
 'WINDOW_START': ['2015-217 20:00:00']}

df2 = {'CORE': [{'satellite': '2B',
   'windowEnd': '2015-220 04:00:00',
   'windowStart': '2015-219 20:00:00'}],
 'DURATION': [500.0],
 'PRIORITY': [5],
 'RATE': [u'HIGH_RATE'],
 'STATUS': [u'ACTIVE'],
 'TASK_ID': [2],
 'TYPE': [u'NOMINAL'],
 'WINDOW_END': ['2015-220 04:00:00'],
 'WINDOW_START': ['2015-219 20:00:00']}

What does `(df1 != df2)` return? Can you add an [mcve](http://stackoverflow.com/help/mcve)? — jezrael, Dec 14 '15 at 19:09
Hmmm, I think this works fine with your sample. Maybe the problem is in your dataframes. Is it possible to share them? — jezrael, Dec 14 '15 at 19:22
Are you having problems with near-equal floating point comparisons, e.g. .00001 almost equals .000009? — Back2Basics, Dec 14 '15 at 19:25
You might be looking for [`df1.equals(df2)`](http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.equals.html). — unutbu, Dec 14 '15 at 19:43
@jezrael I put my dataframes (in dict form) in the description. — kdubs, Dec 14 '15 at 20:12
@unutbu I need to know which indices are not equal, not just if the first dataframe is identical to the second. — kdubs, Dec 14 '15 at 20:13

Renan Vilas Novas · Accepted Answer · 2015-12-14T20:36:35.463

I'm using pandas version '0.16.2' and I couldn't see any error when I tried to evaluate df1 != df2.

Take a look at my code below:

import pandas as pd

d1 = {'CORE': [{'satellite': '2B',
  'windowEnd': '2015-218 04:00:00',
  'windowStart': '2015-217 20:00:00'}],
  'DURATION': [500.0],
  'PRIORITY': [5],
  'RATE': [u'HIGH_RATE'],
  'STATUS': [u'ACTIVE'],
  'TASK_ID': [1],
  'TYPE': [u'NOMINAL'],
  'WINDOW_END': ['2015-218 04:00:00'],
  'WINDOW_START': ['2015-217 20:00:00']}

d2 = {'CORE': [{'satellite': '2B',
  'windowEnd': '2015-220 04:00:00',
  'windowStart': '2015-219 20:00:00'}],
  'DURATION': [500.0],
  'PRIORITY': [5],
  'RATE': [u'HIGH_RATE'],
  'STATUS': [u'ACTIVE'],
  'TASK_ID': [2],
  'TYPE': [u'NOMINAL'],
  'WINDOW_END': ['2015-220 04:00:00'],
  'WINDOW_START': ['2015-219 20:00:00']}

df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
print (df1 != df2)

# It was printed:
#    CORE   DURATION  PRIORITY   RATE   STATUS  TASK_ID  TYPE    WINDOW_END WINDOW_START
# 0  True   False     False      False  False   True     False   True       True

You could also try to use .any():

print((df1 != df2).any(axis=0))
# It was printed:
# CORE             True
# DURATION        False
# PRIORITY        False
# RATE            False
# STATUS          False
# TASK_ID          True
# TYPE            False
# WINDOW_END       True
# WINDOW_START     True
# dtype: bool

Take care with .any(): it collapses each column (axis=0) or each row (axis=1) into a single True if any value in it differs, so you lose the position of the individual cells. I don't know if you need that.
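If the position of each differing cell matters, the stacked mask from the question can still be used. The following is only a sketch, assuming both frames share the same index and columns; it uses the simple example frames from the question, and the labels 'row', 'column', 'from' and 'to' are illustrative names, not anything pandas requires:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [1, 2, 3], 'c': [1, 2, 3]})
df2 = pd.DataFrame({'a': [4, 2, 3], 'b': [1, 4, 3], 'c': [1, 2, 4]})

# Boolean Series indexed by (row label, column name); True where values differ.
ne_stacked = (df1 != df2).stack()
changed = ne_stacked[ne_stacked]
changed.index.names = ['row', 'column']

# Positions of the differing cells, in the same row-major order as stack().
rows, cols = np.where((df1 != df2).to_numpy())

# One line per differing cell, old and new value side by side.
diff = pd.DataFrame(
    {'from': df1.to_numpy()[rows, cols], 'to': df2.to_numpy()[rows, cols]},
    index=changed.index,
)
print(diff)
```

This does not explain the ambiguity error by itself, but it shows that the stack-based approach works when the comparison result is a plain boolean DataFrame.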
