For clarity, I will extract an excerpt from my code and use general names. I have a class Foo()
that stores a DataFrame in an attribute.

    import pandas as pd
    import pandas.testing as pdt  # formerly pandas.util.testing (deprecated)

    class Foo():
        def __init__(self, bar):
            self.bar = bar               # dict of dicts
            self.df = pd.DataFrame(bar)  # pandas object

        def __eq__(self, other):
            if isinstance(other, self.__class__):
                return self.__dict__ == other.__dict__
            return NotImplemented

        def __ne__(self, other):
            result = self.__eq__(other)
            if result is NotImplemented:
                return result
            return not result

However, when I try to compare two instances of Foo, I get an exception related to the ambiguity of comparing two DataFrames (the comparison should work fine without the 'df' key in Foo.__dict__).

    d1 = {'A': pd.Series([1, 2], index=['a', 'b']),
          'B': pd.Series([1, 2], index=['a', 'b'])}
    d2 = d1.copy()

    foo1 = Foo(d1)
    foo2 = Foo(d2)

    foo1.bar      # dict
    foo1.df       # pandas DataFrame
    foo1 == foo2  # ValueError

    [Out] ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Fortunately, pandas has utility functions for asserting whether two DataFrames or Series are equal. I'd like to reuse the comparison logic of these functions if possible.

    pdt.assert_frame_equal(pd.DataFrame(d1), pd.DataFrame(d2))  # does not raise

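Since assert_frame_equal signals a mismatch by raising AssertionError rather than by returning a value, it would presumably need a thin wrapper before it could be used inside __eq__. A minimal sketch of that idea follows; frames_equal is a hypothetical helper name, not something pandas provides, and the example frames are made up for illustration.

```python
import pandas as pd
import pandas.testing as pdt


def frames_equal(left, right):
    """Hypothetical helper: True if assert_frame_equal does not raise."""
    try:
        pdt.assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


# Illustrative frames (not from my real code).
df1 = pd.DataFrame({'A': [1, 2], 'B': [1, 2]}, index=['a', 'b'])
df2 = df1.copy()                                                  # same content
df3 = pd.DataFrame({'A': [1, 2], 'B': [1, 3]}, index=['a', 'b'])  # one value differs

same = frames_equal(df1, df2)       # True
different = frames_equal(df1, df3)  # False
```

The wrapper turns the assertion into a plain bool, which is the shape __eq__ needs.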
There are a few options to resolve the comparison of two Foo instances:
- compare a copy of __dict__, where new_dict lacks the df key
- delete the df key from __dict__ (not ideal)
- don't compare __dict__, but only parts of it contained in a tuple
- overload __eq__ to facilitate pandas DataFrame comparisons
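
To make the first option concrete, here is a rough sketch under some assumptions: _comparable is a hypothetical helper name, and bar is taken to be a plain dict of dicts so that the remaining items compare cleanly with ==.

```python
import pandas as pd


class Foo:
    def __init__(self, bar):
        self.bar = bar               # assumed here: plain dict of dicts
        self.df = pd.DataFrame(bar)  # pandas object, excluded from comparison

    def _comparable(self):
        # Hypothetical helper: a copy of __dict__ without the 'df' key.
        return {k: v for k, v in self.__dict__.items() if k != 'df'}

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self._comparable() == other._comparable()
        return NotImplemented


bar = {'A': {'a': 1, 'b': 2}, 'B': {'a': 1, 'b': 2}}
foo1 = Foo(bar)
foo2 = Foo(dict(bar))                                      # shallow copy, same content
foo3 = Foo({'A': {'a': 1, 'b': 2}, 'B': {'a': 1, 'b': 3}})  # one value differs

result = foo1 == foo2    # True
mismatch = foo1 == foo3  # False
```

This sidesteps the DataFrame entirely, so two instances whose df attributes diverged for some other reason would still compare equal, which is part of why it feels less robust.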
The last option seems the most robust in the long run, but I am not sure of the best approach. In the end, I would like to refactor __eq__ to compare all items from Foo.__dict__, including DataFrames (and Series). Any ideas on how to accomplish this?
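
For concreteness, the kind of thing I have in mind is sketched below. It is only a rough sketch, not necessarily the best approach: values_equal and PANDAS_TYPES are hypothetical names, and I am assuming that the existing DataFrame.equals and Series.equals methods are acceptable stand-ins for the pdt assertions (their semantics may differ in some details).

```python
import pandas as pd

PANDAS_TYPES = (pd.DataFrame, pd.Series)  # hypothetical name


def values_equal(a, b):
    """Hypothetical helper: compare two values, treating pandas objects specially."""
    if isinstance(a, PANDAS_TYPES) or isinstance(b, PANDAS_TYPES):
        # .equals returns a single bool instead of an element-wise result.
        return type(a) is type(b) and a.equals(b)
    if isinstance(a, dict) and isinstance(b, dict):
        # Recurse so that pandas objects nested in dicts are handled too.
        return a.keys() == b.keys() and all(values_equal(a[k], b[k]) for k in a)
    return a == b


class Foo:
    def __init__(self, bar):
        self.bar = bar
        self.df = pd.DataFrame(bar)

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return values_equal(self.__dict__, other.__dict__)
        return NotImplemented


d1 = {'A': pd.Series([1, 2], index=['a', 'b']),
      'B': pd.Series([1, 2], index=['a', 'b'])}
d2 = {k: v.copy() for k, v in d1.items()}  # independent but equal Series
d3 = {'A': pd.Series([1, 2], index=['a', 'b']),
      'B': pd.Series([1, 3], index=['a', 'b'])}

equal = Foo(d1) == Foo(d2)    # True
unequal = Foo(d1) == Foo(d3)  # False
```

Here d2 holds separate copies of the Series, so the comparison cannot succeed merely because both dicts point at the same objects.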