
I wanted to compare two DataFrames and ran into this:

import pandas as pd

df = pd.DataFrame([{'a':1,'b':2},{'a':3,'b':4}])
df2 = pd.DataFrame([{'a':0,'b':2},{'a':3,'b':4}])

The element-by-element comparison works as I would have thought:

df == df2 
Out[52]:
       a     b
0  False  True
1   True  True

But all(df == df2) is puzzling me:

all(df==df2)
Out[53]: True

while

(df==df2).all()
Out[54]:
a    False
b     True
dtype: bool
jeremy_rutman
    When you use `all(df)`, you're using Python's built-in `all()` function, which takes an iterable and makes sure every item is truthy. Iterating over a DataFrame yields its column labels, so the labels are what get checked. On the other hand, when you use `(df==df2).all()`, you are using the `pandas.DataFrame.all()` method, which reduces along an axis. The default axis is `0` (down each column), which is why you got a `pandas.Series` with one result per column. – Anwarvic May 15 '20 at 11:57
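A minimal sketch of the difference described in the comment, reusing the question's df and df2. The names cmp, labels, per_column and whole_frame are only illustrative. Chaining a second .all() onto the per-column Series is one common way to collapse the result to a single boolean, not the only one.

```python
import pandas as pd

df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
df2 = pd.DataFrame([{'a': 0, 'b': 2}, {'a': 3, 'b': 4}])
cmp = df == df2  # element-wise boolean DataFrame

# Iterating a DataFrame yields its column labels, not its values.
labels = list(cmp)            # ['a', 'b']
builtin_result = all(cmp)     # True: both labels are non-empty strings

# DataFrame.all() reduces along axis 0 (down each column): one bool per column.
per_column = cmp.all()        # a -> False, b -> True

# Reducing the per-column Series once more gives one answer for the whole frame.
whole_frame = bool(cmp.all().all())  # False

print(labels, builtin_result, per_column.to_dict(), whole_frame)
```

The built-in never inspects the cell values, so it cannot tell you whether the frames match. The pandas method does, but it returns a Series until you reduce it a second time.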

1 Answer


If you apply Python's built-in all function to df (which is an iterable), it iterates over the column labels (the keys, in the case of a dict) and returns True only if every label is truthy; the cell values are never looked at. The column labels in both of your DataFrames are non-empty strings, which are always truthy, so all(df == df2) returns True no matter what the element-wise comparison contains. But if you change your DataFrame definition as follows:

df = pd.DataFrame([{0:1,'b':2},{0:3,'b':4}])
df2 = pd.DataFrame([{0:1,'b':2},{0:3,'b':4}])

df1 = df==df2
print(all(df1))

Even though all values are equal, it prints False, because the column label 0 is falsy. The same happens with a plain dict; the following is also False:

mydict = {0 : "Apple", 1 : "Orange"}
print(all(mydict))

But if we modify the dict:

mydict = {2 : "Apple", 1 : "Orange"}
print(all(mydict))

the result becomes True, because neither key 2 nor key 1 is falsy.
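To make the mechanism concrete, here is a sketch of built-in all() written as a plain loop, following the pure-Python equivalent given in the Python documentation. The name all_equivalent is a hypothetical helper used only for illustration. The last two lines show value-based checks (DataFrame.to_numpy().all() and DataFrame.equals()) that look at the comparison results instead of the labels; they are alternatives, not part of the answer above.

```python
import pandas as pd

def all_equivalent(iterable):
    # Hypothetical helper mirroring built-in all(): stop at the first falsy element.
    for element in iterable:
        if not element:
            return False
    return True

mydict = {0: "Apple", 1: "Orange"}
print(all_equivalent(mydict), all(mydict))  # iterates keys 0 and 1 -> False False

df = pd.DataFrame([{0: 1, 'b': 2}, {0: 3, 'b': 4}])
df2 = pd.DataFrame([{0: 1, 'b': 2}, {0: 3, 'b': 4}])
df1 = df == df2

# Label-based: column label 0 is falsy, so this is False although every value matches.
label_check = all(df1)

# Value-based alternatives that actually inspect the element-wise results.
value_check = bool(df1.to_numpy().all())
equals_check = df.equals(df2)
```

Iterating a dict or a DataFrame goes over its keys or column labels, so any falsy label (0, an empty string, and so on) makes all() return False regardless of the data.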

Ehsan