3

I have noticed a quirky thing. Let's say A and B are DataFrames.

A is:

A
   a  b  c
0  x  1  a
1  y  2  b
2  z  3  c
3  w  4  d

B is:

B
   a  b  c
0  1  x  a
1  2  y  b
2  3  z  c
3  4  w  d

As we can see above, the elements under columns a and b differ between A and B, but A.equals(B) yields True.

A==B correctly shows that the elements are not equal:

A==B
       a      b     c
0  False  False  True
1  False  False  True
2  False  False  True
3  False  False  True
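For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the two frames from the printed tables above. The constructor calls are an assumption (the question does not show how A and B were created), and the result of `A.equals(B)` is printed rather than asserted, since the question reports True and the outcome may depend on the installed pandas version.

```python
import pandas as pd

# Rebuild the two frames from the question (assumed construction;
# both use the column order a, b, c).
A = pd.DataFrame({"a": list("xyzw"), "b": [1, 2, 3, 4], "c": list("abcd")})
B = pd.DataFrame({"a": [1, 2, 3, 4], "b": list("xyzw"), "c": list("abcd")})

# Element-wise comparison: a boolean frame, compared position by position.
elementwise = A == B
print(elementwise)

# Collapse to a single bool: True only if every cell matched.
print(elementwise.all().all())

# Left unasserted: the question reports True for this call.
print(A.equals(B))
```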

Question: Can someone please explain why .equals() yields True? I researched this topic on SO, and as per the contract of pandas.DataFrame.equals, pandas should return False.

I am a beginner, so I'd appreciate any help.


Here are the JSON format and ._data of A and B:

A

`A.to_json()`
Out[114]: '{"a":{"0":"x","1":"y","2":"z","3":"w"},"b":{"0":1,"1":2,"2":3,"3":4},"c":{"0":"a","1":"b","2":"c","3":"d"}}'

and A._data is

BlockManager
Items: Index(['a', 'b', 'c'], dtype='object')
Axis 1: RangeIndex(start=0, stop=4, step=1)
IntBlock: slice(1, 2, 1), 1 x 4, dtype: int64
ObjectBlock: slice(0, 4, 2), 2 x 4, dtype: object

B

B's json format:

B.to_json()
'{"a":{"0":1,"1":2,"2":3,"3":4},"b":{"0":"x","1":"y","2":"z","3":"w"},"c":{"0":"a","1":"b","2":"c","3":"d"}}'


B._data
BlockManager
Items: Index(['a', 'b', 'c'], dtype='object')
Axis 1: RangeIndex(start=0, stop=4, step=1)
IntBlock: slice(0, 1, 1), 1 x 4, dtype: int64
ObjectBlock: slice(1, 3, 1), 2 x 4, dtype: object
watchtower
  • See also: https://stackoverflow.com/questions/38212697/confirming-equality-of-two-pandas-dataframes/38213972#38213972 – Alexander Sep 19 '18 at 04:10

3 Answers

2

As an alternative to sacul's and U9-Forward's answers: I've done some further analysis, and it looks like the reason you are seeing True rather than the False you expected has more to do with this line of the docs:

This function requires that the elements have the same dtype as their respective elements in the other Series or DataFrame.

[Image: example dataframes A, B, C, and D]

With the above dataframes, when I run df.equals(), this is what is returned:

>>> A.equals(B)
Out: True
>>> B.equals(C)
Out: False

These two results align with what the other answers are saying: A and B have the same shape and the same elements, so they are treated as equal, while B and C have the same shape but different elements, so they are not.

On the other hand:

>>> A.equals(D)
Out: False

Here A and D have the same shape and the same elements, but the comparison still returns False. The difference between this case and the one above is that all of the dtypes in the comparison match up, as the docs quote above requires. A and D both have the dtypes: str, int, str.
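To check the dtype requirement from the docs quote for yourself, a rough sketch like the following compares the column dtypes of A and B in column order. The frames are rebuilt here from the tables in the question, which is an assumption about how they were created; the exact dtype shown for the text columns may also vary with the pandas version.

```python
import pandas as pd

# Assumed reconstruction of A and B from the question's tables.
A = pd.DataFrame({"a": list("xyzw"), "b": [1, 2, 3, 4], "c": list("abcd")})
B = pd.DataFrame({"a": [1, 2, 3, 4], "b": list("xyzw"), "c": list("abcd")})

# Per-column dtypes, in column order a, b, c.
print(A.dtypes)
print(B.dtypes)

# Compare the dtypes position by position.
same_dtypes = list(A.dtypes) == list(B.dtypes)
print(same_dtypes)  # False: column a is text in A but int64 in B
```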

girlvsdata
  • 1
    For further clarity: I am saying that the comparison of A and B should return False, but the function is not working correctly because it needs to compare like with like `dtypes`, so it is erroneously returning True. A and D have matching dtypes: str, int, str - so their comparison is working correctly and returning False (as the two string columns are switched) – girlvsdata Sep 19 '18 at 04:52
1

From the docs:

Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

Determines if two NDFrame objects contain the same elements!!!

ELEMENTS, not including COLUMNS

So that's why it returns True.

If you want it to return False and check the columns, do:

print((A==B).all().all())

Output:

False
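Another option, sketched here as an alternative rather than as part of the original answer, is `pandas.testing.assert_frame_equal`, which raises an `AssertionError` when the frames differ (by default it also checks dtypes). The helper name `frames_match` is hypothetical, and A and B are rebuilt from the question's tables.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_match(left, right):
    """Hypothetical helper: True only if assert_frame_equal raises nothing."""
    try:
        assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


# Assumed reconstruction of A and B from the question's tables.
A = pd.DataFrame({"a": list("xyzw"), "b": [1, 2, 3, 4], "c": list("abcd")})
B = pd.DataFrame({"a": [1, 2, 3, 4], "b": list("xyzw"), "c": list("abcd")})

print(frames_match(A, B))         # False: columns a and b differ
print(frames_match(A, A.copy()))  # True: a copy matches the original
```

Unlike `(A==B).all().all()`, this treats NaNs in the same location as equal, and the raised error message says which column differs.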
U13-Forward
1

As in the answer you linked in your question, essentially the behaviour of pandas.DataFrame.equals mimics numpy.array_equal. The docs for np.array_equal state that it returns:

True if two arrays have the same shape and elements, False otherwise.

Your two dataframes satisfy this.
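For reference, here is a small sketch of how `np.array_equal` behaves on plain arrays. The array `x` is made up for illustration and is not taken from the question.

```python
import numpy as np

# Illustrative 2 x 2 array (not from the question).
x = np.array([[1, 2], [3, 4]])

same = np.array_equal(x, x.copy())                      # same shape, same elements
swapped = np.array_equal(x, x[:, ::-1])                 # same values, columns swapped
reshaped = np.array_equal(x, np.array([1, 2, 3, 4]))    # same values, different shape

print(same)      # True
print(swapped)   # False
print(reshaped)  # False
```

On plain arrays the comparison is positional, so the same values in a different arrangement do not count as equal.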

sacuL
  • 1
    Thanks sacul. I checked docs at `https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.equals.html`. All I could find is that `Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.` Could you please point me to the right direction? Also, what's the best way to check for the equality? – watchtower Sep 19 '18 at 04:09
  • 2
    @watchtower it is: `(A==B).all().all()` – U13-Forward Sep 19 '18 at 04:11
  • 1
    You can also check out the docs I linked to `numpy.array_equal`, which explains it differently, but is essentially the same function. As for how to check equality, @U9-Forward's comment makes sense (+1) – sacuL Sep 19 '18 at 04:19