
Does index numbering matter when testing DataFrame equality? I have 2 DataFrames with exactly the same data and columns. The only difference is that the index labels of the rows are different, and the equals method returns False. How can I get around this? Here are my DataFrames:

   A   B
0  87  54
1  87  75
2  87  22
3  87  69

     A   B
418  87  69
107  87  54
108  87  75
250  87  22
karmanaut
  • Possible duplicate of [Pandas DataFrames with NaNs equality comparison](http://stackoverflow.com/questions/19322506/pandas-dataframes-with-nans-equality-comparison) – hellpanderr Oct 19 '15 at 21:00

1 Answer


You can use np.array_equal to check the values. However, the row order matters, so in your example you have to sort by the index first.

In [11]: df1
Out[11]:
    A   B
0  87  54
1  87  75
2  87  22
3  87  69

In [12]: df2
Out[12]:
      A   B
418  87  69
107  87  54
108  87  75
250  87  22

In [13]: df3 = df2.sort_index()  # DataFrame.sort() was removed from pandas; sort_index() sorts by index label

In [14]: df3
Out[14]:
      A   B
107  87  54
108  87  75
250  87  22
418  87  69

In [15]: np.array_equal(df1, df3)
Out[15]: True
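
As a self-contained sketch of the steps above, the snippet below rebuilds the two frames from the question, sorts the second one by its index labels, and compares the raw values. It assumes a reasonably recent pandas (for to_numpy). Note that sorting by index only lines the rows up here because the label order of df2 happens to match the row order of df1.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the question.
df1 = pd.DataFrame({"A": [87, 87, 87, 87], "B": [54, 75, 22, 69]})
df2 = pd.DataFrame(
    {"A": [87, 87, 87, 87], "B": [69, 54, 75, 22]},
    index=[418, 107, 108, 250],
)

# Sort by index label so the rows line up, then compare the values only.
df3 = df2.sort_index()
same_values = np.array_equal(df1.to_numpy(), df3.to_numpy())
print(same_values)
```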

Note: You can't compare df1 and df2 as they have different indexes:

In [21]: df1 == df2
ValueError: Can only compare identically-labeled DataFrame objects

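You can reset the index so that the labels match, but be aware that the comparison still raises this exception if the labels differ in any other way (for example, if the frames have different lengths):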

In [22]: df3.reset_index(drop=True)
Out[22]:
    A   B
0  87  54
1  87  75
2  87  22
3  87  69

In [23]: np.all(df1 == df3.reset_index(drop=True))
Out[23]: True

Another option is to put a try/except block around assert_frame_equal:

In [24]: pd.testing.assert_frame_equal(df1, df3.reset_index(drop=True))  # pd.util.testing in older pandas

as in this related answer.
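
A minimal sketch of what the try/except approach could look like. The helper name frames_equal_ignoring_index is hypothetical (it is not part of pandas), and the import path assumes a pandas version that exposes pandas.testing.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equal_ignoring_index(left, right):
    """Hypothetical helper: True if both frames match once row labels are dropped."""
    try:
        # reset_index(drop=True) replaces the labels with 0..n-1 on both sides.
        assert_frame_equal(
            left.reset_index(drop=True),
            right.reset_index(drop=True),
        )
    except AssertionError:
        return False
    return True


# The two frames from the question.
df1 = pd.DataFrame({"A": [87, 87, 87, 87], "B": [54, 75, 22, 69]})
df2 = pd.DataFrame(
    {"A": [87, 87, 87, 87], "B": [69, 54, 75, 22]},
    index=[418, 107, 108, 250],
)

print(frames_equal_ignoring_index(df1, df2.sort_index()))
```

Row order still matters with this helper, so the frame has to be sorted first, as in the call above.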

As Jeff points out, you can use .equals, which performs this check directly:

In [25]: df1.equals(df3.reset_index(drop=True))
Out[25]: True
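
Put together as a standalone sketch (the variable names unaligned and aligned are only for illustration), the .equals route looks like this: the call fails while the index labels differ and succeeds once the second frame is sorted and its index is reset.

```python
import pandas as pd

# The two frames from the question.
df1 = pd.DataFrame({"A": [87, 87, 87, 87], "B": [54, 75, 22, 69]})
df2 = pd.DataFrame(
    {"A": [87, 87, 87, 87], "B": [69, 54, 75, 22]},
    index=[418, 107, 108, 250],
)

# .equals also compares the index, so the raw frames are reported as different.
unaligned = df1.equals(df2)

# Sort by label, drop the labels, then compare again.
aligned = df1.equals(df2.sort_index().reset_index(drop=True))

print(unaligned, aligned)
```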
Andy Hayden