28

How can I assert that the following two dataframes, df1 and df2, are equal?

import pandas as pd
df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])

df1.equals(df2) returns False, because equals also requires matching dtypes (integer vs. float here). So far, I know two workarounds:

print((df1 == df2).all()[0])

or

df1 = df1.astype(float)
print(df1.equals(df2))

Both seem a little messy. Is there a better way to do this comparison?

ayhan
Mehdi Jafarnia Jahromi

2 Answers

40

You can use assert_frame_equal with check_dtype=False, so that the dtypes of the columns are not compared.

# Pre v. 0.20.3
# from pandas.util.testing import assert_frame_equal

from pandas.testing import assert_frame_equal

assert_frame_equal(df1, df2, check_dtype=False)
Alexander
  • With pandas 0.20.3 `assert_frame_equal` is in the `pandas.testing` package: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.testing.assert_frame_equal.html – Matthew Turner Sep 08 '17 at 21:04
  • It is also important to note that if assert_frame_equal produces no output (that is, it raises no AssertionError), then the two dataframes are equal. – Nabin Nov 01 '19 at 11:33
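
For reference, here is a minimal sketch of how this behaves on the dataframes from the question. The `frames_equal` wrapper is a hypothetical helper, not part of pandas; it only turns the silent-pass / AssertionError behaviour described above into a boolean.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1.0, 2, 3])

# Same values, different dtypes (integer vs. float): passes silently.
assert_frame_equal(df1, df2, check_dtype=False)


def frames_equal(left, right, **kwargs):
    """Hypothetical helper: return True/False instead of raising.

    Any keyword arguments are forwarded to assert_frame_equal.
    """
    try:
        assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


# With the default check_dtype=True the dtype mismatch is reported.
print(frames_equal(df1, df2))                     # strict comparison
print(frames_equal(df1, df2, check_dtype=False))  # dtype-agnostic comparison
```

In a test suite you would normally call assert_frame_equal directly, since its AssertionError message describes where the frames differ; the boolean wrapper is only convenient for ad hoc checks.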
6

Using @Divakar's elegant idea, NumPy's allclose() does most of the work for the numeric columns:

In [128]: df1
Out[128]:
   0    s  n
0  1  aaa  1
1  2  aaa  2
2  3  aaa  3

In [129]: df2
Out[129]:
     0    s    n
0  1.0  aaa  1.0
1  2.0  aaa  2.0
2  3.0  aaa  3.0

In [130]: (np.allclose(df1.select_dtypes(exclude=[object]), df2.select_dtypes(exclude=[object]))
   .....:  &
   .....:  df1.select_dtypes(include=[object]).equals(df2.select_dtypes(include=[object]))
   .....: )
Out[130]: True

select_dtypes() lets you separate the string (object) columns from the numeric ones, so that each group can be compared with the appropriate method.
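
The session above can be wrapped into a reusable function. The following is only a sketch: `frames_match` is a hypothetical name, and it selects on `np.number` rather than on `object`, which is an assumption made so that the split does not depend on how strings are stored. Note that np.allclose treats NaN as unequal to NaN by default.

```python
import numpy as np
import pandas as pd


def frames_match(left, right):
    """Hypothetical helper: numeric columns via np.allclose, the rest via equals."""
    num_l = left.select_dtypes(include=[np.number])
    num_r = right.select_dtypes(include=[np.number])
    other_l = left.select_dtypes(exclude=[np.number])
    other_r = right.select_dtypes(exclude=[np.number])

    # Guard against silent broadcasting or mismatched column sets.
    if num_l.shape != num_r.shape or list(num_l.columns) != list(num_r.columns):
        return False

    numbers_ok = bool(np.allclose(num_l.to_numpy(), num_r.to_numpy()))
    return numbers_ok and other_l.equals(other_r)


# Rebuild the two frames shown in the session above.
df1 = pd.DataFrame({0: [1, 2, 3], "s": ["aaa"] * 3, "n": [1, 2, 3]})
df2 = pd.DataFrame({0: [1.0, 2.0, 3.0], "s": ["aaa"] * 3, "n": [1.0, 2.0, 3.0]})

print(frames_match(df1, df2))
```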

Community
MaxU - stand with Ukraine