In NumPy and Pandas, nan != nan and NaT != NaT. So, when comparing results during unit testing, how can I assert that a returned value is one of those values? A simple assertEqual naturally fails, even if I use pandas.util.testing.

Alex Riley
Berislav Lopac

3 Answers

If you're comparing scalars, one way is to use assertTrue with isnull. For example, in the DataFrame unit tests (pandas/tests/test_frame.py) you can find tests such as this:

self.assertTrue(com.isnull(df.ix['c', 'timestamp']))

(com is an alias for pandas/core/common.py and so com.isnull calls the same underlying function as pd.isnull.)
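As a rough sketch of the scalar case in a standalone test, something like the following should work. It uses the public pd.isnull and .loc rather than the internal com alias and .ix from the pandas test suite; the class name, the column names, and the run_null_scalar_tests helper are made up for illustration.

```python
import unittest

import numpy as np
import pandas as pd


class NullScalarTest(unittest.TestCase):
    """Hypothetical test case: checks that single cells are null."""

    def setUp(self):
        # Row 'c' holds a NaN in the float column and a NaT in the datetime column.
        self.df = pd.DataFrame(
            {
                "value": [1.0, np.nan],
                "timestamp": [pd.Timestamp("2015-09-12"), pd.NaT],
            },
            index=["a", "c"],
        )

    def test_nan_scalar(self):
        # assertEqual(value, np.nan) would fail, so test for nullness instead.
        self.assertTrue(pd.isnull(self.df.loc["c", "value"]))

    def test_nat_scalar(self):
        # pd.isnull also recognises NaT.
        self.assertTrue(pd.isnull(self.df.loc["c", "timestamp"]))

    def test_non_null_scalar(self):
        self.assertFalse(pd.isnull(self.df.loc["a", "value"]))


def run_null_scalar_tests():
    """Illustrative helper: run the test case and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NullScalarTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project you would normally let your test runner discover the class instead of calling a helper like run_null_scalar_tests yourself.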

If on the other hand you're comparing Series or DataFrames with null values for equality, these are handled automatically by tm.assert_series_equal and tm.assert_frame_equal. For example:

>>> import numpy as np
>>> import pandas as pd
>>> import pandas.util.testing as tm
>>> df = pd.DataFrame({'a': [1, np.nan]})
>>> df
    a
0   1
1 NaN

Normally, NaN is not equal to NaN:

>>> df == df
       a
0   True
1  False

But assert_frame_equal processes NaN as being equal to itself:

>>> tm.assert_frame_equal(df, df)
# no AssertionError raised
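
For a self-contained version of the same check, here is a minimal sketch. It assumes a reasonably recent pandas in which the assertion helpers are exposed as pandas.testing; frames_match and the example frames are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd
import pandas.testing as pdt


def frames_match(left, right):
    """Illustrative wrapper: True if assert_frame_equal raises nothing."""
    try:
        pdt.assert_frame_equal(left, right)
    except AssertionError:
        return False
    return True


# One NaN in a float column and one NaT in a datetime column.
expected = pd.DataFrame(
    {
        "a": [1.0, np.nan],
        "t": pd.to_datetime(["2015-09-12", None]),
    }
)
actual = expected.copy()

# Replacing the NaN with a real number makes the frames genuinely different.
different = expected.fillna({"a": 0.0})

print(frames_match(expected, actual))
print(frames_match(expected, different))
```

The first comparison passes because nulls in matching positions are treated as equal, while the second fails because a NaN is being compared with 0.0.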
Alex Riley

Testing on Python 2.7, I get the following:

import numpy as np
import pandas as pd

x = np.nan
x is np.nan   # True
x is pd.NaT   # False
np.isnan(x)   # True
pd.isnull(x)  # True

y = pd.NaT
y is np.nan   # False
y is pd.NaT   # True
np.isnan(y)   # TypeError !!
pd.isnull(y)  # True

You can also use

x != x  # True for nan
y != y  # True for NaT

But I don't really like this style; I can never quite convince myself to trust it.
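
If you want a single check that covers both values, a small wrapper around pd.isnull is one option. This is only a sketch: is_missing is a hypothetical helper name, and it is meant for scalars only. It also shows why the identity test (x is np.nan) is fragile, since a NaN created elsewhere is a different object.

```python
import numpy as np
import pandas as pd


def is_missing(value):
    """Hypothetical helper: True if a scalar is NaN, NaT, or None."""
    return bool(pd.isnull(value))


# Three NaN values that are not necessarily the same object.
nan_variants = [np.nan, float("nan"), np.float64("nan")]

print([is_missing(v) for v in nan_variants])  # every entry is missing
print(is_missing(pd.NaT))                     # NaT counts as missing too
print(is_missing(0.0))                        # an ordinary number does not

# Identity only matches the np.nan object itself, not other NaN values.
print(float("nan") is np.nan)
```

In a test, self.assertTrue(is_missing(result)) then reads the same way for NaN and NaT results.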

robochat

Before doing an assert_frame_equal check, you could use the .fillna() method on the dataframes to replace the null values with something else that won't otherwise appear in your values. You may also want to read these examples on how to use the .fillna() method.

Alex
  • Thank you, this is almost exactly what I have been looking for. I'm saying "almost" as you can't pass `None`, which would be ideal as a type-neutral value, but another unique scalar, such as a zero or a string (e.g. `"INCORRECT!!!1!1!"` ;-) ), is good enough for now. – Berislav Lopac Sep 12 '15 at 08:16
  • @BerislavLopac: perhaps I've misunderstood exactly what you're trying to do, but `assert_frame_equal` already asserts that `NaN` is equal to `NaN`. Using `fillna()` to replace `NaN` with some other scalar to be compared for equality is redundant and so isn't used in Pandas' unit tests. – Alex Riley Sep 12 '15 at 09:22
  • Gah, you're right -- I took your advice too literally and called `fillna` _before_ `assert_frame_equal`, so I missed that it works out the differences. Thanks! – Berislav Lopac Sep 12 '15 at 09:26