First, at least in NumPy 1.15, `np.nan` happens to be a special singleton, meaning that whenever NumPy has to give you a NaN value of type `float`, it tries to give you the same `np.nan` value.
But this is not documented anywhere, or guaranteed to be true across versions.
This fits into the larger class of values that may or may not be singletons, as an implementation detail.
As a general rule, if your code relies on two equal values of an immutable type being identical or not being identical, your code is wrong.
Here are some examples from a default build of CPython 3.7:
>>> import math
>>> a, b = 200, 201
>>> a is b-1
True
>>> a, b = 300, 301
>>> a is b-1
False
>>> 301-1 is 300
True
>>> math.nan is math.nan
True
>>> float('nan') is math.nan
False
>>> float('nan') is float('nan')
False
You can learn all of the rules that make all of these things come out that way, but they could all change in a different Python implementation, or in version 3.8, or even in 3.7 built with custom configure options. So, just never compare `1` or `math.nan` or `np.nan` or `''` with `is`; only use it for objects that are specifically documented to be singletons (like `None`, or instances of your own types, of course).
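As a rough sketch of that rule, here is one way to pick the right comparison for each kind of value. The function name `describe` and its return strings are made up for illustration; only `math.isnan`, `isinstance`, and the comparison operators are real.

```python
import math


def describe(value):
    """Classify a value (hypothetical helper, for illustration only)."""
    if value is None:
        # None is documented to be a singleton, so identity is fine here.
        return "missing"
    if isinstance(value, float) and math.isnan(value):
        # NaN needs a dedicated test; neither == nor is works reliably.
        return "nan"
    if value == 1:
        # Ordinary immutable values: compare with ==, never with is.
        return "one"
    if value == "":
        return "empty string"
    return "something else"


print(describe(None))
print(describe(float("nan")))
print(describe(300 - 299))
print(describe(""))
```

Because `np.float64` is a subclass of `float`, the same NaN check should also work for a scalar pulled out of a `float64` array.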
Second, when you index a NumPy array, it has to "unbox" the value by constructing a scalar of a type appropriate to the array's `dtype`. For a `dtype=float64` array, the scalar value it constructs is an `np.float64`.
So, `a[2]` is guaranteed to be an `np.float64`.
But `np.nan` is not an `np.float64`, it's a `float`.
So, there's no way NumPy can give you `np.nan` when you ask for `a[2]`. Instead, it gives you an `np.float64` with a NaN value.
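A quick way to see the type mismatch for yourself, assuming an array `a` like the one in the question (the exact contents here are made up, with a NaN at index 2):

```python
import numpy as np

# Hypothetical float64 array with a NaN stored at index 2.
a = np.array([1.0, 2.0, np.nan])
item = a[2]

print(type(np.nan))    # the built-in float type
print(type(item))      # NumPy's float64 scalar type
print(item is np.nan)  # objects of different types can never be the same object
print(np.isnan(item))  # the value itself is still NaN
```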
OK, so that's why `a[2] is np.nan` is always `False`. But why is `a[2] is a[2]` also usually false?
As I mentioned above, NumPy tries to give you `np.nan` whenever it needs to give you a `float` NaN. But, at least in 1.15, it doesn't have any special singleton value to provide whenever it needs to give you an `np.float64` NaN. There's no reason it couldn't, but nobody bothered to write such code, because it shouldn't matter either way to any properly written app.
So, each time you unbox the value in `a[2]` into a scalar `np.float64`, it gives you a new NaN-valued `np.float64`.
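A small sketch of that behaviour, again with a made-up array. It keeps both scalars alive in named variables so the two objects can be compared directly. This reflects the NumPy versions I know of, not anything the documentation promises:

```python
import numpy as np

# Hypothetical float64 array with a NaN stored at index 2.
a = np.array([1.0, 2.0, np.nan])

first = a[2]   # unboxes into a fresh np.float64 scalar
second = a[2]  # unboxes again into another scalar object

print(first is second)                       # two separate objects
print(np.isnan(first) and np.isnan(second))  # both hold a NaN value
```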
But why isn't this the same as `301-1 is 300`? Well, the reason that works is that the compiler is allowed to fold constants of known immutable type with equal values, and CPython does exactly that, for simple cases, within each compilation unit. But two NaN values aren't equal; a NaN value isn't even equal to itself. So, it can't be constant-folded.
(If you're wondering what happens if you create an array with an int dtype and store small values in it and check whether they get merged into the small-int singletons—try it and see.)
And of course this is why `isnan` exists in the first place. You can't test for NaN with equality (because NaN values are not equal to anything, even themselves), you can't test for NaN with identity (for all of the reasons described above), so you need a function to test for them.
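To make that concrete, here is a minimal sketch using `math.isnan` for a single value and `np.isnan` for a whole array. The array contents are made up for illustration:

```python
import math

import numpy as np

# Hypothetical float64 array with a NaN stored at index 2.
a = np.array([1.0, 2.0, np.nan])
x = a[2]

print(x == x)         # equality fails: NaN is not equal to itself
print(math.isnan(x))  # works on any float, including np.float64 scalars

mask = np.isnan(a)    # elementwise test over the whole array
print(mask.tolist())  # [False, False, True]
```

`np.isnan` returns a boolean array, so it can be used directly for masking, e.g. `a[~mask]` to drop the NaN entries.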