Python\Numpy: Comparing arrays with NAN

Question

Why are the following two lists not equal?

a = [1.0, np.NAN] 
b = np.append(np.array(1.0), [np.NAN]).tolist()

I am using the following to check whether they are identical.

((a == b) | (np.isnan(a) & np.isnan(b))).all(), np.in1d(a,b)

Using np.in1d(a, b) it seems the np.NAN values are not equal but I am not sure why this is. Can anyone shed some light on this issue?

I felt @DSM's answer gave the workaround I was using and hence voted it as the top answer — Black, May 22 '14 at 15:15
I don't see how the question can be considered a duplicate. There might be similar ones, but the question linked as a duplicate is about the IEEE implementation (why nan != nan) and doesn't even mention arrays. — Vincenzooo, Sep 12 '18 at 23:11

Emmet · Answer 1 · 2014-05-22T15:19:42.523

score 8

NaN values never compare equal. That is, the test NaN==NaN is always False by definition of NaN.

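So [1.0, NaN] == [1.0, NaN] is also False when the two NaNs are separate objects. In general, once a NaN occurs in a list, that list will not compare equal to another list, although comparing a list with itself is an exception (see the output below).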

If you want to test a variable to see if it's NaN in numpy, you use the numpy.isnan() function. I don't see any obvious way of obtaining the comparison semantics that you seem to want other than by “manually” iterating over the list with a loop.

Consider the following:

import math
import numpy as np

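def nan_eq(a, b):
    # zip() stops at the shorter input, so check the lengths first
    if len(a) != len(b):
        return False
    for i, j in zip(a, b):
        # unequal elements are acceptable only when both are NaN
        if i != j and not (math.isnan(i) and math.isnan(j)):
            return False
    return True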

a=[1.0, float('nan')]
b=[1.0, float('nan')]

print( float('nan')==float('nan') )
print( a==a )
print( a==b )
print( nan_eq(a,a) )

It will print:

False
True
False
True

The test a==a succeeds because Python treats references to the same object as equal, and that takes precedence over the element-wise value comparison that a==b requires.
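
A minimal sketch of how object identity affects list comparison, assuming CPython and only the standard library. As far as I understand it, lists compare each pair of elements by identity (`is`) before falling back to `==`, so the result depends on whether the two lists share the same NaN object. The variable names are made up for illustration.

```python
nan = float('nan')            # one NaN object, reused below

same_obj = [1.0, nan]
also_same_obj = [1.0, nan]    # a different list holding the *same* NaN object
fresh = [1.0, float('nan')]   # a different list holding a *different* NaN object

print(nan == nan)                 # False: IEEE 754 value comparison
print(nan is nan)                 # True: same object
print(same_obj == also_same_obj)  # True: the NaN elements are the same object
print(same_obj == fresh)          # False: distinct NaN objects, so == decides
```

This would also explain why `[1, np.nan] == [1, np.nan]` is True: `np.nan` is a single float object, so both lists hold a reference to the same NaN.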

edited May 22 '14 at 15:19

answered May 22 '14 at 15:04

Thanks. It is weird, because using `a = np.array([1.0, np.NAN]), b = np.append(np.array(1.0), [np.NAN])` works with `((a == b) | (np.isnan(a) & np.isnan(b))).all()` – Black May 22 '14 at 15:12

Actually the answer isn't entirely correct: a list containing NaN will compare equal to itself (although arguably it shouldn't) in Python. – Zero Piraeus May 22 '14 at 15:14

@ZeroPiraeus: yes, indeed it does. I suspect that's because Python “short-cuts” the elementwise comparison and just returns `True` if you compare a list to itself, and that trumps the “correct” result. It seems like a reasonable optimization: no point in slowing every list compare-to-self down on the off-chance that there might be a NaN in there. – Emmet May 22 '14 at 15:23
You can get a slightly faster version of `math.isnan(x)` by doing `x != x`, which is how NumPy does this in the source C code, see e.g. [here](https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/arraytypes.c.src#L2456). So in pure Python you can do `all(j == k or (j != j and k != k) for j, k in zip(a, b))` to check for equality. – Jaime May 22 '14 at 17:44

`a = [1, np.nan]; b = [1, np.nan]; a == b` still returns True. I cannot find out why that is, though. – Davidmh May 22 '14 at 20:00
I was going to comment the same thing. Lists seem to be smart enough with nan comparison, while arrays are not (shame on them). – Vincenzooo Sep 11 '18 at 21:13

score 5 · Accepted Answer · answered May 22 '14 at 15:12

Since a and b are lists, a == b isn't returning an array, and so your numpy-like logic won't work:

>>> a == b
False

The command you've quoted only works if they're arrays:

>>> a,b = np.asarray(a), np.asarray(b)
>>> a == b
array([ True, False], dtype=bool)
>>> (a == b) | (np.isnan(a) & np.isnan(b))
array([ True,  True], dtype=bool)
>>> ((a == b) | (np.isnan(a) & np.isnan(b))).all()
True

which should work to compare two arrays (either they're both equal or they're both NaN).
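
For reference, a small sketch of the same check wrapped in a function, shown next to `np.array_equal(..., equal_nan=True)`, which I believe is available in NumPy 1.19 and later. The helper name `nan_equal` is made up for illustration, and it assumes inputs that can be converted to float arrays. It uses `np.nan` because, as far as I know, the `np.NAN` alias was removed in NumPy 2.0.

```python
import numpy as np

def nan_equal(a, b):
    """Return True if a and b match element-wise, treating NaN as equal to NaN.

    Hypothetical helper: lists or arrays are converted to float arrays first.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    # Elements match if they are equal, or if both are NaN
    return bool(((a == b) | (np.isnan(a) & np.isnan(b))).all())

a = [1.0, np.nan]
b = np.append(np.array(1.0), [np.nan]).tolist()

print(nan_equal(a, b))                       # True
print(nan_equal(a, [1.0, 2.0]))              # False
print(np.array_equal(a, b, equal_nan=True))  # True (assumes NumPy >= 1.19)
```

Converting with `np.asarray` inside the helper means it accepts plain lists as well as arrays, which is the situation in the question.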

Thanks, just realised that. I'll attempt a workaround. – Black May 22 '14 at 15:13

score 0 · Answer 3 · answered May 22 '14 at 15:13

NaNs are implemented in Python (and NumPy) according to IEEE 754 (see http://en.wikipedia.org/wiki/NaN) and are defined as unordered. In practice, this means that a comparison involving NaN with <, >, <=, >= or == never returns True; only != does. NumPy and the built-in math module provide isnan functions to determine whether a value is NaN.
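
A quick illustration of those comparison rules, using only the standard `math` module and NumPy (a sketch, not an exhaustive test):

```python
import math
import numpy as np

nan = float('nan')

# Ordered and equality comparisons involving NaN are all False ...
print(nan < 1.0, nan > 1.0, nan == nan)   # False False False
# ... while != is the one comparison that is True.
print(nan != nan)                         # True

# Use the isnan functions to detect NaN instead of comparing.
print(math.isnan(nan))                    # True
print(np.isnan(np.array([1.0, nan])))     # [False  True]
```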
