Comparing NumPy arrays so that NaNs compare equal

Question

Is there an idiomatic way to compare two NumPy arrays that treats NaNs as equal to each other (but not equal to anything other than a NaN)?

For example, I want the following two arrays to compare equal:

np.array([1.0, np.nan, 2.0])
np.array([1.0, np.nan, 2.0])

and the following two arrays to compare unequal:

np.array([1.0, np.nan, 2.0])
np.array([1.0, 0.0, 2.0])

I am looking for a method that would produce a scalar Boolean outcome.

The following would do it:

np.all((a == b) | (np.isnan(a) & np.isnan(b)))

but it's clunky and creates all those intermediate arrays.

Is there a way that's easier on the eye and makes better use of memory?

P.S. If it helps, the arrays are known to have the same shape and dtype.

@DanielRoseman: I understand that. I've got two methods of producing a NumPy array, and I need to know whether they've produced identical arrays. — NPE, May 30 '12 at 15:51
You've ruled out one answer from [this question](http://stackoverflow.com/q/10710328/577088); are you ruling out the other two as well? — senderle, May 30 '12 at 16:01
@senderle: Thanks for the pointer. That question didn't show up in my search. However, all of those suggestions are either verbose or make very poor use of memory (or both). :-( — NPE, May 30 '12 at 16:05
@aix, I agree :) Just wanted to draw your attention to it. The `testing.assert_equal` approach is almost good, except that it presumably fails if `__debug__` is False! — senderle, May 30 '12 at 16:08
If you're using the current git tip for numpy, there's a [`numpy.isclose` function](https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py#L2039) that takes an `equal_nan` keyword argument (which defaults to `False` for compatibility). It's not terribly memory-friendly, though. — Joe Kington, May 30 '12 at 16:10
If it weren't for numbers which compare equal but have different binary representations (0.0 and -0.0, e.g.) then `memoryview(a0) == memoryview(a1)` would do it. — DSM, May 30 '12 at 16:30
@DSM: Thank you for this. It might actually fit the bill for my use case. Would you mind writing it up as an answer? — NPE, May 30 '12 at 16:38
Have you looked at http://stackoverflow.com/questions/10710328/comparing-numpy-arrays-containing-nan/10710390 — JoshAdel, May 30 '12 at 17:20
@JoshAdel: Yes. Please see my earlier comment addressed to senderle. — NPE, May 30 '12 at 17:22
Does this answer your question? [comparing numpy arrays containing NaN](https://stackoverflow.com/questions/10710328/comparing-numpy-arrays-containing-nan) — iacob, Mar 24 '21 at 22:53

score 18 · Accepted Answer · answered May 30 '12 at 17:29

If you really care about memory use (e.g. you have very large arrays), use numexpr; the following expression will work for you:

np.all(numexpr.evaluate('(a==b)|((a!=a)&(b!=b))'))

I've tested it on very big arrays with length of 3e8, and the code has the same performance on my machine as

np.all(a==b)

and uses the same amount of memory.
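If installing numexpr is not an option, a pure-NumPy sketch can get part of the same benefit by evaluating the same expression one chunk at a time, so the temporaries stay small and the loop can stop at the first mismatch. This is a different technique from numexpr, not a description of how numexpr works. The helper name `nan_equal_chunked` and its `chunk` parameter are made up for illustration, and the sketch assumes two float arrays of the same shape:

```python
import numpy as np

def nan_equal_chunked(a, b, chunk=1_000_000):
    """Hypothetical helper: NaN-aware equality, evaluated chunk by chunk."""
    # Flatten; this is a view (no copy) when the arrays are contiguous.
    a = a.reshape(-1)
    b = b.reshape(-1)
    for start in range(0, a.size, chunk):
        x = a[start:start + chunk]
        y = b[start:start + chunk]
        # x != x is True only where x is NaN (same trick as the numexpr expression).
        if not np.all((x == y) | ((x != x) & (y != y))):
            return False  # stop at the first chunk that differs
    return True

a = np.array([1.0, np.nan, 2.0])
b = np.array([1.0, np.nan, 2.0])
c = np.array([1.0, 0.0, 2.0])

same = nan_equal_chunked(a, b, chunk=2)       # tiny chunk just to exercise the loop
different = nan_equal_chunked(a, c, chunk=2)
```

The temporaries are bounded by the chunk size rather than by the array size, at the cost of a Python-level loop.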

score 9 · Answer 2 · answered Nov 02 '16 at 16:37

NumPy 1.10 added the equal_nan keyword to np.allclose (https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html).

So you can now do:

In [24]: np.allclose(np.array([1.0, np.nan, 2.0]),
                     np.array([1.0, np.nan, 2.0]), equal_nan=True)
Out[24]: True
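One caveat worth illustrating: `np.allclose` is a tolerance-based test, so it also accepts values that are merely close. If exact equality is wanted, `np.array_equal` accepts the same `equal_nan` keyword, assuming NumPy 1.19 or later. A small sketch contrasting the two (the variable names are only for illustration):

```python
import numpy as np

a = np.array([1.0, np.nan, 2.0])
b = np.array([1.0, np.nan, 2.0])
c = np.array([1.0, 0.0, 2.0])
d = a + 1e-9  # differs from a by less than allclose's default tolerance

# Exact comparison, with NaNs in matching positions treated as equal.
exact_ab = np.array_equal(a, b, equal_nan=True)
exact_ac = np.array_equal(a, c, equal_nan=True)

# allclose accepts the slightly perturbed array; array_equal does not.
close_ad = np.allclose(a, d, equal_nan=True)
exact_ad = np.array_equal(a, d, equal_nan=True)
```

Here `exact_ab` and `close_ad` are true while `exact_ac` and `exact_ad` are false.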

answered by joris

This does not work with strings, by the way. Comparing arrays with strings will throw: `TypeError("ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''")` – Ian Dec 19 '18 at 16:49

score 8 · Answer 3 · answered May 30 '12 at 17:18

Disclaimer: I don't recommend this for regular use, and I wouldn't use it myself, but I could imagine rare circumstances under which it might be useful.

If the arrays have the same shape and dtype, you could consider using the low-level memoryview:

>>> import numpy as np
>>> 
>>> a0 = np.array([1.0, np.nan, 2.0])
>>> ac = a0 * (1+0j)
>>> b0 = np.array([1.0, np.nan, 2.0])
>>> b1 = np.array([1.0, np.nan, 2.0, np.nan])
>>> c0 = np.array([1.0, 0.0, 2.0])
>>> 
>>> memoryview(a0)
<memory at 0x85ba1bc>
>>> memoryview(a0) == memoryview(a0)
True
>>> memoryview(a0) == memoryview(ac) # equal but different dtype
False
>>> memoryview(a0) == memoryview(b0) # hooray!
True
>>> memoryview(a0) == memoryview(b1)
False
>>> memoryview(a0) == memoryview(c0)
False

But beware of subtle problems like this:

>>> zp = np.array([0.0])
>>> zm = -1*zp
>>> zp
array([ 0.])
>>> zm
array([-0.])
>>> zp == zm
array([ True], dtype=bool)
>>> memoryview(zp) == memoryview(zm)
False

which happens because the binary representations differ even though the values compare equal (they have to, of course: that's how it knows to print the negative sign):

>>> memoryview(zp)[0]
'\x00\x00\x00\x00\x00\x00\x00\x00'
>>> memoryview(zm)[0]
'\x00\x00\x00\x00\x00\x00\x00\x80'

On the bright side, it short-circuits the way you might hope it would:

In [47]: a0 = np.arange(10**7)*1.0
In [48]: a0[-1] = np.nan
In [49]: b0 = np.arange(10**7)*1.0
In [50]: b0[-1] = np.nan
In [51]: timeit memoryview(a0) == memoryview(b0)
10 loops, best of 3: 31.7 ms per loop
In [52]: c0 = np.arange(10**7)*1.0
In [53]: c0[0] = np.nan
In [54]: d0 = np.arange(10**7)*1.0
In [55]: d0[0] = 0.0
In [56]: timeit memoryview(c0) == memoryview(d0)
100000 loops, best of 3: 2.51 us per loop

and for comparison:

In [57]: timeit np.all((a0 == b0) | (np.isnan(a0) & np.isnan(b0)))
1 loops, best of 3: 296 ms per loop
In [58]: timeit np.all((c0 == d0) | (np.isnan(c0) & np.isnan(d0)))
1 loops, best of 3: 284 ms per loop
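A note for Python 3 users: the sessions above are from Python 2. As far as I know, since Python 3.3 `memoryview` equality compares element values according to the buffer's format rather than raw bytes, so NaNs no longer compare equal that way. A minimal sketch that recovers the byte-level comparison is to cast both views to unsigned bytes first. The helper name `bitwise_equal` is made up for illustration, and the same caveats apply: 0.0 and -0.0 differ, as do NaNs with different bit patterns.

```python
import numpy as np

def bitwise_equal(a, b):
    """Hypothetical helper: True if shape, dtype, and raw bytes all match."""
    if a.shape != b.shape or a.dtype != b.dtype:
        return False
    if a.size == 0:
        return True  # memoryview.cast rejects zero-sized views
    # cast() needs C-contiguous memory; this copies only when necessary.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # Format 'B' (unsigned bytes) makes the comparison purely byte-wise.
    return memoryview(a).cast('B') == memoryview(b).cast('B')

a0 = np.array([1.0, np.nan, 2.0])
b0 = np.array([1.0, np.nan, 2.0])
c0 = np.array([1.0, 0.0, 2.0])
zp = np.array([0.0])
zm = -1 * zp

nan_match = bitwise_equal(a0, b0)    # NaNs with the same bit pattern match
value_differs = bitwise_equal(a0, c0)
signed_zero = bitwise_equal(zp, zm)  # 0.0 vs -0.0: bytes differ
```

Whether this still short-circuits as quickly as the Python 2 timings suggest is not something this sketch measures.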

(+1) This is great, thanks for taking the time to write it up. — NPE, May 30 '12 at 17:19
@aix: I've actually needed something similar in the past (equal-considering-nans-equal), though performance and memory weren't issues so I did it manually. Might be worth making a feature request. — DSM, May 30 '12 at 17:27

score 0 · Answer 4 · answered May 30 '12 at 16:42

Not sure this is any better, but a thought...

import numpy

class FloatOrNaN(numpy.float64):
    def __eq__(self, other):
        # Two NaNs count as equal; otherwise defer to the normal float comparison.
        return (numpy.isnan(self) and numpy.isnan(other)) or super(FloatOrNaN, self).__eq__(other)

a = [1., numpy.nan, 2.]
one = numpy.array([FloatOrNaN(val) for val in a], dtype=object)
two = numpy.array([FloatOrNaN(val) for val in a], dtype=object)
print(one == two)   # yields  array([ True,  True,  True], dtype=bool)

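This pushes the ugliness into the dtype, at the expense of making the inner workings Python instead of C (Cython etc. would fix this). It does, however, avoid the intermediate arrays of the `np.isnan`-based expression, although the object arrays themselves take more memory than plain float arrays.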

Still kinda ugly though :(
