Comparing numpy arrays containing NaN

Question

For my unittest, I want to check if two arrays are identical. Reduced example:

import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])

if np.all(a == b):
    print('arrays are equal')

This does not work because NaN never compares equal to anything, including itself (nan != nan), so a == b is False in the NaN position. What is the best way to proceed?
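To see exactly where the comparison breaks, here is a small check (a sketch, assuming only that NumPy is installed): the element-wise result of a == b is False only in the NaN position, so np.all reports False even though the arrays look identical.

```python
import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])

# NaN compares unequal to everything, including another NaN,
# so the last element of the element-wise comparison is False.
elementwise = a == b
print(elementwise)          # [ True  True False]
print(np.all(elementwise))  # False
```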

score 61 · Answer 1 · edited Jan 09 '23 at 08:26


For versions of numpy prior to 1.19, this is probably the best approach in situations that don't specifically involve unit tests:

>>> ((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
True

However, since NumPy 1.19 the array_equal function accepts a new keyword argument, equal_nan, which fits the bill exactly.

This was first pointed out by flyingdutchman; see his answer below for details.
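The expression above can be wrapped in a small helper for reuse. This is a sketch: the name arrays_equal_nan is hypothetical, and the explicit shape check is an addition, because without it a == b broadcasts, so arrays of different but broadcast-compatible shapes could compare as equal. It assumes numeric input, since np.isnan rejects string arrays.

```python
import numpy as np


def arrays_equal_nan(a, b):
    """Hypothetical helper: True if a and b have the same shape and the same
    values, counting NaNs in matching positions as equal."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        # Without this check, a == b would broadcast (e.g. shape (3,) vs (1,)).
        return False
    both_nan = np.isnan(a) & np.isnan(b)
    return bool(((a == b) | both_nan).all())


print(arrays_equal_nan([1, 2, np.nan], [1, 2, np.nan]))  # True
print(arrays_equal_nan([1, 2, np.nan], [1, np.nan, 2]))  # False
```

On NumPy 1.19 or later, np.array_equal(a, b, equal_nan=True) should give the same answers for numeric arrays, including the shape check.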

edited Jan 09 '23 at 08:26

Friedrich -- Слава Україні


answered May 22 '12 at 21:24

senderle


+1 This solution seems to be a bit faster than the solution I posted with masked arrays, although if you were creating the mask for use in other parts of your code, the overhead from creating the mask would become less of a factor in the overall efficiency of the ma strategy. – JoshAdel May 22 '12 at 21:34
Thanks. Your solution works indeed, but I prefer the built-in test in numpy as suggested by Avaris – saroele May 23 '12 at 22:43

I really like the simplicity of this. Also, it seems faster than @Avaris' solution. Turning this into a lambda function and testing with IPython's `%timeit` yields 23.7 µs vs 1.01 ms. – AllanLRH Mar 02 '14 at 14:25
@NovicePhysicist, interesting timing! I wonder if it has to do with the use of exception handling. Did you test positive vs. negative results? The speed will probably vary significantly depending on whether the exception is thrown or not. – senderle Mar 02 '14 at 15:10
Nope, I just did a simple test, with some broadcasting relevant to my problem at hand (I compared a 2D array with a 1D vector, so I guess it was a row-wise comparison). But I guess one could pretty easily do a lot of testing in the IPython notebook. Also, I used a lambda function for your solution, but I think it would be a little bit faster had I used a regular function (that often seems to be the case). – AllanLRH Mar 02 '14 at 16:46

score 50 · Accepted Answer · answered May 22 '12 at 21:42


Alternatively you can use numpy.testing.assert_equal or numpy.testing.assert_array_equal with a try/except:

In : import numpy as np

In : def nan_equal(a,b):
...:     try:
...:         np.testing.assert_equal(a,b)
...:     except AssertionError:
...:         return False
...:     return True

In : a=np.array([1, 2, np.nan])

In : b=np.array([1, 2, np.nan])

In : nan_equal(a,b)
Out: True

In : a=np.array([1, 2, np.nan])

In : b=np.array([3, 2, np.nan])

In : nan_equal(a,b)
Out: False

Edit

Since you are using this for unit testing, calling np.testing.assert_equal directly (instead of wrapping it to get True/False) might be more natural.
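A minimal sketch of that, assuming the standard-library unittest module (the test class and method names are made up for illustration): np.testing.assert_equal raises AssertionError on a mismatch, which unittest reports as a test failure together with NumPy's description of the differences.

```python
import unittest

import numpy as np


class ArrayComparisonTest(unittest.TestCase):  # hypothetical test case
    def test_equal_with_nan(self):
        a = np.array([1, 2, np.nan])
        b = np.array([1, 2, np.nan])
        # Passes: assert_equal treats NaNs in the same positions as equal.
        np.testing.assert_equal(a, b)

    def test_different_values(self):
        a = np.array([1, 2, np.nan])
        b = np.array([3, 2, np.nan])
        # assert_equal raises AssertionError here; assertRaises checks that.
        with self.assertRaises(AssertionError):
            np.testing.assert_equal(a, b)
```

Saved in a test module, this would run with python -m unittest like any other test case.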

answered May 22 '12 at 21:42

Avaris


Excellent, this is the most elegant and built-in solution. I just added `np.testing.assert_equal(a,b)` in my unittest, and if it raises the exception, the test fails (no error), and I even get a nice print with the differences and the mismatch. Thanks. – saroele May 23 '12 at 22:42

Please note that this solution works because the `numpy.testing.assert_*` functions do not follow the same semantics as Python's `assert` statement. In plain Python, an `assert` statement raises `AssertionError` only if `__debug__` is True, i.e. if the script is run unoptimized (no -O flag); see the [docs](http://docs.python.org/3.3/reference/simple_stmts.html#grammar-token-assert_stmt). For this reason I would strongly discourage catching `AssertionError` for flow control. Of course, since we are in a test suite, the best solution is to leave the numpy.testing assertion alone. – Stefano M Jun 14 '13 at 10:14
The documentation of `numpy.testing.assert_equal()` does not explicitly indicate that it considers NaN equal to NaN (whereas `numpy.testing.assert_array_equal()` does): is this documented somewhere else? – Eric O. Lebigot Aug 08 '18 at 13:57
@EricOLebigot Does numpy.testing.assert_equal() really consider `nan == nan`? I'm getting an `AssertionError: Arrays are not equal` even though the arrays are identical, including the dtype. – thinwybk Jul 07 '20 at 09:15
Both the _current_ official documentation and the examples above show that it does consider NaN == NaN. I think the best thing is for you to ask a new StackOverflow question with the details. – Eric O. Lebigot Jul 21 '20 at 14:43

score 48 · Answer 3 · edited Mar 18 '21 at 04:59


The easiest way is to use numpy.allclose(), which lets you specify how NaN values are handled through its equal_nan parameter. Your example would then look like this:

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])

if np.allclose(a, b, equal_nan=True):
    print('arrays are equal')

This prints "arrays are equal".

See the numpy.allclose documentation for details.

edited Mar 18 '21 at 04:59

Oren


answered Aug 14 '17 at 13:05

Luis Alberto Centeno


+1 because your solution doesn't reinvent the wheel. However, this only works with number-like items. Otherwise, you get the nasty `TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''` – MLguy Mar 06 '18 at 14:51
This is a great answer in many contexts! It's worth adding the caveat that this will return true even if the arrays aren't strictly equal. Much of the time it won't matter though. – senderle Dec 10 '18 at 23:41

+1, since this returns a `bool` instead of raising an `AssertionError`. I needed this for implementing an `__eq__(...)` of a class with an array attribute. – Bas Swinckels May 24 '20 at 17:47

Just as a pointer to a later answer: https://stackoverflow.com/a/58709110/1207489. Add `rtol=0, atol=0` to avoid the issue that it considers close values equal (as mentioned by @senderle). So: `np.allclose(a, b, equal_nan=True, rtol=0, atol=0)`. – Claude Jan 07 '21 at 15:12

score 16 · Answer 4 · edited Mar 26 '22 at 00:35


The numpy function array_equal fits the question's requirements perfectly with the equal_nan parameter added in 1.19. The example would look as follows:

import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
assert np.array_equal(a, b, equal_nan=True)

Be aware, though, that this does not work for arrays of dtype object: with equal_nan=True, NumPy calls np.isnan, which does not support object arrays. Not sure whether this is a bug or not.
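A quick sketch of how equal_nan behaves (assuming NumPy 1.19 or later): NaNs only count as equal when they sit in the same positions, and arrays of different shapes are never equal. The last lines show one possible workaround for the object-dtype caveat above, casting to float first; that is an assumption on my part rather than an official recipe, and it only works when every element is numeric.

```python
import numpy as np

a = np.array([1, 2, np.nan])

# NaNs in the same positions count as equal.
print(np.array_equal(a, np.array([1, 2, np.nan]), equal_nan=True))  # True
# NaNs in different positions do not.
print(np.array_equal(a, np.array([1, np.nan, 2]), equal_nan=True))  # False
# Without equal_nan, the NaN makes the comparison fail.
print(np.array_equal(a, np.array([1, 2, np.nan])))  # False

# Object-dtype arrays: cast to float first (only valid if all elements are numbers).
obj_a = np.array([1.0, 2.0, np.nan], dtype=object)
obj_b = np.array([1.0, 2.0, np.nan], dtype=object)
print(np.array_equal(obj_a.astype(float), obj_b.astype(float), equal_nan=True))  # True
```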

edited Mar 26 '22 at 00:35

wjandrea


answered Dec 09 '20 at 14:57

flyingdutchman


score 10 · Answer 5 · edited Mar 26 '22 at 00:28


You could use numpy masked arrays, mask the NaN values and then use numpy.ma.all or numpy.ma.allclose:

For example:

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b))  # True
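As the comments below point out, this comparison ignores masked positions entirely, so it returns True even when the NaNs are in different places, and masked_invalid also masks ±inf. A sketch of a stricter variant (the helper name masked_nan_equal is hypothetical): mask only NaNs via np.ma.masked_where, as suggested in the comments, and additionally require the two masks to match.

```python
import numpy as np


def masked_nan_equal(a, b):
    """Hypothetical helper: equal values where neither array is NaN,
    and NaNs in exactly the same positions."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Mask only NaN (masked_invalid would also mask +/-inf).
    ma = np.ma.masked_where(np.isnan(a), a)
    mb = np.ma.masked_where(np.isnan(b), b)
    # getmaskarray returns a full boolean array even when nothing is masked.
    if not np.array_equal(np.ma.getmaskarray(ma), np.ma.getmaskarray(mb)):
        return False
    # Same mask, so the unmasked values line up position by position.
    return np.array_equal(ma.compressed(), mb.compressed())


a = np.array([1, 2, np.nan])
b = np.array([1, np.nan, 2])
print(np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b)))  # True (NaN positions ignored)
print(masked_nan_equal(a, b))  # False
```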

edited Mar 26 '22 at 00:28

wjandrea


answered May 22 '12 at 21:23

JoshAdel


thanks for making me aware of the use of masked arrays. I prefer the solution of Avaris however. – saroele May 23 '12 at 22:43
You should use `np.ma.masked_where(np.isnan(a), a)` else you fail to compare infinite values. – John Zwinck Sep 24 '14 at 02:33

I tested with `a=np.array([1, 2, np.NaN])` and `b=np.array([1, np.NaN, 2])` which are clearly not equal and `np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b))` still returns True, so be aware of that if you use this method. – tavo Jan 05 '17 at 14:41

This method only tests whether the two arrays without the NaN values are the same, but does NOT test if NaNs occurred in the same places... Can be dangerous to use. – WillZ May 29 '19 at 22:04
It can be dangerous to use, that is a valid point. However... this is the only solution that works for me out of all suggestions mentioned herein. This is a nice approach if you are looking to compare data that may be masked differently but otherwise contain generally identical information. – jpolly Nov 21 '22 at 17:13

score 8 · Answer 6 · answered Nov 05 '19 at 10:13

Just to complete @Luis Alberto Centeno’s answer, you may prefer to use:

np.allclose(a, b, rtol=0, atol=0, equal_nan=True)

rtol and atol control the tolerance of the equality test. In short, allclose() returns:

all(abs(a - b) <= atol + rtol * abs(b))

By default they are not set to 0, so the function could return True if your numbers are close but not exactly equal.
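To make the difference concrete, a small sketch (the values are made up for illustration): two arrays that differ by 1e-9 in one element pass np.allclose with the default tolerances (rtol=1e-05, atol=1e-08), but not with rtol=0, atol=0.

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan])
b = np.array([1.0 + 1e-9, 2.0, np.nan])  # first element differs slightly

# Default tolerances: |a - b| is about 1e-9 <= 1e-08 + 1e-05 * |b|, so True.
print(np.allclose(a, b, equal_nan=True))  # True
# Zero tolerances: only exact matches (and NaN paired with NaN) pass.
print(np.allclose(a, b, rtol=0, atol=0, equal_nan=True))  # False
print(np.allclose(a, a.copy(), rtol=0, atol=0, equal_nan=True))  # True
```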

PS: regarding "I want to check if two arrays are identical": you are actually looking for equality rather than identity. The two are not the same in Python, and I think it’s better for everyone to understand the difference so that we share the same vocabulary. (https://www.blog.pythonlibrary.org/2017/02/28/python-101-equality-vs-identity/)

You’d test identity with the is keyword:

a is b
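A short illustration of the difference, assuming the arrays from the question and NumPy 1.19 or later for the last line: two separately created arrays with the same contents are never the same object, while a second name bound to the same array is.

```python
import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
c = a  # another name for the same object

print(a is b)  # False: two distinct array objects
print(a is c)  # True: same object
# Equality (with NaN handling) is a separate question from identity.
print(np.array_equal(a, b, equal_nan=True))  # True
```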

score 7 · Answer 7 · edited Feb 16 '17 at 09:47

When I used the above answer:

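((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()

it raised errors when comparing arrays of strings, because numpy.isnan does not support string dtypes.

This version is more type-generic, because it detects NaN with x != x (NaN is the only value that is not equal to itself) instead of numpy.isnan:

def EQUAL(a, b):
    # (x != x) is True only for NaN, so NaNs are matched without numpy.isnan.
    return ((a == b) | ((a != a) & (b != b))).all()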
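A usage sketch, assuming NumPy arrays as inputs. The function is repeated here so the snippet runs on its own, with .all() reducing the element-wise result to a single bool, as in the original expression. It handles string arrays as well as float arrays containing NaN.

```python
import numpy as np


def EQUAL(a, b):
    # x != x is True only for NaN, so this avoids np.isnan (which rejects strings).
    return ((a == b) | ((a != a) & (b != b))).all()


print(EQUAL(np.array(['x', 'y']), np.array(['x', 'y'])))  # True
print(EQUAL(np.array([1, 2, np.nan]), np.array([1, 2, np.nan])))  # True
print(EQUAL(np.array([1, 2, np.nan]), np.array([1, np.nan, 2])))  # False
```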

edited Feb 16 '17 at 09:47

answered Oct 10 '16 at 12:07

Matheus Araujo


score 2 · Answer 8 · edited Mar 26 '22 at 00:36


As of v1.19, numpy's array_equal function supports an equal_nan argument:

assert np.array_equal(a, b, equal_nan=True)

edited Mar 26 '22 at 00:36

wjandrea


answered Mar 24 '21 at 22:58

iacob


[flyingdutchman already posted this](/a/65219253/4518341). I just added the version number for completeness. (and fixed the version number in your answer btw) – wjandrea Mar 26 '22 at 00:37

score 0 · Answer 9 · answered May 02 '21 at 22:47

For me this worked fine:

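import numpy

a = numpy.array([float('nan'), 1, 2])
b = numpy.array([2, float('nan'), 2])
# Compare only positions where neither array has a NaN; out= supplies True
# for the skipped positions (without it those entries are uninitialized).
numpy.equal(a, b,
            where=numpy.logical_not(numpy.logical_or(
                numpy.isnan(a),
                numpy.isnan(b)
            )),
            out=numpy.ones(a.shape, dtype=bool)
).all()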

PS: this ignores every position where either array has a NaN, so it does not check that the NaNs are in the same places.
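The same "skip NaN positions" comparison can also be written with boolean indexing, which sidesteps the where=/out= subtlety. This is a sketch with a hypothetical helper name that assumes both inputs have the same shape. Like the answer's version, it treats arrays as equal even when their NaNs are in different places, which is looser than what the question asks for.

```python
import numpy as np


def equal_ignoring_nan(a, b):
    """Hypothetical helper: compare only positions where neither value is NaN.
    Assumes a and b have the same shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = ~(np.isnan(a) | np.isnan(b))  # True where both values are real numbers
    return bool(np.all(a[keep] == b[keep]))


print(equal_ignoring_nan([np.nan, 1, 2], [2, np.nan, 2]))  # True: only the last element is compared
print(equal_ignoring_nan([1, 2, 3], [1, 2, 4]))  # False
```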

score -1 · Answer 10 · answered Jan 16 '19 at 23:08


If you only need this for things like unit tests, where you don't care much about performance or "correct" behaviour for every type, you can convert to lists to get something that also works with non-numeric arrays:

import numpy as np

a = np.array(['a', 'b', None])
b = np.array(['a', 'b', None])
assert list(a) == list(b)

Casting ndarrays to lists can sometimes be useful to get the behaviour you want in some test. (But don't use this in production code, or with larger arrays!)
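A quick check of the caveat raised in the comments, assuming NumPy: list equality compares elements with ==, and converting a float array to a list creates new NumPy scalar objects, so Python's identity shortcut for list comparison never applies and a NaN element makes the comparison False. Object arrays holding strings and None compare fine.

```python
import numpy as np

# Works: object array with strings and None.
a = np.array(['a', 'b', None])
b = np.array(['a', 'b', None])
print(list(a) == list(b))  # True

# Does not work: each NaN becomes a separate np.float64 object, and nan != nan.
x = np.array([1, np.nan])
y = np.array([1, np.nan])
print(list(x) == list(y))  # False
```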

answered Jan 16 '19 at 23:08

NeuronQ


This doesn't actually work for numerics. For example, try setting `a` and `b` to `np.array([1, np.nan])`. – wjandrea Mar 26 '22 at 00:47
