98

For my unittest, I want to check if two arrays are identical. Reduced example:

import numpy as np

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])

if np.all(a == b):
    print('arrays are equal')

This does not work because nan != nan. What is the best way to proceed?

iacob
saroele

10 Answers

61

For versions of numpy prior to 1.19, this is probably the best approach in situations that don't specifically involve unit tests:

>>> ((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
True

However, NumPy 1.19 and later provide the array_equal function with a keyword argument, equal_nan, which fits the bill exactly.

This was first pointed out by flyingdutchman; see his answer below for details.
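
As a minimal sketch, the pre-1.19 expression can be wrapped in a small helper. The name nan_equal_elementwise and the explicit shape check are assumptions added for illustration; the bare expression would broadcast arrays of different shapes instead of reporting them as unequal.

```python
import numpy as np

def nan_equal_elementwise(a, b):
    """Return True if a and b match exactly, counting NaN == NaN as a match.

    Hypothetical helper for illustration; assumes float-convertible input.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Assumption: arrays of different shapes are unequal rather than broadcast.
    if a.shape != b.shape:
        return False
    both_nan = np.isnan(a) & np.isnan(b)
    return bool(((a == b) | both_nan).all())

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
c = np.array([1, np.nan, 2])

print(nan_equal_elementwise(a, b))  # True
print(nan_equal_elementwise(a, c))  # False: the NaNs sit in different positions
```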

senderle
  • +1 This solution seems to be a bit faster than the solution I posted with masked arrays, although if you were creating the mask for use in other parts of your code, the overhead from creating the mask would become less of a factor in the overall efficiency of the ma strategy. – JoshAdel May 22 '12 at 21:34
  • Thanks. Your solution works indeed, but I prefer the built-in test in numpy as suggested by Avaris – saroele May 23 '12 at 22:43
  • 1
    I really like the simplicity of this. Also, it seems faster than @Avaris's solution. Turning this into a lambda function and testing with IPython's `%timeit` yields 23.7 µs vs 1.01 ms. – AllanLRH Mar 02 '14 at 14:25
  • @NovicePhysicist, interesting timing! I wonder if it has to do with the use of exception handling. Did you test positive vs. negative results? The speed will probably vary significantly depending on whether the exception is thrown or not. – senderle Mar 02 '14 at 15:10
  • Nope, I just did a simple test, with some broadcasting relevant to my problem at hand (I compared a 2D array with a 1D vector, so I guess it was a row-wise comparison). But I guess one could pretty easily do a lot of testing in the IPython notebook. Also, I used a lambda function for your solution, but I think it would have been a little faster had I used a regular function (that often seems to be the case). – AllanLRH Mar 02 '14 at 16:46
50

Alternatively you can use numpy.testing.assert_equal or numpy.testing.assert_array_equal with a try/except:

In : import numpy as np

In : def nan_equal(a,b):
...:     try:
...:         np.testing.assert_equal(a,b)
...:     except AssertionError:
...:         return False
...:     return True

In : a=np.array([1, 2, np.nan])

In : b=np.array([1, 2, np.nan])

In : nan_equal(a,b)
Out: True

In : a=np.array([1, 2, np.nan])

In : b=np.array([3, 2, np.nan])

In : nan_equal(a,b)
Out: False

Edit

Since you are using this for unit testing, calling the numpy.testing assert function directly (instead of wrapping it to get True/False) might be more natural.
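
In a test suite, that might look like the following sketch. The class and method names are placeholders, and it uses numpy.testing.assert_array_equal, one of the two functions named above.

```python
import unittest

import numpy as np

class NanArrayTest(unittest.TestCase):
    """Hypothetical test case; names are for illustration only."""

    def test_matching_arrays_pass(self):
        a = np.array([1, 2, np.nan])
        b = np.array([1, 2, np.nan])
        # NaNs in the same positions are treated as equal.
        np.testing.assert_array_equal(a, b)

    def test_different_arrays_fail(self):
        a = np.array([1, 2, np.nan])
        b = np.array([3, 2, np.nan])
        # A mismatch raises AssertionError, which unittest reports as a failure.
        with self.assertRaises(AssertionError):
            np.testing.assert_array_equal(a, b)
```

Run it with the usual unittest runner. On a mismatch outside assertRaises, the failure message lists the differing elements.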

Avaris
  • Excellent, this is the most elegant and built-in solution. I just added `np.testing.assert_equal(a,b)` in my unittest, and if it raises the exception, the test fails (no error), and I even get a nice print with the differences and the mismatch. Thanks. – saroele May 23 '12 at 22:42
  • 4
    Please note that this solution works because `numpy.testing.assert_*` do not follow the same semantics of python `assert`'s. In plain Python `AssertionError` exceptions are raised iff `__debug__ is True` i.e. if the script is run un-optimized (no -O flag), see the [docs](http://docs.python.org/3.3/reference/simple_stmts.html#grammar-token-assert_stmt). For this reason I would strongly discourage wrapping `AssertionErrors` for flow control. Of course, since we are in a test suite the best solution is to leave the numpy.testing.assert alone. – Stefano M Jun 14 '13 at 10:14
  • The documentation of `numpy.testing.assert_equal()` does not explicitly indicate that it considers NaN equal to NaN (whereas that of `numpy.testing.assert_array_equal()` does): is this documented somewhere else? – Eric O. Lebigot Aug 08 '18 at 13:57
  • @EricOLebigot Does numpy.testing.assert_equal() really consider `nan = nan`? I'm getting an `AssertionError: Arrays are not equal` even if the arrays are identical, including the dtype. – thinwybk Jul 07 '20 at 09:15
  • Both the _current_ official documentation and the examples above show that it does consider that NaN == NaN. I am thinking that the best is for you to ask a new StackOverflow question with the details. – Eric O. Lebigot Jul 21 '20 at 14:43
48

The easiest way is to use the numpy.allclose() function, which lets you specify how NaN values are treated. Your example then looks like this:

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])

if np.allclose(a, b, equal_nan=True):
    print('arrays are equal')

This prints arrays are equal.

See the numpy.allclose documentation for details.

Oren
  • 2
    +1 because your solution doesn't reinvent the wheel. However, this only works with number-like items. Otherwise, you get the nasty `TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''` – MLguy Mar 06 '18 at 14:51
  • This is a great answer in many contexts! It's worth adding the caveat that this will return true even if the arrays aren't strictly equal. Much of the time it won't matter though. – senderle Dec 10 '18 at 23:41
  • 1
    +1, since this returns a `bool` instead of raising an `AssertionError`. I needed this for implementing the `__eq__(...)` of a class with an array attribute. – Bas Swinckels May 24 '20 at 17:47
  • 2
    Just as a pointer to a later answer: https://stackoverflow.com/a/58709110/1207489. Add `rtol=0, atol=0` to avoid the issue that it considers close values equal (as mentioned by @senderle). So: `np.allclose(a, b, equal_nan=True, rtol=0, atol=0)`. – Claude Jan 07 '21 at 15:12
16

The numpy function array_equal fits the question's requirements perfectly with the equal_nan parameter added in 1.19. The example would look as follows:

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
assert np.array_equal(a, b, equal_nan=True)

Be aware that this does not work if the arrays have dtype object. I am not sure whether this is a bug.

wjandrea
flyingdutchman
10

You could use numpy masked arrays, mask the NaN values and then use numpy.ma.all or numpy.ma.allclose:

For example:

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b))  # True
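
Note that the one-liner ignores where the NaNs sit, so [1, 2, NaN] and [1, NaN, 2] compare as equal, and masked_invalid also hides infinities. A hedged sketch of a stricter variant follows. The helper name masked_nan_equal is hypothetical, and comparing the masks as well as the values is an addition to this answer's method, not part of it.

```python
import numpy as np

def masked_nan_equal(a, b):
    """Compare float arrays, requiring NaNs to appear in the same positions.

    Hypothetical helper for illustration; assumes float-convertible input.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    # Mask only NaNs, so infinite values still take part in the comparison.
    ma = np.ma.masked_where(np.isnan(a), a)
    mb = np.ma.masked_where(np.isnan(b), b)
    # The masks must agree, otherwise the NaNs are in different places.
    same_nan_positions = np.array_equal(np.ma.getmaskarray(ma),
                                        np.ma.getmaskarray(mb))
    # Masked positions are filled with True so they do not affect the result.
    same_values = bool((ma == mb).filled(True).all())
    return same_nan_positions and same_values

print(masked_nan_equal([1, 2, np.nan], [1, 2, np.nan]))  # True
print(masked_nan_equal([1, 2, np.nan], [1, np.nan, 2]))  # False
```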
wjandrea
JoshAdel
  • 2
    thanks for making me aware of the use of masked arrays. I prefer the solution of Avaris however. – saroele May 23 '12 at 22:43
  • You should use `np.ma.masked_where(np.isnan(a), a)` else you fail to compare infinite values. – John Zwinck Sep 24 '14 at 02:33
  • 5
    I tested with `a=np.array([1, 2, np.NaN])` and `b=np.array([1, np.NaN, 2])` which are clearly not equal and `np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b))` still returns True, so be aware of that if you use this method. – tavo Jan 05 '17 at 14:41
  • 1
    This method only tests whether the two arrays without the NaN values are the same, but does NOT test if NaNs occurred in the same places... Can be dangerous to use. – WillZ May 29 '19 at 22:04
  • It can be dangerous to use, that is a valid point. However... this is the only solution that works for me out of all suggestions mentioned herein. This is a nice approach if you are looking to compare data that may be masked differently but otherwise contain generally identical information. – jpolly Nov 21 '22 at 17:13
8

To complement @Luis Albert Centeno’s answer, you may prefer to use:

np.allclose(a, b, rtol=0, atol=0, equal_nan=True)

rtol and atol control the tolerance of the equality test. In short, allclose() returns:

all(abs(a - b) <= atol + rtol * abs(b))

By default they are nonzero (rtol=1e-05, atol=1e-08), so the function can return True when your numbers are close but not exactly equal.
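
A quick check illustrates the effect of the tolerances. The array values are made up for the example; the defaults of numpy.allclose are rtol=1e-05 and atol=1e-08.

```python
import numpy as np

a = np.array([1.0, np.nan])
b = np.array([1.0 + 1e-9, np.nan])  # differs from a by a tiny amount

# The default tolerances accept the small difference.
loose = np.allclose(a, b, equal_nan=True)

# Zero tolerances require exact equality, still treating NaN as equal to NaN.
strict = np.allclose(a, b, rtol=0, atol=0, equal_nan=True)
exact = np.allclose(a, a.copy(), rtol=0, atol=0, equal_nan=True)

print(loose, strict, exact)  # True False True
```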


PS: "I want to check if two arrays are identical" >> Actually, you are looking for equality rather than identity. The two are not the same in Python, and it helps everyone to understand the difference and share the same vocabulary. (https://www.blog.pythonlibrary.org/2017/02/28/python-101-equality-vs-identity/)

You would test identity with the is keyword:

a is b
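
To make the distinction concrete, here is a small sketch. The variable names are illustrative, and the equal_nan argument requires NumPy 1.19 or later.

```python
import numpy as np

a = np.array([1, 2, np.nan])
alias = a             # another name for the same object
duplicate = a.copy()  # a distinct object with the same contents

print(alias is a)      # True: identity
print(duplicate is a)  # False: a different object
print(np.array_equal(duplicate, a, equal_nan=True))  # True: equality
```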
Alexandre Huat
7

When I used the above answer:

((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()

It gave me errors when evaluating lists of strings.

This is more type-generic:

def EQUAL(a,b):
    return ((a == b) | ((a != a) & (b != b)))
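
Note that EQUAL returns an element-wise boolean array, so a single True/False answer still needs .all(). A short usage sketch with made-up string and object arrays:

```python
import numpy as np

def EQUAL(a, b):
    # Same expression as above: x != x is True only for NaN-like values.
    return (a == b) | ((a != a) & (b != b))

strings = np.array(['x', 'y'])
print(EQUAL(strings, strings.copy()).all())  # True

mixed_a = np.array(['x', np.nan], dtype=object)
mixed_b = np.array(['x', np.nan], dtype=object)
print(EQUAL(mixed_a, mixed_b).all())  # True

mixed_c = np.array(['z', np.nan], dtype=object)
print(EQUAL(mixed_a, mixed_c).all())  # False
```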
Matheus Araujo
2

As of v1.19, numpy's array_equal function supports an equal_nan argument:

assert np.array_equal(a, b, equal_nan=True)
wjandrea
iacob
  • [flyingdutchman already posted this](/a/65219253/4518341). I just added the version number for completeness. (and fixed the version number in your answer btw) – wjandrea Mar 26 '22 at 00:37
0

For me this worked fine:

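import numpy

a = numpy.array([float('nan'), 1, 2])
b = numpy.array([2, float('nan'), 2])
# Compare only the positions where neither array holds a NaN.
valid = numpy.logical_not(numpy.logical_or(
    numpy.isnan(a),
    numpy.isnan(b)
))
# out= pre-fills the skipped positions with True; without it they stay uninitialized.
numpy.equal(a, b, where=valid, out=numpy.ones(a.shape, dtype=bool)).all()

PS: This skips the comparison wherever either array has a NaN.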

camposer
-1

If you are doing this for something like unit tests, where you don't care much about performance or "correct" behaviour with all types, you can use this to get something that works with all types of arrays, not just numeric ones:

a = np.array(['a', 'b', None])
b = np.array(['a', 'b', None])
assert list(a) == list(b)

Casting ndarrays to lists can sometimes be useful to get the behaviour you want in some test. (But don't use this in production code, or with larger arrays!)

NeuronQ
  • This doesn't actually work for numerics. For example, try setting `a` and `b` to `np.array([1, np.nan])`. – wjandrea Mar 26 '22 at 00:47