Detect if a NumPy array contains at least one non-numeric value?

Question

I need to write a function which will detect if the input contains at least one value which is non-numeric. If a non-numeric value is found I will raise an error (because the calculation should only return a numeric value). The number of dimensions of the input array is not known in advance - the function should give the correct value regardless of ndim. As an extra complication the input could be a single float or numpy.float64 or even something oddball like a zero-dimensional array.

The obvious way to solve this is to write a recursive function which iterates over every iterable object in the array until it finds a non-iterable. It will apply the numpy.isnan() function to every non-iterable object. If at least one non-numeric value is found then the function will return False immediately. Otherwise, if all the values in the iterable are numeric, it will eventually return True.

That works just fine, but it's pretty slow and I expect that NumPy has a much better way to do it. What is an alternative that is faster and more numpyish?

Here's my mockup:

def contains_nan( myarray ):
    """
    @param myarray : An n-dimensional array or a single float
    @type myarray : numpy.ndarray, numpy.array, float
    @returns: bool
    Returns true if myarray is numeric or only contains numeric values.
    Returns false if at least one non-numeric value exists
    Not-A-Number is given by the numpy.isnan() function.
    """
    return True

Your description for `contains_nan` looks suspicious: "Returns false if at least one non-numeric value exists". I would have expected `contains_nan` to return `True` if the array contains NaN. — Samuel Tardieu, May 26 '09 at 18:00
What about inputs such as `array(['None', 'None'], dtype=object)`? Should such an input just raise an exception? — Finn Årup Nielsen, Jun 08 '15 at 14:47

score 273 · Accepted Answer · edited Dec 10 '10 at 11:22

This should be faster than iterating and will work regardless of shape.

numpy.isnan(myarray).any()

Edit: 30x faster:

import timeit

# Setup: a 100x100 float array with a single NaN in it
s = 'import numpy;a = numpy.arange(10000.).reshape((100,100));a[10,10]=numpy.nan'
ms = [
    'numpy.isnan(a).any()',
    'any(numpy.isnan(x) for x in a.flatten())']
for m in ms:
    print("  %.2f s" % timeit.Timer(m, s).timeit(1000), m)

Results:

  0.11 s numpy.isnan(a).any()
  3.75 s any(numpy.isnan(x) for x in a.flatten())

Bonus: it works fine for non-array NumPy types:

>>> a = numpy.float64(42.)
>>> numpy.isnan(a).any()
False
>>> a = numpy.float64(numpy.nan)
>>> numpy.isnan(a).any()
True
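
To tie this back to the question, the one-liner can be wrapped so that plain floats, NumPy scalars, zero-dimensional arrays and n-dimensional arrays all take the same path. This is a minimal sketch and not part of the original answer: the name `contains_nan` is borrowed from the question but returns True when a NaN is present (as the first comment suggests), and it assumes that converting the input with `np.asarray(..., dtype=float)` is acceptable.

```python
import numpy as np

def contains_nan(value):
    """Return True if value holds at least one NaN (sketch, assumes numeric input)."""
    # np.asarray turns floats, NumPy scalars and nested lists into an ndarray;
    # an input that is already a float ndarray is passed through without a copy.
    arr = np.asarray(value, dtype=float)
    return bool(np.isnan(arr).any())

print(contains_nan(1.0))                  # False
print(contains_nan(np.float64(np.nan)))   # True
print(contains_nan(np.array(np.nan)))     # True (zero-dimensional array)
print(contains_nan(np.zeros((2, 3, 4))))  # False
```

Non-numeric input such as strings that cannot be parsed as numbers will raise during the conversion, which may or may not be the behaviour you want.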

edited Dec 10 '10 at 11:22

Peter Mortensen

answered May 27 '09 at 00:55

Paul

with numpy 1.7 the flatten() version is only twice as fast as the first one – Christian Geier Oct 09 '13 at 14:39
Why doesn't something like `float('nan') in x` work? I tried it and Python returns `False` where `x = [1,2,3,float('nan')]`. – Charlie Parker Oct 13 '16 at 22:02
@CharlieParker the same reason why float('nan') == float('nan') will return False. NaN doesn't equal NaN. Here more info: http://stackoverflow.com/questions/10034149/why-is-nan-not-equal-to-nan – Muppet Feb 06 '17 at 23:20
@mab: That's because calling `numpy.any` on a genexp just returns the genexp; you're not actually doing the computation you think you are. Never call `numpy.any` on a genexp. – user2357112 May 11 '17 at 20:58
In a real debugging scenario, I would also recommend looking at [`np.isfinite`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.isfinite.html#numpy.isfinite) instead of `np.isnan` to detect numeric overflows, instability, etc. – Ben Usman May 12 '17 at 01:58
Out of interest -- is this really the fastest way to do this? i) Doesn't `numpy.isnan(a).any()` involve allocating a large temporary array (or is it a view)? ii) If the first element is NAN, does this solution involve iterating over the full array? If I set the first element to NAN, this still takes about 5 microseconds, which seems quite slow for what can be done with an array lookup and a test -- should be nanoseconds, no? – user48956 Oct 04 '17 at 18:14

score 28 · Answer 2 · edited Jun 01 '21 at 12:20

If infinity is a possible value, I would use numpy.isfinite

numpy.isfinite(myarray).all()

If the above evaluates to True, then myarray contains none of numpy.nan, numpy.inf or -numpy.inf.

numpy.isnan, by contrast, does not flag numpy.inf values, for example:

In [11]: import numpy as np

In [12]: b = np.array([[4, np.inf],[np.nan, -np.inf]])

In [13]: np.isnan(b)
Out[13]: 
array([[False, False],
       [ True, False]], dtype=bool)

In [14]: np.isfinite(b)
Out[14]: 
array([[ True, False],
       [False, False]], dtype=bool)
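
Since the question wants an error raised on bad values, the `isfinite` check can be turned into a small validator. The sketch below is an illustration only: `require_finite` is a hypothetical helper name, and raising `ValueError` with a count of offending values is an assumption about what a useful error would look like.

```python
import numpy as np

def require_finite(value):
    """Raise ValueError unless every element is finite (hypothetical helper)."""
    arr = np.asarray(value, dtype=float)
    if not np.isfinite(arr).all():
        # Only count the offenders on the failure path, to keep the happy path cheap.
        n_nan = int(np.isnan(arr).sum())
        n_inf = int(np.isinf(arr).sum())
        raise ValueError(f"found {n_nan} NaN and {n_inf} infinite value(s)")
    return arr

b = np.array([[4, np.inf], [np.nan, -np.inf]])
try:
    require_finite(b)
except ValueError as exc:
    print(exc)  # found 1 NaN and 2 infinite value(s)
```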

edited Jun 01 '21 at 12:20

Matthew Mage

answered Oct 09 '15 at 17:13

Akavall

Why doesn't something like `float('nan') in x` work? I tried it and Python returns `False` where `x = [1,2,3,float('nan')]`. – Charlie Parker Oct 13 '16 at 22:02
@CharlieParker because two `nan`s are not considered equal to each other. Try `float('nan') == float('nan')`. – Akavall Oct 13 '16 at 22:06
interesting. Why are they not considered equal? – Charlie Parker Oct 13 '16 at 22:07
@CharlieParker, I don't think I could give a very good answer here. Maybe this is what you are looking for: http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values – Akavall Oct 13 '16 at 22:11
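
The point made in these comments can be checked directly in plain Python. The following sketch is not from the original thread; it shows why a membership test is unreliable for NaN and what an explicit per-element test looks like.

```python
import math

nan = float("nan")
x = [1, 2, 3, float("nan")]

# IEEE 754: NaN compares unequal to everything, including itself.
print(nan == nan)          # False

# `in` tries identity first, then equality; a different NaN object fails both.
print(float("nan") in x)   # False

# The very same object is found through the identity check, so the result
# depends on which NaN object you happen to hold.
print(nan in [1, 2, nan])  # True

# A reliable pure-Python check asks each element explicitly.
print(any(math.isnan(v) for v in x))  # True
```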

user48956 · Answer 3 · 2019-07-17T21:00:57.837

Pfft! Microseconds! Never solve a problem in microseconds that can be solved in nanoseconds.

Note that the accepted answer:

- iterates over the whole data, regardless of whether a nan is found
- creates a temporary array of size N, which is redundant.

A better solution is to return True immediately when NAN is found:

import numba
import numpy as np

NAN = float("nan")

@numba.njit(nogil=True)
def _any_nans(a):
    for x in a:
        if np.isnan(x): return True
    return False

@numba.jit
def any_nans(a):
    if not a.dtype.kind=='f': return False
    return _any_nans(a.flat)

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 573us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 774ns  (!nanoseconds)

It also works for n-dimensional arrays:

array1M_nd = array1M.reshape((len(array1M)//2, 2))  # integer division: reshape needs int sizes
assert any_nans(array1M_nd)==True
%timeit any_nans(array1M_nd)  # 774ns

Compare this to the numpy native solution:

def any_nans(a):
    if not a.dtype.kind=='f': return False
    return np.isnan(a).any()

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 456us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 470us

%timeit np.isnan(array1M).any()  # 532us

The early-exit method gives a speedup of three orders of magnitude (in some cases). Not too shabby for a simple annotation.
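
If adding numba as a dependency is not an option, a similar early exit can be approximated in pure NumPy by testing the array in chunks. This is a sketch of a swapped-in technique, not the answer's method: `any_nans_chunked` and the default `chunk_size` are assumptions chosen for illustration, and no timings are claimed for it.

```python
import numpy as np

def any_nans_chunked(a, chunk_size=65536):
    """Early-exit NaN check in pure NumPy (sketch; chunk_size is an arbitrary guess)."""
    a = np.asarray(a)
    if a.dtype.kind not in "fc":  # only float and complex dtypes can hold NaN
        return False
    flat = a.reshape(-1)          # a view when the memory layout allows, else a copy
    for start in range(0, flat.size, chunk_size):
        # The temporary boolean array is at most chunk_size elements long,
        # and the loop stops at the first chunk that contains a NaN.
        if np.isnan(flat[start:start + chunk_size]).any():
            return True
    return False
```

The trade-off is one Python-level loop iteration per chunk, so a very small `chunk_size` would likely cost more than it saves.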

score 3 · Answer 4 · edited May 12 '17 at 01:55

(np.where(np.isnan(A)))[0].shape[0] will be greater than 0 if A contains at least one nan element. A could also be an n x m matrix.

Example:

import numpy as np

A = np.array([1,2,4,np.nan])

# np.where returns one index array per dimension; a non-empty one means a NaN exists
if (np.where(np.isnan(A)))[0].shape[0]:
    print("A contains nan")
else:
    print("A does not contain nan")
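
The index-based approach is mostly useful when you also want to know where the NaNs are. As a small sketch (not part of the original answer), `np.argwhere` gives the positions directly for an array of any dimensionality, and `np.count_nonzero` gives the count without building index arrays.

```python
import numpy as np

A = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])

# One row per NaN, one column per array dimension.
nan_positions = np.argwhere(np.isnan(A))
print(nan_positions.tolist())         # [[0, 1], [1, 2]]

# If only the count matters, count the True entries of the mask.
print(np.count_nonzero(np.isnan(A)))  # 2
```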

edited May 12 '17 at 01:55

lmiguelvargasf

answered May 11 '17 at 20:52

Ting On Chan

score 2 · Answer 5 · 2009-08-25T00:12:56.343

With numpy 1.3 or svn you can do this

In [1]: a = arange(10000.).reshape(100,100)

In [3]: isnan(a.max())
Out[3]: False

In [4]: a[50,50] = nan

In [5]: isnan(a.max())
Out[5]: True

In [6]: timeit isnan(a.max())
10000 loops, best of 3: 66.3 µs per loop

The treatment of nans in comparisons was not consistent in earlier versions.
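
This trick relies on reductions such as `max()` propagating NaN. The sketch below, which is an illustration rather than part of the original answer, shows the propagation, contrasts it with `np.nanmax`, and points out why the similar-looking `sum()` variant can report a NaN that was never in the input.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)
a[1, 2] = np.nan

print(np.isnan(a.max()))       # True: max() propagates NaN
print(np.isnan(np.nanmax(a)))  # False: nanmax() skips NaN, so it is no use here

# Summation also propagates NaN, but inf + (-inf) is itself NaN,
# so the sum can be NaN even though no element is.
c = np.array([np.inf, -np.inf])
with np.errstate(invalid="ignore"):  # silence the "invalid value" warning
    total = c.sum()
print(np.isnan(c).any())       # False
print(np.isnan(total))         # True
```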

edited Aug 25 '09 at 00:12

answered Aug 25 '09 at 00:04

Why doesn't something like `float('nan') in x` work? I tried it and Python returns `False` where `x = [1,2,3,float('nan')]`. – Charlie Parker Oct 13 '16 at 22:02
@CharlieParker ... because comparison with NAN doesn't do what you expect. NAN is treated like a logical NULL (= don't know). `float("nan")==float("nan")` gives `False` (though arguably it should return NAN or None). Similar oddness with NAN and boolean NULL exists in many languages, including SQL (where NULL=NULL is never true). – user48956 Oct 04 '17 at 18:08
