
I need to write a function which will detect if the input contains at least one value which is non-numeric. If a non-numeric value is found I will raise an error (because the calculation should only return a numeric value). The number of dimensions of the input array is not known in advance - the function should give the correct value regardless of ndim. As an extra complication the input could be a single float or numpy.float64 or even something oddball like a zero-dimensional array.

The obvious way to solve this is to write a recursive function that iterates over every iterable object in the array until it finds a non-iterable. It will apply the numpy.isnan() function to every non-iterable object. If at least one non-numeric value is found, the function will return False immediately. Otherwise, if all the values in the iterable are numeric, it will eventually return True.
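For reference, the recursive approach described above might look roughly like the sketch below. The name `all_numeric_recursive` is hypothetical, and the sketch assumes every leaf is a number: a string would recurse forever, because iterating a one-character string yields that same string again.

```python
import numpy as np

def all_numeric_recursive(obj):
    """Hypothetical helper: True if no NaN is found anywhere in obj."""
    try:
        items = iter(obj)
    except TypeError:
        # floats, numpy.float64 and 0-d arrays are not iterable: treat as a leaf
        return not np.isnan(obj)
    for item in items:
        if not all_numeric_recursive(item):
            return False  # stop as soon as one NaN is found
    return True
```

Every element passes through the Python interpreter one at a time, which is why this version is slow on large arrays.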

That works just fine, but it's pretty slow and I expect that NumPy has a much better way to do it. What is an alternative that is faster and more numpyish?

Here's my mockup:

def contains_nan(myarray):
    """
    @param myarray : An n-dimensional array or a single float
    @type myarray : numpy.ndarray, numpy.array, float
    @returns: bool
    Returns True if myarray is numeric or only contains numeric values.
    Returns False if at least one non-numeric value exists.
    Not-A-Number is given by the numpy.isnan() function.
    """
    return True
Peter Mortensen
Salim Fadhley

5 Answers


This should be faster than iterating and will work regardless of shape.

numpy.isnan(myarray).any()

Edit: 30x faster:

import timeit

# setup: a 100x100 float array with a single NaN
s = 'import numpy; a = numpy.arange(10000.).reshape((100, 100)); a[10, 10] = numpy.nan'
ms = [
    'numpy.isnan(a).any()',
    'any(numpy.isnan(x) for x in a.flatten())']
for m in ms:
    print("  %.2f s %s" % (timeit.Timer(m, s).timeit(1000), m))

Results:

  0.11 s numpy.isnan(a).any()
  3.75 s any(numpy.isnan(x) for x in a.flatten())

Bonus: it works fine for non-array NumPy types:

>>> a = numpy.float64(42.)
>>> numpy.isnan(a).any()
False
>>> a = numpy.float64(numpy.nan)
>>> numpy.isnan(a).any()
True
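The same idiom covers every input type listed in the question, since numpy.isnan accepts Python floats, NumPy scalars, zero-dimensional arrays and arrays of any shape. A minimal sketch; the name `has_nan` is hypothetical and, unlike the docstring in the question, it returns True when a NaN is present:

```python
import numpy as np

def has_nan(x):
    """Hypothetical helper: True if x contains at least one NaN."""
    # np.isnan works on floats, numpy scalars, 0-d and n-d arrays alike;
    # .any() collapses the result and bool() turns it into a Python bool.
    return bool(np.isnan(x).any())

print(has_nan(1.5))                  # False
print(has_nan(np.array(np.nan)))     # True: zero-dimensional array
print(has_nan(np.zeros((2, 3, 4))))  # False: any ndim works
```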
Peter Mortensen
Paul
  • with numpy 1.7 the flatten() version is only twice as fast as the first one – Christian Geier Oct 09 '13 at 14:39
  • Why doesn't something like `float('nan') in x` work? I tried it and Python returns `False` where `x = [1,2,3,float('nan')]`. – Charlie Parker Oct 13 '16 at 22:02
  • @CharlieParker the same reason why float('nan') == float('nan') will return False. NaN doesn't equal NaN. Here more info: http://stackoverflow.com/questions/10034149/why-is-nan-not-equal-to-nan – Muppet Feb 06 '17 at 23:20
  • @mab: That's because calling `numpy.any` on a genexp just returns the genexp; you're not actually doing the computation you think you are. Never call `numpy.any` on a genexp. – user2357112 May 11 '17 at 20:58
  • In a real debugging scenario, I would also recommend looking at [`np.isfinite`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.isfinite.html#numpy.isfinite) instead of `np.isnan` to detect numeric overflows, instability, etc. – Ben Usman May 12 '17 at 01:58
  • Out of interest, is this really the fastest way to do this? i) Doesn't `numpy.isnan(a).any()` involve allocating a large temporary array (or is it a view)? ii) If the first element is NAN, does this solution involve iterating over the full array? If I set the first element to NAN, this still takes about 5 microseconds, which seems quite slow for what can be done with an array lookup and a test; it should be nanoseconds, no? – user48956 Oct 04 '17 at 18:14

If infinity is a possible value, I would use numpy.isfinite:

numpy.isfinite(myarray).all()

If the above evaluates to True, then myarray contains none of numpy.nan, numpy.inf or -numpy.inf.

numpy.isnan does not flag numpy.inf values, as this example shows:

In [11]: import numpy as np

In [12]: b = np.array([[4, np.inf],[np.nan, -np.inf]])

In [13]: np.isnan(b)
Out[13]: 
array([[False, False],
       [ True, False]], dtype=bool)

In [14]: np.isfinite(b)
Out[14]: 
array([[ True, False],
       [False, False]], dtype=bool)
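The question wants an error rather than a boolean. One way to combine this answer with that requirement is a small validation helper; `require_finite` and its error message are hypothetical, not a NumPy API:

```python
import numpy as np

def require_finite(x):
    """Hypothetical helper: raise ValueError unless every value in x is finite."""
    arr = np.asarray(x)  # also accepts floats, numpy scalars and 0-d arrays
    if not np.isfinite(arr).all():
        # count NaN and +/-inf separately for a more useful message
        n_nan = int(np.count_nonzero(np.isnan(arr)))
        n_inf = int(np.count_nonzero(np.isinf(arr)))
        raise ValueError(f"found {n_nan} NaN and {n_inf} infinite value(s)")
    return arr
```

With the array `b` from the session above, `require_finite(b)` would raise `ValueError: found 1 NaN and 2 infinite value(s)`.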
Matthew Mage
Akavall
  • Why doesn't something like `float('nan') in x` work? I tried it and Python returns `False` where `x = [1,2,3,float('nan')]`. – Charlie Parker Oct 13 '16 at 22:02
  • @CharlieParker because two `nan`s are not considered equal to each other. Try `float('nan') == float('nan')`. – Akavall Oct 13 '16 at 22:06
  • Interesting. Why are they not considered equal? – Charlie Parker Oct 13 '16 at 22:07
  • @CharlieParker, I don't think I could give a very good answer here. Maybe this is what you are looking for: http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values – Akavall Oct 13 '16 at 22:11
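The behaviour discussed in these comments can be checked directly. One subtlety: list containment tests identity (`is`) before `==`, so the very same NaN object is found even though it never compares equal to itself:

```python
import math

nan = float('nan')
x = [1, 2, 3, float('nan')]

print(nan == nan)                     # False: NaN is unequal to everything, itself included
print(float('nan') in x)              # False: a new NaN object matches nothing in x
print(nan in [1, nan])                # True: same object, so the identity check succeeds
print(any(math.isnan(v) for v in x))  # True: test for NaN-ness instead of equality
```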

Pfft! Microseconds! Never solve a problem in microseconds that can be solved in nanoseconds.

Note that the accepted answer:

  • iterates over the whole data, regardless of whether a nan is found
  • creates a temporary array of size N, which is redundant.

A better solution is to return True immediately when NAN is found:

import numba
import numpy as np

NAN = float("nan")

@numba.njit(nogil=True)
def _any_nans(a):
    # a.flat visits every element of an n-dimensional array in nopython mode
    for x in a.flat:
        if np.isnan(x):
            return True  # early exit on the first NaN
    return False

def any_nans(a):
    # integer/bool arrays cannot hold NaN, so skip the scan entirely
    if a.dtype.kind != 'f':
        return False
    return _any_nans(a)

array1M = np.random.rand(1000000)
assert not any_nans(array1M)
%timeit any_nans(array1M)  # 573us

array1M[0] = NAN
assert any_nans(array1M)
%timeit any_nans(array1M)  # 774ns  (!nanoseconds)

It also works for n-dimensional arrays:

array1M_nd = array1M.reshape((len(array1M) // 2, 2))
assert any_nans(array1M_nd)
%timeit any_nans(array1M_nd)  # 774ns

Compare this to the numpy native solution:

def any_nans(a):
    if not a.dtype.kind=='f': return False
    return np.isnan(a).any()

array1M = np.random.rand(1000000)
assert any_nans(array1M)==False
%timeit any_nans(array1M)  # 456us

array1M[0] = NAN
assert any_nans(array1M)==True
%timeit any_nans(array1M)  # 470us

%timeit np.isnan(array1M).any()  # 532us

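The early-exit method gives a speedup of three orders of magnitude when a NaN occurs near the start of the array; without any NaN it is slightly slower than the NumPy version (573us vs 456us above). Not too shabby for a simple annotation.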
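If Numba is not available, a pure-NumPy compromise is to scan the data in chunks: each chunk is tested with vectorised np.isnan, the temporary boolean array is at most one chunk long, and a NaN near the front ends the scan early. This is a different technique from the Numba answer above; the function name and the default chunk size are hypothetical choices, not tuned values:

```python
import numpy as np

def any_nans_chunked(a, chunk_size=1 << 16):
    """Hypothetical helper: stop scanning at the first chunk that contains a NaN."""
    flat = np.ravel(a)  # a view for contiguous input, otherwise a full copy
    for start in range(0, flat.size, chunk_size):
        # the temporary bool array here holds at most chunk_size elements
        if np.isnan(flat[start:start + chunk_size]).any():
            return True
    return False
```

Smaller chunks exit sooner when a NaN is near the front but pay more Python-loop overhead when there is none; larger chunks behave more like the plain numpy.isnan(a).any().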

user48956

(np.where(np.isnan(A)))[0].shape[0] will be greater than 0 if A contains at least one NaN element. A can also be an n x m matrix.

Example:

import numpy as np

A = np.array([1,2,4,np.nan])

if (np.where(np.isnan(A)))[0].shape[0]:
    print("A contains nan")
else:
    print("A does not contain nan")
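Why this works: on a boolean array, np.where returns a tuple with one index array per dimension, and the length of each array equals the number of True entries, i.e. the number of NaNs. np.count_nonzero gives the same count without building the index arrays:

```python
import numpy as np

A = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])

idx = np.where(np.isnan(A))           # (row indices, column indices) of each NaN
print(idx[0].shape[0])                # 2: number of NaNs, so > 0 means "contains NaN"
print(np.count_nonzero(np.isnan(A)))  # 2: same count, no index arrays needed
```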
lmiguelvargasf
Ting On Chan

With numpy 1.3 or svn you can do this

In [1]: a = arange(10000.).reshape(100,100)

In [3]: isnan(a.max())
Out[3]: False

In [4]: a[50,50] = nan

In [5]: isnan(a.max())
Out[5]: True

In [6]: timeit isnan(a.max())
10000 loops, best of 3: 66.3 µs per loop

The treatment of nans in comparisons was not consistent in earlier versions.
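The trick relies on NaN propagating through max() in current NumPy, so the maximum is NaN exactly when some element is NaN. A sketch wrapping it up; `has_nan_via_max` is a hypothetical name, and the empty-array guard is needed because .max() raises ValueError on an empty array:

```python
import numpy as np

def has_nan_via_max(a):
    """Hypothetical helper built on the a.max() trick."""
    a = np.asarray(a)
    if a.size == 0:
        return False  # .max() raises ValueError on an empty array
    # NaN propagates through max(), so the result is NaN iff some element is NaN
    return bool(np.isnan(a.max()))

a = np.arange(10000.0).reshape(100, 100)
print(has_nan_via_max(a))  # False
a[50, 50] = np.nan
print(has_nan_via_max(a))  # True
print(np.nanmax(a))        # 9999.0: nanmax skips NaNs, so it cannot be used for this check
```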

  • Why doesn't something like `float('nan') in x` work? I tried it and Python returns `False` where `x = [1,2,3,float('nan')]`. – Charlie Parker Oct 13 '16 at 22:02
  • @CharlieParker ... because comparison with NAN doesn't do what you expect. NAN is treated like a logical NULL (=don't know). `float("nan")==float("nan")` gives `False` (though arguably it should return NAN or None). Similar oddness with NULL exists in many languages, including SQL (where NULL=NULL is never true). – user48956 Oct 04 '17 at 18:08