I tried to use NumPy's nanmax function to get the maximum of the non-NaN values in each column of a matrix. For some columns it works, but for others it returns nan as the maximum. However, every column contains non-NaN values, and just to be sure I tried the same thing in R with max(x, na.rm = T), where everything is fine.

Does anyone have any idea why this occurs? The only thing I can think of is that I created the NumPy matrix from a pandas DataFrame, but I really have no clue...

np.nanmax(datamatrix, axis=0)

matrix([[1, 101, 193, 1, 163.0, 10.6, nan, 4.7, 142.0, 0.47, 595.0,
         170.0, 5.73, 24.0, 27.0, 23.0, 361.0, 33.0, 94.0, 9.2, 16.8, nan,
         nan, 91.0, nan, nan, nan, nan, 0.0, 105.0, nan, nan, nan, nan,nan,
         nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
         nan, nan, nan, nan, nan, nan, nan, nan]], dtype=object)
Eric
meow
  • Show a sample matrix, and the results. – Daniel Aug 18 '16 at 18:10
  • Unless you post both a minimal matrix that shows your result and the code that is causing the problem, there is nothing much we can do. This question is extremely vague as it stands. – Mad Physicist Aug 18 '16 at 18:10
  • This thread nicely illustrates the examples requested by the first two commenters: http://stackoverflow.com/help/mcve – andrew Aug 18 '16 at 18:16
  • `nanmax()` might not correctly handle an object array. If `x` is the array, check `x.dtype`. Pandas uses object arrays, so your array might also be an object array if you converted from a Pandas DataFrame. – Warren Weckesser Aug 18 '16 at 18:27
  • For example, `np.nanmax(np.array([2.0, np.nan, np.nan]))` returns 2.0 as expected, but `np.nanmax(np.array([2.0, np.nan, np.nan], dtype=object))` generates a warning and returns `nan`. – Warren Weckesser Aug 18 '16 at 18:31
  • Sorry for the vague description. The object array is the problem. Thank you very much. – meow Aug 18 '16 at 18:49
  • Thanks for the update. It confirms my suspicion that your array is an object array. – Warren Weckesser Aug 18 '16 at 18:57

1 Answer

Your array is an object array, meaning the elements in the array are arbitrary Python objects. Pandas stores non-numeric or mixed-type data with the object dtype, so it is likely that when you converted your Pandas DataFrame to a NumPy array, the result was an object array. nanmax() doesn't handle object arrays correctly.

Here are a couple of examples, one using a numpy.matrix and one a numpy.ndarray. With a matrix, you get no warning at all that something went wrong:

In [1]: import numpy as np

In [2]: m = np.matrix([[2.0, np.nan, np.nan]], dtype=object)

In [3]: np.nanmax(m)
Out[3]: nan

With an array, you get a cryptic warning, but nan is still returned:

In [4]: a = np.array([[2.0, np.nan, np.nan]], dtype=object)

In [5]: np.nanmax(a)
/Users/warren/miniconda3scipy/lib/python3.5/site-packages/numpy/lib/nanfunctions.py:326: RuntimeWarning: All-NaN slice encountered
  warnings.warn("All-NaN slice encountered", RuntimeWarning)
Out[5]: nan

You can determine whether your array is an object array in a few ways. When you display the array in an interactive Python or IPython shell, you'll see dtype=object. Or you can check a.dtype; if a is an object array, you'll see either dtype('O') or object (depending on whether you end up seeing the repr() or the str() of the dtype).
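The dtype check described above can also be done programmatically. This is a minimal sketch; the helper name is_object_array and the sample arrays are made up for illustration and are not part of NumPy:

```python
import numpy as np

# Hypothetical sample data mirroring the arrays used in this answer.
a = np.array([[2.0, np.nan, np.nan]], dtype=object)
b = np.array([[2.0, np.nan, np.nan]])

def is_object_array(x):
    """Return True if x is stored with dtype=object (illustrative helper)."""
    return np.asarray(x).dtype == np.dtype(object)

print(a.dtype)        # str() of the dtype: object
print(repr(a.dtype))  # repr() of the dtype: dtype('O')
print(is_object_array(a), is_object_array(b))
```

A check like this is handy right after converting from a DataFrame, before calling any of the nan-aware functions.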

Assuming all the values in the array are, in fact, floating point values, a way to work around this is to first convert from the object array to an array of floating point values:

In [6]: b = a.astype(np.float64)

In [7]: b
Out[7]: array([[  2.,  nan,  nan]])

In [8]: np.nanmax(b)
Out[8]: 2.0

In [9]: n = m.astype(np.float64)

In [10]: np.nanmax(n)
Out[10]: 2.0
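The same workaround applies to the column-wise call from the question. The following is a sketch with a small made-up object array standing in for the real data matrix, assuming every element can be interpreted as a float:

```python
import numpy as np

# Hypothetical 3x3 object array standing in for the question's data matrix;
# every column contains at least one non-NaN value.
data = np.array(
    [[1, 10.6, np.nan],
     [np.nan, 4.7, 0.47],
     [3, np.nan, np.nan]],
    dtype=object,
)

# Convert to float64 first. astype raises an exception if an element
# cannot be interpreted as a float (for example, a non-numeric string).
data_float = data.astype(np.float64)

# Column-wise maximum that ignores NaN entries.
col_max = np.nanmax(data_float, axis=0)
print(col_max)
```

If the conversion fails, that is a sign the array holds something other than numbers, which is worth tracking down before computing any statistics.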
Warren Weckesser