Masked `np.nan` in the `np.ma.array` problem in jupyter

Question

Running the following Python 3 NumPy code in Anaconda Jupyter:

import numpy as np

y = np.ma.array(np.matrix([[np.nan, 2.0]]), mask=[1, 0])
m = (y < 0.01)

produces the warning: /.../anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: RuntimeWarning: invalid value encountered in less.

Replacing np.nan with 1.0 (or any other number) gives no warning.

Why can't np.nan be masked and then compared?

Take a look at https://stackoverflow.com/questions/25345843/inequality-comparison-of-numpy-array-with-nan-to-a-scalar — Artur Kasza, Feb 21 '20 at 11:34
But for `m = (y != 0.01)` (or `==`), `y[m]` is valid; `m = (y[~np.isnan(y)] < 0.01)` is not, because the shape changes. — zeta0, Feb 21 '20 at 12:09
With `y = np.ma.array(np.matrix([[np.nan, 2.0]]), mask=[1, 0])` and `m = (y < 0.01)` we get `RuntimeWarning: invalid value encountered in less`. Changing to `m = (y != 0.01)` and adding `print(y, y.shape) ; print(y[m], y[m].shape)` gives the expected `[[-- 2.0]] (1, 2)` and `[[-- 2.0]] (1, 2)`. But with `m = np.isnan(y) ; m &= (y[~m] < 0.01)`, the same `print(y, y.shape) ; print(y[m], y[m].shape)` gives the unexpected `[[-- 2.0]] (1, 2)` and `[] (1, 0)`: the shape has changed. — zeta0, Feb 21 '20 at 14:43

score 1 · Answer 1 · answered Feb 21 '20 at 17:46

np.ma has several strategies for implementing methods.

1) evaluate the method on y.data, and make a new ma with y.mask. It may suppress any runtime warnings.

2) evaluate the method on y.filled() # with the default fill value

3) evaluate the method on y.filled(1) # or some other innocuous value

4) evaluate the method on y.compressed()

5) evaluate the method on y.data[~y.mask]

Multiplication, for example, uses filled(1), and addition uses filled(0).

It appears that comparisons are done with 1).

I haven't studied the ma code in detail, but I don't think it does 5).
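As a rough illustration of what each strategy would feed into a computation, here is a small sketch on a three-element masked array. It only shows the inputs that each approach produces; it is not a reproduction of np.ma's internal code, and the variable names are made up for the example.

```python
import numpy as np

y = np.ma.array([np.nan, 0.0, 2.0], mask=[1, 0, 0])

# 1) the raw data: the masked nan is still present in the buffer
raw = y.data

# 2) / 3) masked slots replaced by a chosen, harmless value
filled_one = y.filled(1)    # neutral for multiplication
filled_zero = y.filled(0)   # neutral for addition

# 4) only the unmasked values; the result is shorter than y
compressed = y.compressed()

# 5) the same selection written as boolean indexing on the data
selected = y.data[~y.mask]

print(raw, filled_one, filled_zero, compressed, selected)
```

Strategies 1) to 3) keep the original shape, while 4) and 5) drop the masked elements, which is why a comparison built on them could not return a result aligned with y.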

If you are using ma just to avoid the runtime warning, there are some alternatives.

There's a collection of np.nan... functions (np.nansum, for example) that filter out nan before calculating.
There are ways of suppressing runtime warnings.
Ufuncs have a where parameter that can be used to skip some elements. Use it with an out parameter to define the values of the skipped ones.
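A minimal sketch of these three alternatives on a plain (unmasked) array follows. The choice of np.nansum, np.errstate, and np.greater is mine, picked as one representative for each idea; whether the unsuppressed comparison warns at all depends on the NumPy version.

```python
import numpy as np

x = np.array([np.nan, 0.0, 2.0])

# nan-aware reduction: nan values are ignored in the sum
total = np.nansum(x)

# suppress the "invalid value" warning only inside this block
with np.errstate(invalid='ignore'):
    m_quiet = x > 1.0

# skip the nan elements entirely; skipped slots keep the value
# already stored in `out` (False here)
valid = ~np.isnan(x)
out = np.zeros(x.shape, dtype=bool)
np.greater(x, 1.0, out=out, where=valid)

print(total, m_quiet, out)
```

Pre-filling out matters: without it, the positions skipped by where are left uninitialized.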

===

Looking at np.ma.core.py I see functions like ma.less.

In [857]: y = np.ma.array([np.nan, 0.0, 2.0], mask=[1, 0, 0])                                  
In [858]: y >1.0                                                                               
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in greater
  #!/usr/bin/python3
Out[858]: 
masked_array(data=[--, False, True],
             mask=[ True, False, False],
       fill_value=True)
In [859]: np.ma.greater(y,1.0)                                                                 
Out[859]: 
masked_array(data=[--, False, True],
             mask=[ True, False, False],
       fill_value=True)

Looking at the code, ma.less and similar functions are instances of a MaskedBinaryOperation class, and use 1): they evaluate on the data with

np.seterr(divide='ignore', invalid='ignore')

The result mask is a logical combination of the arguments' masks.
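The behaviour described here can be approximated in a few lines. The helper below, masked_greater_sketch, is a hypothetical name and a simplified stand-in written for illustration; it is not the actual np.ma implementation, and it uses np.errstate rather than np.seterr to keep the error settings local.

```python
import numpy as np

def masked_greater_sketch(a, b):
    """Simplified imitation of strategy 1) for a comparison (illustrative only)."""
    da, db = np.ma.getdata(a), np.ma.getdata(b)
    # evaluate on the raw data, silencing the nan-related warning
    with np.errstate(divide='ignore', invalid='ignore'):
        result = np.greater(da, db)
    # result mask: logical OR of the arguments' masks
    mask = np.logical_or(np.ma.getmaskarray(a), np.ma.getmaskarray(b))
    return np.ma.array(result, mask=mask)

y = np.ma.array([np.nan, 0.0, 2.0], mask=[1, 0, 0])
r = masked_greater_sketch(y, 1.0)
print(r)
```

The nan is still compared, but its result is hidden behind the mask and the warning is suppressed while the comparison runs.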

https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html#operations-on-masked-arrays

score 0 · Answer 2 · answered Feb 21 '20 at 15:42

To simplify the issue, take:

import numpy as np
y = np.ma.array([np.nan, 0.0, 2.0], mask=[1, 0, 0])
m = (y > 1.0)
print(y, y.shape) ; print(y[m], y[m].shape, m.shape)

and the output is:

[-- 0.0 2.0] (3,)
[2.0] (1,) (3,)

with the RuntimeWarning: /.../anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: RuntimeWarning: invalid value encountered in greater.

Changing:

...
m = (y != 2.0)
...

We get:

[-- 0.0 2.0] (3,)
[-- 0.0] (2,) (3,)

so the masked element appears in the result, and there is no RuntimeWarning.

Changing now:

...
m = y.mask.copy() ; y[np.isnan(y)] = 9.0 ; y.mask = m ; m = (y > 1.0)
...

We get (without a RuntimeWarning):

[-- 0.0 2.0] (3,)
[-- 2.0] (2,) (3,)

This workaround is strange, however: it puts an arbitrary value in place of np.nan and has to save and restore the mask. Comparing anything with a masked value should always give masked, shouldn't it?

`np.ma` has functions like `less` and `greater` to properly handle the masking. The operator overloads don't handle that. — hpaulj, Feb 22 '20 at 16:09
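A small sketch of that suggestion, assuming the same three-element array as in this answer. Recording warnings is only a way to check what the installed NumPy version emits, and filling the masked comparison result with False before indexing is my own choice for getting a plain boolean selector.

```python
import warnings
import numpy as np

y = np.ma.array([np.nan, 0.0, 2.0], mask=[1, 0, 0])

# record any warnings raised while comparing with the np.ma function
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    m = np.ma.greater(y, 1.0)

print(m, m.shape)
print([str(w.message) for w in caught])

# treat "masked" as "not selected" when using the result as an index
picked = y[m.filled(False)]
print(picked, picked.shape)
```

The comparison result keeps the shape and the mask of y, so the workaround of temporarily overwriting the nan is not needed.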
