np.where and masked array

Question

I'm working with masked arrays thanks to some of the help I've gotten on stackoverflow, but I'm running into a problem with the np.where evaluation of a masked array.

My masked array is:

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

And prints like this:

In [24]: print(m_pt0)
[1 -- 3 0 4 7 6 5]

I'm looking for the index in m_pt0 where m_pt0 == 0. I would expect that

np.where(0 == m_pt0)

would return:

(array([3]),)

However, despite the mask (or because of?), I instead get

(array([1, 3]),)

The entire point of using the mask is to avoid accessing indices that are "blank", so how can I use where (or another function) to retrieve only the indices that are unmasked and match my boolean criteria?

Did you try `ma.where()` from the `numpy.ma` submodule instead? Does it yield a different result? — blubberdiblub, May 03 '17 at 13:27
Please show your full code. In an example that I put together everything works as expected. — Tom de Geus, May 03 '17 at 13:29
`[1 -- 3 0 4 7 6 5]` is not a valid numpy object. What's the real object that you're referring to? — Mazdak, May 03 '17 at 13:32
@blubberdiblub has the same syntax that I'm using, sorry for the confusion. — stagermane, May 03 '17 at 13:42
As the documentation states, the package (masked array) ensures that masked entries are not used in computations. However, `np.where()` performs a comparison, not a computation. Try another mask that filters out some of your positive numbers, then apply a function like `np.sum()`. — Mazdak, May 03 '17 at 13:51
Masked array methods handle the computations correctly. So do `np.ma` functions. General `numpy` functions work if they delegate the work to the method. Otherwise they will work with the unmasked `.data` attribute, and probably will be wrong. — hpaulj, May 03 '17 at 17:16

score 8 · Accepted Answer · answered May 03 '17 at 13:50

You need to use the masked variant of the where() function, otherwise it will return wrong or unwanted results for masked arrays. The same goes for other functions, like polyfit().

For example:

In [2]: np.ma.where(0 == m_pt0)
Out[2]: (array([3]),)
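
For anyone who wants to reproduce this outside an interactive session, here is a minimal self-contained sketch using the array from the question. The name `idx` is only illustrative.

```python
import numpy as np

# Same array as in the question: the value at index 1 is masked.
m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

# The mask-aware where() treats masked entries of the condition as False,
# so only the unmasked zero at index 3 is reported.
(idx,) = np.ma.where(m_pt0 == 0)
print(idx.tolist())  # [3]
```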

blubberdiblub

This is the solution, but read [my comment](http://stackoverflow.com/questions/43761425/np-where-and-masked-array?noredirect=1#comment74565203_43761425) for the reason behind this behavior. – Mazdak May 03 '17 at 13:56
@Kasramvd I don't think it's as clear-cut as that. `np.polyfit()` also doesn't work properly for masked arrays and that one does a computation, not a comparison. – blubberdiblub May 03 '17 at 14:01
Yes, it does some computations, although it still has some comparisons in it (with the degree of the fitting polynomial). But this doesn't make any difference; I think this is kind of a bug in the documentation. – Mazdak May 03 '17 at 14:05

hpaulj · Answer 2 · 2017-05-03T16:33:01.153

The equality test may create confusion. The result is another masked array:

In [19]: 0 == m_pt0
Out[19]: 
masked_array(data = [False -- False True False False False False],
             mask = [False  True False False False False False False],
       fill_value = True)

A masked array has .data and .mask attributes. numpy functions that aren't MA aware just see the .data:

In [20]: _.data
Out[20]: array([False,  True, False,  True, False, False, False, False], dtype=bool)

np.where sees the two True values in .data and returns:

In [23]: np.where(0 == m_pt0)
Out[23]: (array([1, 3], dtype=int32),)
In [24]: np.where((0 == m_pt0).data)
Out[24]: (array([1, 3], dtype=int32),)
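
If plain np.where has to be used anyway, one way around this is to combine the comparison on the raw data with the inverted mask yourself, so that the result does not depend on whatever value happens to be stored under a masked entry. This is a minimal sketch, not part of the session above; `hits` and `idx` are illustrative names.

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

# Compare the raw data, then discard every position that is masked.
# getmaskarray() returns a full boolean array even if nothing is masked.
hits = (m_pt0.data == 0) & ~np.ma.getmaskarray(m_pt0)

# hits is an ordinary boolean ndarray, so plain np.where is safe here.
idx = np.where(hits)[0]
print(idx.tolist())  # [3]
```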

Where possible it is better to use the np.ma version of a function:

In [25]: np.ma.where(0 == m_pt0)
Out[25]: (array([3], dtype=int32),)

Looking at the code with np.source(np.ma.where), I see that it does

if missing == 2:
    return filled(condition, 0).nonzero()
# (plus lots of code for the 3-argument use)

That filled does:

In [27]: np.ma.filled((0 == m_pt0),0)
Out[27]: array([False, False, False,  True, False, False, False, False], dtype=bool)

MA functions often replace the masked values with something innocuous (0 in this case), or use compressed to remove them from consideration.

In [36]: m_pt0.compressed()
Out[36]: array([1, 3, 0, 4, 7, 6, 5])
In [37]: m_pt0.filled(100)
Out[37]: array([  1, 100,   3,   0,   4,   7,   6,   5])
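
One thing to keep in mind with compressed(): the masked entries are dropped, so positions in the compressed array no longer line up with positions in the original. A rough sketch of translating back, assuming a 1-D array (`kept`, `pos` and `orig_idx` are illustrative names):

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

# Original positions of the entries that survive compressed().
kept = np.flatnonzero(~np.ma.getmaskarray(m_pt0))

# Search the compressed data, then translate back to original indices.
pos = np.flatnonzero(m_pt0.compressed() == 0)
orig_idx = kept[pos]

print(pos.tolist())       # [2]  (position within the compressed array)
print(orig_idx.tolist())  # [3]  (position within m_pt0)
```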

A numpy function will work correctly on a MA if it delegates the work to the array's own method.

In [41]: np.nonzero(m_pt0)
Out[41]: (array([0, 2, 4, 5, 6, 7], dtype=int32),)
In [42]: m_pt0.nonzero()
Out[42]: (array([0, 2, 4, 5, 6, 7], dtype=int32),)
In [43]: np.where(m_pt0)
Out[43]: (array([0, 1, 2, 4, 5, 6, 7], dtype=int32),)

np.nonzero delegates. np.where does not.
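
A quick way to check whether a given function delegates is to compare it with the array's own method. The sketch below does that for nonzero and, following the suggestion in the comments, for sum. It only covers these two functions, and the variable names are illustrative.

```python
import numpy as np

m_pt0 = np.ma.masked_array([1, 2, 3, 0, 4, 7, 6, 5],
                           mask=[False, True, False, False,
                                 False, False, False, False])

# np.nonzero() hands the work to the array's own nonzero() method,
# so the masked entry at index 1 is skipped in both calls.
via_function = np.nonzero(m_pt0)[0]
via_method = m_pt0.nonzero()[0]
print(via_function.tolist())  # [0, 2, 4, 5, 6, 7]
print(via_method.tolist())    # [0, 2, 4, 5, 6, 7]

# Reductions such as np.sum() delegate as well: the masked 2 is left out.
total_function = int(np.sum(m_pt0))
total_method = int(m_pt0.sum())
print(total_function, total_method)  # 26 26
```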

The repr of a masked array shows the mask. Its str just shows the masked data:

In [31]: m_pt0
Out[31]: 
masked_array(data = [1 -- 3 0 4 7 6 5],
             mask = [False  True False False False False False False],
       fill_value = 999999)
In [32]: str(m_pt0)
Out[32]: '[1 -- 3 0 4 7 6 5]'
