
When I use boolean indexing on a structured masked array, under what conditions do I get a view, and when do I get a copy? The documentation says that advanced indexing always returns a copy, but that is not the whole story: something like X[X>0]=42 is technically advanced indexing, yet the assignment works. My situation is more complex:

I want to set the mask of a particular field based on a criterion from another field, so I need to get the field, apply the boolean indexing, and get the mask. There are 3! = 6 orders in which to do so.

Preparation (assuming from numpy import ma, random):

In [83]: M = ma.MaskedArray(random.random(400).view("f8,f8,f8,f8")).reshape(10, 10)

In [84]: crit = M[:, 4]["f2"] > 0.5

  1. Field - index - mask (fails):

    In [85]: M["f3"][crit, 3].mask = True
    
    In [86]: print(M["f3"][crit, 3].mask)
    [False False False False False]
    
  2. Index - field - mask (fails):

    In [87]: M[crit, 3]["f3"].mask = True
    
    In [88]: print(M[crit, 3]["f3"].mask)
    [False False False False False]
    
  3. Index - mask - field (fails):

    In [94]: M[crit, 3].mask["f3"] = True
    
    In [95]: print(M[crit, 3].mask["f3"])
    [False False False False False]
    
  4. Mask - index - field (fails):

    In [101]: M.mask[crit, 3]["f3"] = True
    
    In [102]: print(M.mask[crit, 3]["f3"])
    [False False False False False]
    
  5. Field - mask - index (succeeds):

    In [103]: M["f3"].mask[crit, 3] = True
    
    In [104]: print(M["f3"].mask[crit, 3])
    [ True  True  True  True  True]
    
    # set back to False so I can try method #6
    
    In [105]: M["f3"].mask[crit, 3] = False
    
    In [106]: print(M["f3"].mask[crit, 3])
    [False False False False False]
    
  6. Mask - field - index (succeeds):

    In [107]: M.mask["f3"][crit, 3] = True
    
    In [108]: print(M.mask["f3"][crit, 3])
    [ True  True  True  True  True]
    

So, it looks like indexing must come last.

gerrit
  • The duplicate, http://stackoverflow.com/questions/15691740/does-assignment-with-advanced-indexing-copy-array-data addresses the `__setitem__` v `__getitem__` issue fine. But I think there are nuances here that need further exploration - this is a structured array and masked. So there's the question of how field indexing plays with element indexing, and how the `mask` can be set. I propose reopening this. – hpaulj Jun 15 '16 at 18:51

2 Answers


The issue of __setitem__ v. __getitem__ is important, but with structured arrays and masking it's a little harder to sort out when a __getitem__ makes a copy first.

Regarding the structured arrays, it shouldn't matter whether the field index or the element index occurs first. However, some releases appear to have a bug in this regard. I'll try to find a recent SO question where this was a problem.

With a masked array, there's the question of how to correctly modify the mask. The .mask attribute is a property that accesses the underlying ._mask array, and it is fetched by attribute access rather than by __getitem__. So the simple setitem v getitem distinction does not apply directly.

Let's skip the structured bit first:

In [584]: M = np.ma.MaskedArray(np.arange(4))

In [585]: M
Out[585]: 
masked_array(data = [0 1 2 3],
             mask = False,
       fill_value = 999999)

In [586]: M.mask
Out[586]: False

In [587]: M.mask[[1,2]]=True
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-587-9010ee8f165e> in <module>()
----> 1 M.mask[[1,2]]=True

TypeError: 'numpy.bool_' object does not support item assignment

Initially the mask is a scalar boolean (np.ma.nomask), not an array.

This works

In [588]: M.mask=np.zeros((4,),bool)  # change mask to array

In [589]: M
Out[589]: 
masked_array(data = [0 1 2 3],
             mask = [False False False False],
       fill_value = 999999)

In [590]: M.mask[[1,2]]=True

In [591]: M
Out[591]: 
masked_array(data = [0 -- -- 3],
             mask = [False  True  True False],
       fill_value = 999999)

This does not

In [592]: M[[1,2]].mask=True

In [593]: M
Out[593]: 
masked_array(data = [0 -- -- 3],
             mask = [False  True  True False],
       fill_value = 999999)

M[[1,2]] is evidently a copy, and the assignment is to its mask attribute, not M.mask.
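Since elements 1 and 2 were already masked by the earlier assignment, the unchanged output above does not by itself show which mask was written to. A minimal sketch with fresh indices (0 and 3 are my choice, not from the session, and the variable names are only illustrative) should make the difference visible:

```python
import numpy as np

M = np.ma.MaskedArray(np.arange(4))
M.mask = np.zeros(4, dtype=bool)   # allocate a full mask, as in the session above

# Fancy-index first, then set the mask: this writes to a temporary copy.
M[[0, 3]].mask = True
after_copy_attempt = M.mask.tolist()

# Take the mask first, then fancy-index it: this writes into M's own mask.
M.mask[[0, 3]] = True
after_direct_write = M.mask.tolist()

print(after_copy_attempt)   # still all False
print(after_direct_write)   # True at positions 0 and 3
```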

....

A masked array has a __setmask__ method. You can study it in numpy/ma/core.py. The mask property is defined with

mask = property(fget=_get_mask, fset=__setmask__, doc="Mask")

So M.mask=... does use this.

So it looks like the problem case is doing

M.__getitem__(index).__setmask__(values)

hence the copy. The M.mask[]=... is doing

M._mask.__setitem__(index, values)

since _get_mask just does return self._mask.


M["f3"].mask[crit, 3] = True

works because M['f3'] is a view. (M[['f1','f3']] is ok for get, but doesn't work for setting).

M.mask["f3"] is also a view. I'm not entirely sure of the order of the relevant gets and sets. __setmask__ has code that deals specifically with compound dtypes (structured).
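As a rough check of the field-then-mask-then-index order, here is a sketch on deterministic data. The arange data, the threshold of 200, and the explicit all-False mask are my assumptions, standing in for the random array in the question:

```python
import numpy as np

# Deterministic stand-in for the question's random data (an assumption).
raw = np.arange(400, dtype="f8").view("f8,f8,f8,f8").reshape(10, 10)
crit = raw["f2"][:, 4] > 200    # plain boolean array, True for rows 5..9

# Pass an explicit all-False structured mask so the mask is a full array.
M = np.ma.MaskedArray(raw, mask=np.ma.make_mask_none(raw.shape, raw.dtype))

# Fancy index before .mask: the mask is set on a temporary copy.
M["f3"][crit, 3].mask = True
masked_after_copy = int(M.mask["f3"].sum())

# Fancy index last: the assignment is a __setitem__ on a view of M's mask.
M["f3"].mask[crit, 3] = True
masked_after_write = int(M.mask["f3"].sum())

print(masked_after_copy, masked_after_write)
```

Only the second assignment should leave any element of the "f3" mask set, and the other fields should stay unmasked.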

=========================

Looking at a structured array, without the masking complication, the indexing order matters for assignment:

In [607]: M1 = np.arange(16, dtype="i4").view("i,i")   # force int32 so the result does not depend on the platform's default int

In [609]: M1[[3,4]]['f1']=[3,4]          # no change   
In [610]: M1[[3,4]]['f1']
Out[610]: array([7, 9], dtype=int32)

In [611]: M1['f1'][[3,4]]=[1,2]    # change
In [612]: M1
Out[612]: 
array([(0, 1), (2, 3), (4, 5), (6, 1), (8, 2), (10, 11), (12, 13), (14, 15)], dtype=[('f0', '<i4'), ('f1', '<i4')])

So we still have a __getitem__ followed by a __setitem__, and we have to pay attention as to whether the get returns a view or a copy.
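One way to check which kind of get you have is np.shares_memory. The following is a small sketch along the lines of the session above; the values 30 and 40 and the variable names are only illustrative:

```python
import numpy as np

M1 = np.arange(16, dtype="i4").view("i4,i4")

# Field access returns a view; indexing with a list returns a copy.
field_is_view = np.shares_memory(M1, M1["f1"])
fancy_is_view = np.shares_memory(M1, M1[[3, 4]])
print(field_is_view, fancy_is_view)

M1[[3, 4]]["f1"] = [30, 40]   # written to the temporary copy, then discarded
M1["f1"][[3, 4]] = [1, 2]     # __setitem__ on the field view: M1 changes
print(M1["f1"])
```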

hpaulj

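This is because, although reading with advanced indexing (__getitem__) returns a copy, assigning through advanced indexing (__setitem__) still writes into the original array. Only the orders in which advanced indexing is the last operation actually assign through __setitem__; in the others, the advanced index is applied first and the assignment lands on a temporary copy.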
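A minimal illustration with a plain array (the values are arbitrary):

```python
import numpy as np

X = np.array([-1, 2, -3, 4])

X[X > 0] = 42    # X.__setitem__(index, 42): writes into X itself
Y = X[X < 0]     # X.__getitem__(index): Y is a new array
Y[:] = 0         # modifies only Y

print(X)         # the negative entries of X are untouched
print(Y)
```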

gerrit