
At some point in my code I try to modify a value of a masked array, yet Python seems to ignore the change. I suspect this has to do with how arrays are stored in memory, as if I were modifying a copy of the value rather than the value itself, but I don't know enough about this to work out how to resolve it.

Here is a simplified version of what I'm trying to do:

    import numpy as np

    x = np.zeros((2,5)) # create 2D array of zeroes
    x[0][1:3] = 5       # replace some values in the first row with 5

    mask = (x[0] > 0)   # create a mask to only deal with the positive values

    x[0][mask][1] = 10  # change one of the positive values

    print(x[0][mask][1]) # value isn't changed in the original array

The output of this is:

    5.0

when it should be 10.

Any help would be greatly appreciated. Ideally this needs to be scalable (meaning I don't necessarily know the shape of x, where the values are positive, or which one I will need to modify).

I'm working with NumPy 1.11.0 and Python 2.7.12 on Ubuntu 16.04.2.

Thanks!

Jesse Rio
  • Where possible, use one set of indexing brackets rather than several, e.g. `x[0, 1:3]`; `x[0, mask]`. But also keep in mind that indexing with a boolean mask produces a copy. – hpaulj May 11 '17 at 17:38

3 Answers


Let's generalize your problem a bit:

    In [164]: x=np.zeros((2,5))
    In [165]: x[0, [1, 3]] = 5      # index with a list, not a slice
    In [166]: x
    Out[166]:
    array([[ 0.,  5.,  0.,  5.,  0.],
           [ 0.,  0.,  0.,  0.,  0.]])

When the indexing occurs right before the `=`, it's part of a `__setitem__` call and acts on the original array. This is true whether the indexing uses slices, a list or a boolean mask.

But a selection with a list or a mask (advanced indexing) produces a copy. Further indexed assignment affects only that copy, not the original.

    In [167]: x[0, [1, 3]]
    Out[167]: array([ 5.,  5.])
    In [168]: x[0, [1, 3]][1] = 6
    In [169]: x
    Out[169]:
    array([[ 0.,  5.,  0.,  5.,  0.],
           [ 0.,  0.,  0.,  0.,  0.]])
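To check which selections are views and which are copies, you can ask NumPy whether two arrays share memory. The sketch below is not part of the original answer; it assumes a NumPy version that provides `np.shares_memory`, and contrasts a basic slice with list and boolean-mask selections:

```python
import numpy as np

x = np.zeros((2, 5))
x[0, [1, 3]] = 5

# Basic slicing returns a view that shares memory with x.
sliced = x[0, 1:4]
# List (integer-array) and boolean-mask indexing return new arrays (copies).
listed = x[0, [1, 3]]
masked = x[0, x[0] > 0]

print(np.shares_memory(sliced, x))  # True
print(np.shares_memory(listed, x))  # False
print(np.shares_memory(masked, x))  # False

# Writing into the copy leaves x untouched; writing into the view changes x.
listed[1] = 6
print(x[0, 3])  # 5.0
sliced[2] = 7   # sliced[2] is the same memory as x[0, 3]
print(x[0, 3])  # 7.0
```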

The best way around this is to apply the extra indexing to the index array itself, so that `x` is indexed only once:

    In [170]: x[0, np.array([1,3])[1]] = 6
    In [171]: x
    Out[171]:
    array([[ 0.,  5.,  0.,  6.,  0.],
           [ 0.,  0.,  0.,  0.,  0.]])

If the mask is boolean, you may need to convert it to an integer index array first:

    In [174]: mask = x[0]>0
    In [175]: mask
    Out[175]: array([False,  True, False,  True, False], dtype=bool)
    In [176]: idx = np.where(mask)[0]
    In [177]: idx
    Out[177]: array([1, 3], dtype=int32)
    In [178]: x[0, idx[1]]
    Out[178]: 6.0
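For the "scalable" requirement in the question (arbitrary shape, arbitrary mask), the same idea extends to n dimensions with `np.nonzero`, which returns one index array per axis in the same row-major order that boolean selection uses. A minimal sketch; `set_kth_selected` is a hypothetical helper name, not a NumPy function:

```python
import numpy as np


def set_kth_selected(arr, mask, k, value):
    """Set the k-th element selected by a boolean mask (row-major order) in place.

    Hypothetical helper for illustration; mask must have the same shape as arr.
    """
    # np.nonzero returns one index array per dimension; taking entry k from
    # each gives the full coordinate of the k-th True element.
    coords = tuple(axis_idx[k] for axis_idx in np.nonzero(mask))
    # A single __setitem__ on arr itself, so no intermediate copy is involved.
    arr[coords] = value


x = np.zeros((2, 5))
x[0, 1:3] = 5
set_kth_selected(x, x > 0, 1, 10)  # the second positive value becomes 10
print(x)
```

Because it performs a single indexed assignment on `arr`, it also works on a view such as `x[0]` (e.g. `set_kth_selected(x[0], x[0] > 0, 1, 10)`); a `k` past the number of selected elements raises `IndexError`.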

Or you can tweak the boolean values directly:

    In [179]: mask[1]=False
    In [180]: x[0,mask]
    Out[180]: array([ 6.])

So in your bigger problem you need to be aware of when indexing produces a view and when it produces a copy. You also need to be comfortable indexing with lists, arrays and booleans, and understand how to switch between them.
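A different route, not taken in the answer above, is to accept the copy: edit it, then write the whole selection back through the same mask in one assignment. A minimal sketch using the question's setup:

```python
import numpy as np

x = np.zeros((2, 5))
x[0, 1:3] = 5
mask = x[0] > 0

# x[0, mask] is a copy, so it can be edited freely...
selected = x[0, mask]
selected[1] = 10
# ...then the whole selection is written back with one indexed assignment on x.
x[0, mask] = selected
print(x[0])
```

This costs an extra pass over the selected elements, but it keeps the mask unchanged and needs no integer indices.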

hpaulj
  • This is perfect, thank you! This works and is scalable; I just have to rethink the way I approach masks and arrays. I wasn't aware that indexing multiple times caused Python to create a copy, but this is what I suspected. – Jesse Rio May 12 '17 at 08:33

What you've created isn't really a masked array:

    x = np.zeros((2,5))
    x[0][1:3] = 5
    mask = (x[0] > 0)
    mask
    Out[14]: array([False,  True,  True, False, False], dtype=bool)

So this is just a boolean array. To create a masked array, use the numpy.ma module:

    masked_x = np.ma.array(x[0], mask=~(x[0] > 0)) # let's mask first row as you did
    masked_x
    Out[15]:
    masked_array(data = [-- 5.0 5.0 -- --],
                 mask = [ True False False  True  True],
           fill_value = 1e+20)

Now you can change your masked array, and the original array changes accordingly:

    masked_x[1] = 10.
    masked_x
    Out[36]:
    masked_array(data = [-- 10.0 5.0 -- --],
                 mask = [ True False False  True  True],
           fill_value = 1e+20)
    x
    Out[37]:
    array([[  0.,  10.,   5.,   0.,   0.],
           [  0.,   0.,   0.,   0.,   0.]])

Note that in masked arrays, invalid entries are marked as True, which is why the condition is inverted with `~` when building the mask.
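A self-contained check of the view behaviour described in this answer. It relies on `np.ma.array` not copying its input by default and uses `np.shares_memory` to confirm that the masked array's data lives in `x`:

```python
import numpy as np

x = np.zeros((2, 5))
x[0, 1:3] = 5

# np.ma.array defaults to copy=False, so its data is a view of x[0].
masked_x = np.ma.array(x[0], mask=~(x[0] > 0))
print(np.shares_memory(masked_x.data, x))  # True

# Assigning to an unmasked position writes straight into x.
masked_x[1] = 10.0
print(x[0, 1])  # 10.0

# Masked (invalid) entries are True in the mask; valid ones are False.
print(masked_x.mask)
# compressed() returns only the unmasked values, as a new array.
print(masked_x.compressed())
```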

Vadim Shkaberda
  • He can use this boolean array as a mask without using the `np.ma.array` step. He's probably using the term 'masked array' loosely. – hpaulj May 11 '17 at 18:13
  • @hpaulj Probably. But I believe the best way to achieve what he wants (to change values of the initial array 'through a mask') is to actually use a masked array. At least if I understand the question correctly. – Vadim Shkaberda May 11 '17 at 18:21
  • Ideally I wanted this to be as simple as possible, meaning I have one array to work with and modify it through masks without creating copies of the masked array (like the numpy.ma module does). @hpaulj's answer did the trick, thanks. – Jesse Rio May 12 '17 at 08:38
  • @JesseRio You're welcome. But the good thing about a masked array is that **it doesn't create a copy**: `masked_x.data.base is x; Out[12]: True`. It creates a view whose memory is shared with x. – Vadim Shkaberda May 12 '17 at 16:27

To understand what's going on, I suggest reading this: http://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html

This boils down to a misleading use of fancy indexing. The following statements are equivalent in effect: each one directly sets to 10 the elements of x selected by mask (x[0] is a view of x, so assigning into it writes to x).

    x[0][mask] = 10
    x[0,mask] = 10
    x.__setitem__((0, mask), 10)
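A small check that the three forms above have the same effect; `fresh` is a hypothetical helper that rebuilds the question's array and mask for each case:

```python
import numpy as np


def fresh():
    """Hypothetical helper: rebuild the question's array and its mask."""
    x = np.zeros((2, 5))
    x[0, 1:3] = 5
    return x, x[0] > 0


a, mask = fresh()
a[0][mask] = 10                 # a[0] is a view, so this writes into a

b, mask = fresh()
b[0, mask] = 10                 # one indexing operation on b

c, mask = fresh()
c.__setitem__((0, mask), 10)    # the explicit form of the line above

print(np.array_equal(a, b) and np.array_equal(b, c))  # True
print(a[0])
```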

What you're doing, on the other hand, is the following:

    x[0][mask][1] = 10
    x[0,mask][1] = 10
    x[0,mask].__setitem__(1, 10)
    x.__getitem__((0, mask)).__setitem__(1, 10)

This creates a copy with `__getitem__()`, and the subsequent `__setitem__()` modifies only that copy.
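Splitting the chained assignment into its two explicit calls makes the copy visible. A minimal sketch under the question's setup, using `np.shares_memory` to confirm that the intermediate array is separate from `x`:

```python
import numpy as np

x = np.zeros((2, 5))
x[0, 1:3] = 5
mask = x[0] > 0

# What x[0, mask][1] = 10 does: __getitem__ first, then __setitem__.
tmp = x.__getitem__((0, mask))   # boolean-mask selection returns a new array
tmp.__setitem__(1, 10)           # modifies tmp only

print(tmp)                       # the copy holds the 10
print(x[0])                      # the original row is unchanged
print(np.shares_memory(tmp, x))  # False
```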

In conclusion, you need to rethink how to modify that single number, using a different mask so that the change happens in a single `__setitem__()` call on x.
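One way to read "a different mask" (an interpretation, not code from this answer) is to build a second boolean mask that is True only at the k-th selected position, then assign through it once. Writing through `.flat` keeps the mask construction independent of the mask's shape:

```python
import numpy as np

x = np.zeros((2, 5))
x[0, 1:3] = 5
mask = x[0] > 0

# Build a second mask that is True only at the k-th True position of `mask`.
k = 1
single = np.zeros_like(mask)                  # all False, same shape as mask
single.flat[np.flatnonzero(mask)[k]] = True   # flat (row-major) position of the k-th True

# One __setitem__ on x itself, so the original array is modified.
x[0, single] = 10
print(x[0])
```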

Manuel
  • Thanks! This is exactly what I was suspecting, and the link you provided is very insightful. I've accepted @hpaulj's answer since he provided a solution as well. – Jesse Rio May 12 '17 at 08:30