
I have a 2D array (2D for this example; in general it can be N-dimensional), and I would like to create a boolean mask for it that covers the end of each row, starting at a given index per row. For example:

import numpy as np

np.random.seed(0xBEEF)
a = np.random.randint(10, size=(5, 6))
mask_indices = np.argmax(a, axis=1)  # per-row index where the mask starts

I would like to convert mask_indices to a boolean mask. Currently, I can't think of a better way than

mask = np.zeros(a.shape, dtype=bool)  # np.bool was removed in NumPy 1.24; use bool
for r, m in enumerate(mask_indices):
    mask[r, m:] = True

So for

a = np.array([[6, 5, 0, 2, 1, 2],
              [8, 1, 3, 7, 1, 9],
              [8, 7, 6, 7, 3, 6],
              [2, 7, 0, 3, 1, 7],
              [5, 4, 0, 7, 6, 0]])

and

mask_indices = np.array([0, 5, 0, 1, 3])

I would like to see

mask = np.array([[ True,  True,  True,  True,  True,  True],
                 [False, False, False, False, False,  True],
                 [ True,  True,  True,  True,  True,  True],
                 [False,  True,  True,  True,  True,  True],
                 [False, False, False,  True,  True,  True]])

Is there a vectorized form of this operation?

In general, I would like this to work for an N-dimensional array: the mask runs along the axis that the indices refer to, with one index for every position in the remaining dimensions.

Mad Physicist
  • `np.arange(6)>=mask_indices[:,None]`? – Paul Panzer Oct 28 '19 at 17:44
  • @PaulPanzer what kind of magic is this? This should be an answer. I don't understand the `None` part or how it generates this shape. Edit: `None` is equivalent to `np.newaxis`, as answered here: https://stackoverflow.com/questions/1408311/numpy-array-slice-using-none – r.ook Oct 28 '19 at 17:54
  • @r.ook NumPy broadcasting. `mask_indices[..., None].shape` is `(..., 1)`, and a length-1 axis matches any length, including that of the `np.arange` result. The comparison is then broadcast to shape `mask_indices.shape + (6,)`. Beautiful trick – Marat Oct 28 '19 at 18:01
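
To make the shapes in the comments above concrete, here is a minimal sketch that applies the one-liner to the `mask_indices` from the question and checks it against the explicit loop. The names `cols`, `starts` and `expected` are introduced only for this illustration.

```python
import numpy as np

mask_indices = np.array([0, 5, 0, 1, 3])

cols = np.arange(6)             # shape (6,): column positions 0..5
starts = mask_indices[:, None]  # shape (5, 1); None is an alias for np.newaxis

# Broadcasting pairs every start index with every column position,
# so the comparison result has shape (5, 6).
mask = cols >= starts

# Reference result built with the explicit loop from the question.
expected = np.zeros((5, 6), dtype=bool)
for r, m in enumerate(mask_indices):
    expected[r, m:] = True

print(starts.shape, mask.shape, np.array_equal(mask, expected))
```

The length-1 trailing axis of `starts` is what lets each row's start index be compared against all six column positions at once.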

2 Answers


I. Ndim array-masking along last axis (rows)

To mask an n-dim array along its last axis (rows), we could do -

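def mask_from_start_indices(a, mask_indices):
    # Positions along the last axis: 0 .. a.shape[-1]-1
    r = np.arange(a.shape[-1])
    # The trailing length-1 axis broadcasts each start index against all positions
    return mask_indices[...,None]<=r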

Sample run -

In [177]: np.random.seed(0)
     ...: a = np.random.randint(10, size=(2, 2, 5))
     ...: mask_indices = np.argmax(a, axis=-1)

In [178]: a
Out[178]: 
array([[[5, 0, 3, 3, 7],
        [9, 3, 5, 2, 4]],

       [[7, 6, 8, 8, 1],
        [6, 7, 7, 8, 1]]])

In [179]: mask_indices
Out[179]: 
array([[4, 0],
       [2, 3]])

In [180]: mask_from_start_indices(a, mask_indices)
Out[180]: 
array([[[False, False, False, False,  True],
        [ True,  True,  True,  True,  True]],

       [[False, False,  True,  True,  True],
        [False, False, False,  True,  True]]])

II. Ndim array-masking along generic axis

To mask n-dim arrays along a generic axis, it would be -

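def mask_from_start_indices_genericaxis(a, mask_indices, axis):
    # Note: axis is assumed to be non-negative here
    # Positions along `axis`, followed by length-1 axes so they align with `axis`
    r = np.arange(a.shape[axis]).reshape((-1,)+(1,)*(a.ndim-axis-1))
    # Re-insert the reduced axis as length 1 so the indices broadcast against r
    mask_indices_nd = mask_indices.reshape(np.insert(mask_indices.shape,axis,1))
    return mask_indices_nd<=r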

Sample runs -

Data array setup :

In [288]: np.random.seed(0)
     ...: a = np.random.randint(10, size=(2, 3, 5))

In [289]: a
Out[289]: 
array([[[5, 0, 3, 3, 7],
        [9, 3, 5, 2, 4],
        [7, 6, 8, 8, 1]],

       [[6, 7, 7, 8, 1],
        [5, 9, 8, 9, 4],
        [3, 0, 3, 5, 0]]])

Indices setup and masking along axis=1 -

In [290]: mask_indices = np.argmax(a, axis=1)

In [291]: mask_indices
Out[291]: 
array([[1, 2, 2, 2, 0],
       [0, 1, 1, 1, 1]])

In [292]: mask_from_start_indices_genericaxis(a, mask_indices, axis=1)
Out[292]: 
array([[[False, False, False, False,  True],
        [ True, False, False, False,  True],
        [ True,  True,  True,  True,  True]],

       [[ True, False, False, False, False],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]]])

Indices setup and masking along axis=2 -

In [293]: mask_indices = np.argmax(a, axis=2)

In [294]: mask_indices
Out[294]: 
array([[4, 0, 2],
       [3, 1, 3]])

In [295]: mask_from_start_indices_genericaxis(a, mask_indices, axis=2)
Out[295]: 
array([[[False, False, False, False,  True],
        [ True,  True,  True,  True,  True],
        [False, False,  True,  True,  True]],

       [[False, False, False,  True,  True],
        [False,  True,  True,  True,  True],
        [False, False, False,  True,  True]]])
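
The reshape arithmetic in the generic-axis function relies on a non-negative `axis`. As a hedged alternative, not part of the original answer, the sketch below normalizes the axis first and uses `np.expand_dims`, so that `axis=-1` works as well. The function name `mask_from_start_indices_anyaxis` is hypothetical.

```python
import numpy as np

def mask_from_start_indices_anyaxis(a, mask_indices, axis):
    # Hypothetical variant: map a negative axis to its non-negative equivalent.
    axis = axis % a.ndim
    # Positions along `axis`, shaped to have length 1 on every other axis.
    shape = [1] * a.ndim
    shape[axis] = a.shape[axis]
    r = np.arange(a.shape[axis]).reshape(shape)
    # Re-insert the reduced axis as length 1 so the indices broadcast against r.
    return np.expand_dims(mask_indices, axis) <= r

np.random.seed(0)
a = np.random.randint(10, size=(2, 3, 5))

mask_last = mask_from_start_indices_anyaxis(a, np.argmax(a, axis=-1), axis=-1)
mask_mid = mask_from_start_indices_anyaxis(a, np.argmax(a, axis=1), axis=1)
print(mask_last.shape, mask_mid.shape)
```

Both calls return a boolean array with the same shape as `a`; the comparison step is unchanged, so the variants listed under "Other scenarios" apply here in the same way.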

Other scenarios

A. Extending to given end/stop-indices for masking

When we are given end/stop-indices for masking instead, i.e. we are looking to vectorize mask[r, :m] = True, we only need to change the final comparison step in the posted solutions to the following -

return mask_indices_nd>r

B. Outputting an integer array

There might be cases when we want an int array instead. Since a boolean array stores one byte per element, we can simply view the output as a one-byte integer type. Hence, if out is the output of the posted solutions, out.view('i1') or out.view('u1') gives int8 or uint8 output respectively, without copying.

For other datatypes, we would need to use .astype() for dtype conversions.
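
As a small illustration of the difference between viewing and converting, the sketch below builds a tiny mask and produces the three integer variants. The variable names are chosen only for this example.

```python
import numpy as np

# Small boolean mask of shape (2, 4) with start indices 1 and 3.
out = np.arange(4) >= np.array([1, 3])[:, None]

as_i1 = out.view('i1')        # int8 view on the same memory, no copy
as_u1 = out.view('u1')        # uint8 view on the same memory, no copy
as_i8 = out.astype(np.int64)  # wider dtype, so a new array is allocated

print(as_i1.dtype, as_u1.dtype, as_i8.dtype)
print(np.shares_memory(out, as_i1), np.shares_memory(out, as_i8))
```

Because the views share memory with `out`, writing to `as_i1` also changes `out`; use `.astype()` when an independent array is needed.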

C. For index-inclusive masking for stop-indices

For index-inclusive masking in the stop-indices case, i.e. when the stop index itself is to be masked, we simply include the equality in the comparison. Hence, the last step would be -

return mask_indices_nd>=r

D. For index-exclusive masking for start-indices

This is the case when the start indices are given and those indices are not to be masked; masking begins only at the next element and runs to the end. So, similar to the reasoning in the previous section, for this case the last step would be modified to -

return mask_indices_nd<r
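
The four comparison variants can be put side by side on a single row. This is a minimal sketch using one index (2) and five positions; the variable names are illustrative only.

```python
import numpy as np

r = np.arange(5)              # positions 0..4
idx = np.array([2])[:, None]  # one row, index 2, shape (1, 1)

start_inclusive = idx <= r    # like mask[r, m:]
start_exclusive = idx < r     # like mask[r, m+1:]
stop_exclusive = idx > r      # like mask[r, :m]
stop_inclusive = idx >= r     # like mask[r, :m+1]

for name, m in [("start, inclusive", start_inclusive),
                ("start, exclusive", start_exclusive),
                ("stop, exclusive", stop_exclusive),
                ("stop, inclusive", stop_inclusive)]:
    print(name, m[0].astype(int))
```

With index 2, the four rows come out as `[0 0 1 1 1]`, `[0 0 0 1 1]`, `[1 1 0 0 0]` and `[1 1 1 0 0]` respectively.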
Divakar
>>> az = np.zeros(a.shape)
>>> az[np.arange(az.shape[0]), mask_indices] = 1
>>> az.cumsum(axis=1).astype(bool)  # use n-th dimension for nd case
array([[ True,  True,  True,  True,  True,  True],
       [False, False, False, False, False,  True],
       [ True,  True,  True,  True,  True,  True],
       [False,  True,  True,  True,  True,  True],
       [False, False, False,  True,  True,  True]])
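
For the n-dimensional case along the last axis, the fancy-indexing line above needs one index array per leading axis. One possible way around that, sketched here under the assumption that `np.put_along_axis` is available (NumPy 1.15 or later), is to scatter the ones with it and then take the cumulative sum as before. This is an illustrative extension, not part of the original answer.

```python
import numpy as np

np.random.seed(0)
a = np.random.randint(10, size=(2, 2, 5))
mask_indices = np.argmax(a, axis=-1)  # shape (2, 2)

# Put a single 1 at each start index along the last axis.
az = np.zeros(a.shape, dtype=np.int8)
np.put_along_axis(az, mask_indices[..., None], 1, axis=-1)

# Everything from the 1 onwards has a nonzero running sum.
mask = az.cumsum(axis=-1).astype(bool)

# Cross-check against the broadcasting approach.
print(np.array_equal(mask, mask_indices[..., None] <= np.arange(a.shape[-1])))
```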
Marat