
I want to apply boolean masking both to rows and columns.

With

import numpy as np

X = np.array([[1,2,3],[4,5,6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])
X[mask1, mask2]

I expect the output to be

array([[1,2],[4,5]])

instead of

array([1,5])

I know that

X[:, mask2]

can be used here, but that's not a solution for the general case.

I would like to know how it works under the hood and why in this case the result is array([1,5]).

mr.tarsa
  • Advanced indexing doesn't work the way you think it does. See http://stackoverflow.com/questions/30609734/numpy-ndarray-advanced-indexing/30609884#30609884 for a mostly-dupe, except with integer arrays instead of boolean arrays. – user2357112 Feb 18 '17 at 00:12
  • Also see the [indexing documentation](https://docs.scipy.org/doc/numpy/user/basics.indexing.html) for the full details of how NumPy indexing works (minus a few weird, undocumented cases mostly retained for backward compatibility). – user2357112 Feb 18 '17 at 00:14

4 Answers


X[mask1, mask2] is described in the Boolean array indexing section of the docs as equivalent to

In [249]: X[mask1.nonzero()[0], mask2.nonzero()[0]]
Out[249]: array([1, 5])
In [250]: X[[0,1], [0,1]]
Out[250]: array([1, 5])

In effect it gives you X[0,0] and X[1,1]: the two index arrays are paired element by element.
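The pairing rule can be checked directly: after the masks are converted to integer positions, element k of the result is X[rows[k], cols[k]]. A small sketch (rows and cols are just illustrative names):

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

# boolean masks are first turned into integer positions
rows = mask1.nonzero()[0]   # array([0, 1])
cols = mask2.nonzero()[0]   # array([0, 1])

# advanced indexing pairs the index arrays element by element
paired = X[rows, cols]
manual = np.array([X[r, c] for r, c in zip(rows, cols)])

print(paired)   # [1 5]
print(manual)   # [1 5]
```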

What you want instead is:

In [251]: X[[[0],[1]], [0,1]]
Out[251]: 
array([[1, 2],
       [4, 5]])

np.ix_ is a handy tool for creating the right mix of dimensions:

In [258]: np.ix_([0,1],[0,1])
Out[258]: 
(array([[0],
        [1]]), array([[0, 1]]))
In [259]: X[np.ix_([0,1],[0,1])]
Out[259]: 
array([[1, 2],
       [4, 5]])

That's effectively a column vector for the 1st axis and row vector for the second, together defining the desired rectangle of values.
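The same open mesh can be written by hand with [:, None], which makes the column-vector/row-vector broadcasting explicit. A minimal sketch comparing it with np.ix_:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
rows = np.array([0, 1])
cols = np.array([0, 1])

# np.ix_ returns a (2, 1) column vector and a (1, 2) row vector
r_grid, c_grid = np.ix_(rows, cols)
print(r_grid.shape, c_grid.shape)   # (2, 1) (1, 2)

# broadcasting (2, 1) against (1, 2) gives a (2, 2) grid of index pairs
by_hand = X[rows[:, None], cols[None, :]]
via_ix = X[np.ix_(rows, cols)]
print(by_hand)
# [[1 2]
#  [4 5]]
```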

Trying to broadcast boolean arrays the same way does not work, though: X[mask1[:,None], mask2] raises an IndexError.
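A 2-D boolean index consumes two axes at once, so X[mask1[:,None], mask2] effectively asks for three indices on a 2-D array. A small check (the exact error message may differ between NumPy versions):

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

# mask1[:, None] has shape (2, 1) and is treated as a 2-D boolean index,
# not as a column vector that broadcasts against mask2
try:
    X[mask1[:, None], mask2]
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)
```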

However, that reference section says:

Combining multiple Boolean indexing arrays or a Boolean with an integer indexing array can best be understood with the obj.nonzero() analogy. The function ix_ also supports boolean arrays and will work without any surprises.

In [260]: X[np.ix_(mask1, mask2)]
Out[260]: 
array([[1, 2],
       [4, 5]])
In [261]: np.ix_(mask1, mask2)
Out[261]: 
(array([[0],
        [1]], dtype=int32), array([[0, 1]], dtype=int32))

The boolean handling in the source of ix_:

    if issubdtype(new.dtype, _nx.bool_):
        new, = new.nonzero()

So it also works with a mix of boolean and integer indices, like X[np.ix_(mask1, [0,2])].
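Going by the boolean branch quoted above, np.ix_ turns each boolean argument into integer positions with nonzero() and then gives each argument its own axis. A simplified sketch of that logic, checked against np.ix_ on the mixed example (ix_sketch is a hypothetical helper, not NumPy's actual code, and it skips the input validation the real function does):

```python
import numpy as np

def ix_sketch(*seqs):
    """Hypothetical, simplified version of np.ix_ for 1-D sequences."""
    nd = len(seqs)
    out = []
    for k, seq in enumerate(seqs):
        new = np.asarray(seq)
        # booleans become the integer positions of their True entries
        if new.dtype == bool:
            new, = new.nonzero()
        # full length on axis k, length 1 on every other axis
        shape = [1] * nd
        shape[k] = new.size
        out.append(new.reshape(shape))
    return tuple(out)

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])

ours = ix_sketch(mask1, [0, 2])
ref = np.ix_(mask1, [0, 2])
print(X[ours])
# [[1 3]
#  [4 6]]
print(all(np.array_equal(a, b) and a.shape == b.shape for a, b in zip(ours, ref)))  # True
```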

hpaulj
  • ah, `ix_` also works for boolean arrays. Good to know – MSeifert Feb 18 '17 at 00:56
  • It's new to me too. – hpaulj Feb 18 '17 at 00:59
  • Huh, that works? Weird. It's not mentioned in the docs for [`np.ix_` itself](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html), and applying the behavior as described in the `np.ix_` docs would produce an entirely different result. – user2357112 Feb 18 '17 at 01:06

One solution is sequential integer indexing, getting the integer positions from np.where, for example:

>>> X[np.where(mask1)[0]][:, np.where(mask2)[0]]
array([[1, 2],
       [4, 5]])

Or, as @user2357112 pointed out in the comments, np.ix_ can be used as well. For example:

>>> X[np.ix_(np.where(mask1)[0], np.where(mask2)[0])]
array([[1, 2],
       [4, 5]])

Another idea is to broadcast the masks against each other and index in one step, which requires a reshape afterwards:

>>> X[np.where(mask1[:, None] * mask2)]
array([1, 2, 4, 5])

>>> X[np.where(mask1[:, None] * mask2)].reshape(2, 2)
array([[1, 2],
       [4, 5]])
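The reshape(2, 2) above hardcodes the result shape. A sketch that derives it from the masks instead, tried on a made-up 3×3 array with a mask that drops a row:

```python
import numpy as np

X = np.arange(1, 10).reshape(3, 3)      # [[1 2 3] [4 5 6] [7 8 9]]
row_mask = np.array([True, False, True])
col_mask = np.array([False, True, True])

# outer AND of the two masks marks the cells of the wanted rectangle
cells = row_mask[:, None] * col_mask
# the selected values come out in row-major order, so reshape
# using the number of kept rows and kept columns
sub = X[np.where(cells)].reshape(row_mask.sum(), col_mask.sum())
print(sub)
# [[2 3]
#  [8 9]]
```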
MSeifert
  • `np.ix_` helps, if you convert from boolean to integer index arrays first. I've always wished `np.ix_` was more flexible. It could do a lot more than it currently does. – user2357112 Feb 18 '17 at 00:25
  • Yeah, I was thinking about `X[np.ix_(np.where(mask1)[0], np.where(mask2)[0])]` but I found that a bit hard to digest. :) – MSeifert Feb 18 '17 at 00:27
  • Thanks, I've learned something new. It even seems possible to just do `X[mask1][:,mask2]`, which looks like an easy way for me, but I've heard that `][` is not the way to go. – mr.tarsa Feb 18 '17 at 00:34
  • @tarashypka: It's usually a bad idea, but I'd say it's justified here. Comma-indexing instead of double-bracket-indexing is usually recommended because there are a number of cases where double-bracket-indexing will surprise newbies by doing the wrong thing, but if double-bracket-indexing solves your problem in a simpler way than comma-indexing can, go for it. – user2357112 Feb 18 '17 at 00:45
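One case where the `][` form surprises people is assignment: X[mask1] already produces a copy, so writing into X[mask1][:, mask2] leaves X unchanged, while a single np.ix_ index writes into X itself. A small sketch:

```python
import numpy as np

X = np.array([[1, 2, 3], [4, 5, 6]])
mask1 = np.array([True, True])
mask2 = np.array([True, True, False])

# reading works: select rows first, then columns
print(X[mask1][:, mask2])

# writing through chained indexing only modifies a temporary copy
X[mask1][:, mask2] = 0
unchanged = X.copy()
print(unchanged)   # still [[1 2 3] [4 5 6]]

# one indexing operation assigns in place
X[np.ix_(mask1, mask2)] = 0
print(X)
# [[0 0 3]
#  [0 0 6]]
```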

More generally, your question is about extracting the sub-array made of certain rows and columns.

import numpy as np

main_array = np.array([[1,2,3],[4,5,6]])
mask_ax_0 = np.array([True, True]) # which rows I want
mask_ax_1 = np.array([True, True, False]) # which columns I want

Answer:

mask_2d = np.logical_and(mask_ax_0.reshape(-1,1), mask_ax_1.reshape(1,-1))
sub_array = main_array[mask_2d].reshape(np.sum(mask_ax_0), np.sum(mask_ax_1))
print(sub_array)
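A quick consistency check of the approaches in this thread on a larger array, where the masks drop both a row and some columns (the random data and masks are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 100, size=(4, 5))
row_mask = np.array([True, False, True, True])
col_mask = np.array([False, True, True, False, True])

# 1) open mesh from np.ix_, which accepts boolean masks
a = X[np.ix_(row_mask, col_mask)]
# 2) two-step indexing: rows first, then columns
b = X[row_mask][:, col_mask]
# 3) 2-D mask from logical_and, then reshape by the kept counts
mask_2d = np.logical_and(row_mask.reshape(-1, 1), col_mask.reshape(1, -1))
c = X[mask_2d].reshape(np.sum(row_mask), np.sum(col_mask))

print(a.shape)                                        # (3, 3)
print(np.array_equal(a, b) and np.array_equal(a, c))  # True
```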
Rajesh Nakka

You could use the numpy.ma module instead and mask the unwanted rows and columns:

import numpy as np
import numpy.ma as ma

X = np.array([[1,2,3],[4,5,6]])
linesmask = np.array([True, True])
colsmask = np.array([True, True, False])

# start with nothing masked, then hide every excluded row and column
X = ma.masked_array(X, mask=np.zeros(X.shape, dtype=bool))
for i in range(len(linesmask)):
    if not linesmask[i]:
        X[i, :] = ma.masked
for j in range(len(colsmask)):
    if not colsmask[j]:
        X[:, j] = ma.masked
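To get the compact sub-array back out of such a masked array, one option is MaskedArray.compressed(), which returns the unmasked values as a 1-D array in row-major order, followed by a reshape. A sketch that builds the mask directly from the two masks (the name masked is illustrative):

```python
import numpy as np
import numpy.ma as ma

X = np.array([[1, 2, 3], [4, 5, 6]])
linesmask = np.array([True, True])
colsmask = np.array([True, True, False])

# a cell is masked when its row or its column is excluded
masked = ma.masked_array(X, mask=~(linesmask[:, None] & colsmask))

# compressed() drops masked cells and flattens in row-major order,
# so reshape using the number of kept rows and columns
compact = masked.compressed().reshape(linesmask.sum(), colsmask.sum())
print(compact)
# [[1 2]
#  [4 5]]
```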
SRLKilling
  • `masking` in numpy has two usages. One is indexing with a boolean 'mask'. The other is constructing a `np.ma` Masked Array, as you do. Indexing is more common. – hpaulj Feb 18 '17 at 01:41