Numpy boolean indexing with multiple dimensions. Why won't it select rows and columns?

Question

I have an ndarray with n>1 dimensions. I have a boolean array ok0 corresponding to the rows I want to select, and another boolean array ok1 corresponding to the columns I want to select. I want to include all "pages". So I try Z[ok0, ok1, :], where ok0 is a 1-D boolean array with ok0.size == Z.shape[0], and ok1 is a boolean array with ok1.size == Z.shape[1]. Is there a way to use these boolean arrays directly to index my nd-array?

A code fragment paints a thousand words.

In [50]: Z = arange(7*8*9).reshape(7, 8, 9)

In [51]: ok0 = Z.sum(1).sum(1)%10<3

In [52]: ok1 = Z.sum(0).sum(1)%10<5

In [53]: ok0.shape
Out[53]: (7,)

In [54]: ok1.shape
Out[54]: (8,)

In [55]: Z[ok0, :, :].shape
Out[55]: (3, 8, 9)

In [56]: Z[:, ok1, :].shape
Out[56]: (7, 4, 9)

In [57]: Z[ok0, ok1, :].shape
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-ebba5b9a19dd> in <module>()
----> 1 Z[ok0, ok1, :].shape

ValueError: shape mismatch: objects cannot be broadcast to a single shape

The desired effect can be achieved indirectly as follows:

In [58]: Z[ok0, :, :][:, ok1, :].shape
Out[58]: (3, 4, 9)

If I convert ok0 and ok1 from boolean arrays into integer arrays, I can use the solution provided in this answer to "Selecting specific rows and columns from NumPy array":

In [88]: ok0i = ok0.nonzero()[0]

In [89]: ok1i = ok1.nonzero()[0]

In [90]: Z[ok0i[:, newaxis], ok1i, :].shape
Out[90]: (3, 4, 9)

However, this does not work with the original boolean arrays:

In [87]: Z[ok0[:, newaxis], ok1, :].shape
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-87-7e9fa28c47fa> in <module>()
----> 1 Z[ok0[:, newaxis], ok1, :].shape

ValueError: shape mismatch: objects cannot be broadcast to a single shape

Why does this not work — what's going wrong here? And (how) can I achieve the desired effect in one go, without repeating my full indexing (which could be potentially long) as I did in command 58?

pseudocubic · Answer 1 · 2014-10-08T15:20:14.883

Since you want to select whole rows and columns based on some condition, I think np.take might be an appropriate solution here, and it does not require changing how you determine the rows and columns you want (ok0 and ok1).

result = np.take(np.take(Z, np.where(ok0)[0], axis=0), np.where(ok1)[0], axis=1)

This would first select all of the rows (axis=0) where ok0==True, and from that subset, select all of the columns (axis=1) where ok1==True. You need the [0] after the np.where since np.where outputs a tuple of array(s) (array([]),) containing indices, but you just want the array of indices for np.take.

An added advantage of this method is that np.take can be faster than "fancy" indexing of ndarrays, although the difference depends on the NumPy version and the array sizes.
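The nested np.take call above can be checked against the chained boolean indexing from the question. This is a minimal sketch that rebuilds Z, ok0 and ok1 as in the question; the names rows, cols, taken, compressed and reference are illustrative only. np.compress is included as an aside that is not part of this answer: it accepts the boolean mask directly along one axis.

```python
import numpy as np

# Rebuild the arrays from the question.
Z = np.arange(7 * 8 * 9).reshape(7, 8, 9)
ok0 = Z.sum(1).sum(1) % 10 < 3   # row mask, shape (7,)
ok1 = Z.sum(0).sum(1) % 10 < 5   # column mask, shape (8,)

# Integer positions of the True entries (same as np.where(mask)[0]).
rows = np.flatnonzero(ok0)
cols = np.flatnonzero(ok1)

# Select rows first, then columns from that subset.
taken = np.take(np.take(Z, rows, axis=0), cols, axis=1)

# Alternative that takes the boolean masks directly, one axis at a time.
compressed = np.compress(ok1, np.compress(ok0, Z, axis=0), axis=1)

# Reference result: the two-step boolean indexing from the question.
reference = Z[ok0][:, ok1]

print(taken.shape)
print(np.array_equal(taken, reference), np.array_equal(compressed, reference))
```

Both variants still apply one selection per axis, so they avoid the broadcasting error but do not shorten the expression much.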

`np.where(ok0)[0]` is the same as `ok0.nonzero()[0]`, isn't it? Although this method works, it doesn't appear more readable than the alternatives I have found that worked (in my present application, efficiency is not an issue as other operations nearby take two orders of magnitude longer). — gerrit, Oct 08 '14 at 15:18

score 0 · Answer 2 · answered Oct 09 '14 at 18:13

Solution

Re: do mask selection in one stage:

In [152]: result = Z[ok0[:, np.newaxis] & ok1].reshape(ok0.sum(), ok1.sum(),
                                                       *Z.shape[2:])

In [153]: result.shape
Out[153]: (3, 4, 9)

In [154]: (result == Z[ok0][:, ok1]).all()
Out[154]: True
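A related one-step option that this answer does not mention is np.ix_, which builds an open mesh of index arrays and, according to its documentation, interprets boolean sequences as masks for the corresponding dimension. The sketch below assumes the Z, ok0 and ok1 from the question; the names selected, reshaped and reference are illustrative only.

```python
import numpy as np

# Rebuild the arrays from the question.
Z = np.arange(7 * 8 * 9).reshape(7, 8, 9)
ok0 = Z.sum(1).sum(1) % 10 < 3   # row mask, shape (7,)
ok1 = Z.sum(0).sum(1) % 10 < 5   # column mask, shape (8,)

# np.ix_ turns the two masks into index arrays of shape (3, 1) and (1, 4),
# which broadcast to the outer product of selected rows and columns.
selected = Z[np.ix_(ok0, ok1)]

# The combined-mask-plus-reshape approach from this answer, for comparison.
reshaped = Z[ok0[:, np.newaxis] & ok1].reshape(ok0.sum(), ok1.sum(), *Z.shape[2:])

# Reference result: the two-step boolean indexing from the question.
reference = Z[ok0][:, ok1]

print(selected.shape)
print(np.array_equal(selected, reference), np.array_equal(reshaped, reference))
```

Trailing dimensions are left untouched, so the same expression works unchanged for arrays with more than three dimensions.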

Re: long indexing: you can omit any number of trailing dimensions. You can also replace any number of leading dimensions with an ellipsis (...), provided you specify all the remaining trailing dimensions in full.

In [155]: Z[0]
Out[155]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [27, 28, 29, 30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53],
       [54, 55, 56, 57, 58, 59, 60, 61, 62],
       [63, 64, 65, 66, 67, 68, 69, 70, 71]])

In [156]: Z[...,0]
Out[156]: 
array([[  0,   9,  18,  27,  36,  45,  54,  63],
       [ 72,  81,  90,  99, 108, 117, 126, 135],
       [144, 153, 162, 171, 180, 189, 198, 207],
       [216, 225, 234, 243, 252, 261, 270, 279],
       [288, 297, 306, 315, 324, 333, 342, 351],
       [360, 369, 378, 387, 396, 405, 414, 423],
       [432, 441, 450, 459, 468, 477, 486, 495]])

Description

Mask selection works because a multi-dimensional boolean mask selects exactly the elements that satisfy the condition:

In [157]: arr
Out[157]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [158]: (arr % 2 == 0).astype(int)
Out[158]: 
array([[1, 0, 1],
       [0, 1, 0],
       [1, 0, 1]])

In [159]: arr[arr % 2 == 0]
Out[159]: array([0, 2, 4, 6, 8])

The mask can be generated with the broadcasting trick you've used:

In [160]: ok0 = arr.sum(1)%10<3

In [161]: ok1 = arr.sum(0)%10<5

In [162]: (ok0[:, np.newaxis] & ok1).astype(int)
Out[162]: 
array([[0, 0, 0],
       [0, 1, 0],
       [0, 1, 0]])

In [163]: arr[ok0[:, np.newaxis] & ok1]
Out[163]: array([4, 7])

Note, however, that the selected elements come back ravelled. Count the True entries in each mask to restore the shape:

In [164]: arr[ok0[:, np.newaxis] & ok1].reshape(ok0.sum(), ok1.sum())
Out[164]: 
array([[4],
       [7]])
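For contrast, here is a hedged sketch of why the original Z[ok0, ok1, :] fails. The NumPy indexing documentation describes several boolean index arrays as behaving like their nonzero() index arrays, and those index arrays are broadcast against each other and paired element-wise rather than combined as an outer product. Index arrays of length 3 and 4 cannot be broadcast together, hence the shape mismatch. The variable names below are illustrative; the exception type is caught loosely because the transcript shows ValueError while newer NumPy versions raise IndexError.

```python
import numpy as np

# Rebuild the arrays from the question.
Z = np.arange(7 * 8 * 9).reshape(7, 8, 9)
ok0 = Z.sum(1).sum(1) % 10 < 3   # 3 True entries
ok1 = Z.sum(0).sum(1) % 10 < 5   # 4 True entries

# Integer equivalents of the two masks.
i0, = ok0.nonzero()   # shape (3,)
i1, = ok1.nonzero()   # shape (4,)

# Two masks side by side act like Z[i0, i1, :]: shapes (3,) and (4,)
# cannot be broadcast to a common shape, so indexing raises.
try:
    Z[ok0, ok1, :]
    failed = False
except (IndexError, ValueError):
    failed = True

# With equal lengths the index arrays are paired element by element,
# giving 3 "pages" rather than a 3 x 3 block.
paired = Z[i0, i1[:3], :]

# Adding an axis to the integer array makes the shapes (3, 1) and (4,),
# which broadcast to (3, 4): the outer product that was wanted.
outer = Z[i0[:, np.newaxis], i1, :]

print(failed, paired.shape, outer.shape)
```

The same trick does not carry over to ok0[:, newaxis], because a two-dimensional boolean array is treated as a mask over two axes rather than as a column of row indices.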
