Numpy extract arbitrary subarray along one axis

Question

For example, I have a 3-d array:

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]

 [[24 25 26 27]
  [28 29 30 31]
  [32 33 34 35]]]

and the final array I want:

[[[ 0  1]
  [ 4  5]]

 [[18 19]
  [22 23]]

 [[26 27]
  [30 31]]]

Is there any efficient way to get array like that without using for loop ?

The reason to ask this question is that if we want to get arbitrary elements alone one axis, like:

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

I can get array [ 2 7 9] using a[np.arange(a.shape[0]), [2, 3, 1]] So I wonder if there is a similar way when elements become subarrays.

It is not a slice. In slice you sort-of cut through the whole array. Here you glued different parts of the array together. — zegkljan, Apr 17 '16 at 05:55
@zegkljan: Thank you for pointing out. I have already changed my expression. — Jeff Dong, Apr 17 '16 at 06:26

B. M. · Accepted Answer · 2016-04-17T17:00:46.580

A simple idea would be a[[0,1,2],[0:2,1:3,0:2],[0:2,2:4,2:4]] , but it is not implemented.

A workaround can be obtained with np.lib.stride_tricks.as_strided. Just define :

ast=np.lib.stride_tricks.as_strided(a,a.shape*2,a.strides*2)
#ast.shape is (3, 3, 4, 3, 3, 4).

Then you can define separately origin and size of blocs:

In [4]: ast[[0,1,2],[0,1,0],[0,2,2],0,:2,:2]
Out[4]: 
array([[[ 0,  1],
        [ 4,  5]],

       [[18, 19],
        [22, 23]],

       [[26, 27],
        [30, 31]]])

Some explanations :

origin :

You want to find the blocs beginning with elements 0,18,26 .

their indices in the reshaped array can be found by :

In [316]: np.unravel_index([0,18,26],a.shape)
Out[316]: 
(array([0, 1, 2], dtype=int64),
 array([0, 1, 0], dtype=int64),
 array([0, 2, 2], dtype=int64))

ast[[0,1,2],[0,1,0],[0,2,2]] is a (3,3,3,4) array. each (3,3,4) array begin with a selected element.

array([[[[          0,           1,           2,           3],
         [          4,           5,           6,           7],
         [          8,           9,          10,          11]],

        [[         12,          13,          14,          15],
         [         16,          17,          18,          19],
         [         20,          21,          22,          23]],

        [[         24,          25,          26,          27],
         [         28,          29,          30,          31],
         [         32,          33,          34,          35]]],


       [[[         18,          19,          20,          21],
         [         22,          23,          24,          25],
         [         26,          27,          28,          29]],

        [[         30,          31,          32,          33],
         [         34,          35,    23592960,       18335],
         [  697780028, -2147480064,   540876865,  1630433390]],

        [[ 2036429426,   538970664,   538976288,  1532698656],
         [  741355058,   808334368,   775168044,   874523696],
         [  744304686,   538976266,   538976288,   811278368]]],


       [[[         26,          27,          28,          29],
         [         30,          31,          32,          33],
         [         34,          35,    23592960,       18335]],

        [[  697780028, -2147480064,   540876865,  1630433390],
         [ 2036429426,   538970664,   538976288,  1532698656],
         [  741355058,   808334368,   775168044,   874523696]],

        [[  744304686,   538976266,   538976288,   811278368],
         [  539766830,   741355058,   808333600,   775036972],
         [  170679600,   538976288,   538976288,   774920992]]]])

As said in the documentation, as_strided is a dangerous hack, and must be used with care, since it can access elements that are not in the array if badly used. Next step will ensure the selection of valid elements.

size :

The interesting elements are the four in the upper left corner in each first bloc. So ast[[0,1,2],[0,1,0],[0,2,2],0,:2,:2] select them.

You can also define sized blocs like that: bloc122=ast[...,0,:2,:2] (bloc122.shape is (3, 3, 4, 2, 2) ) :

In [8]: bloc122[[0,1,2],[0,1,0],[0,2,2]]=0

In [9]: a
Out[9]: 
array([[[ 0,  0,  2,  3],
        [ 0,  0,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17,  0,  0],
        [20, 21,  0,  0]],

       [[24, 25,  0,  0],
        [28, 29,  0,  0],
        [32, 33, 34, 35]]])

This `as_strided` works, but is hard to understand, and looks dangerous. Displaying the whole `ast` array shows a lot of junk values outside of the original `a.data` buffer. — hpaulj, Apr 17 '16 at 16:15
Yes I add some explanations. As `ix_` , `as_strided` needs some experience to be understood ;). — B. M., Apr 17 '16 at 16:41
wrapping the as_strided statement such that it always provides valid views of blocks, by setting one of the halves of shape equal to the block size, and the other to shape minus block size, would be a good fix to this — Eelco Hoogendoorn, Apr 17 '16 at 19:42
A key difference between your use of `as_strided` and mine is that I expand only one dimension, the one with overlapping indices. Expanding on two or more results in this ragged or diagonal boundary. — hpaulj, Apr 18 '16 at 15:48

hpaulj · Answer 2 · 2016-04-17T22:58:44.977

The overlapping blocks on the 2nd dimension require something like as_strided. B.M. has an earlier solution with as_strided, but I find it hard to understand. It also looks dangerous, since it displays out of bounds data. I'm approaching the task in smaller steps.

Selection along the last dimension for 'cols' [0,1] and [2,3] is relatively easy. Reshaping makes it easier.

In [27]: A=np.arange(36).reshape(3,3,4)
In [28]: A1=A.reshape(3,3,2,2)
In [29]: A2=A1[[0,1,2],:,[0,1,1],:]

In [30]: A2
Out[30]: 
array([[[ 0,  1],
        [ 4,  5],
        [ 8,  9]],

       [[14, 15],
        [18, 19],
        [22, 23]],

       [[26, 27],
        [30, 31],
        [34, 35]]])

Focusing on one subarray, I found I could see it as 2 overlapping arrays:

In [59]: as_strided(A2[0],shape=(2,2,2),strides=(8,8,4))
Out[59]: 
array([[[0, 1],
        [4, 5]],

       [[4, 5],
        [8, 9]]])

np.lib.stride_tricks.as_strided is a tricky function to use. I've seen it mostly used for moving-windows applications.

Applied to the whole array:

In [65]: A3=as_strided(A2,shape=(3,2,2,2),strides=(24,8,8,4))

In [66]: A3
Out[66]: 
array([[[[ 0,  1],
         [ 4,  5]],
         ...
        [[30, 31],
         [34, 35]]]])

And the target:

In [71]: A3[[0,1,2],[0,1,0]]
Out[71]: 
array([[[ 0,  1],
        [ 4,  5]],

       [[18, 19],
        [22, 23]],

       [[26, 27],
        [30, 31]]])

This can be put together in a way that allows assignment (shape and strides are adapted from the values for A1)

In [105]: A1 = A.reshape(3,3,2,2)
In [106]: A1s = as_strided(A1, shape=(3,2,2,2,2), strides=(48,16,16,8,4))

In [107]: A1s[[0,1,2],[0,1,0],:,[0,1,1],:]
Out[107]: 
array([[[ 0,  1],
        [ 4,  5]],

       [[18, 19],
        [22, 23]],

       [[26, 27],
        [30, 31]]])

Test of assignment:

In [108]: A1s[[0,1,2],[0,1,0],:,[0,1,1],:] = 99

In [109]: A
Out[109]: 
array([[[99, 99,  2,  3],
        [99, 99,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 99, 99],
        [20, 21, 99, 99]],

       [[24, 25, 99, 99],
        [28, 29, 99, 99],
        [32, 33, 34, 35]]])

If you don't require assignment, it's easier to understand the logic without striding:

np.array([A2[0,:-1], A2[1,1:], A2[2,:-1]])

slices=(slice(-1),slice(1,None))    
np.array([A2[i,slices[j]] for i,j in zip([0,1,2],[0,1,0])])

===========================

Earlier stumbling around for an answer:

I think it can be extracted with advanced indexing.

Ind = np.ix_([0,0,1,1,2,2], [0,1,1,2,0,1], [0,1,2,3,2,3])
a[ind]

The values may need tweaking, since I can't test it here.

The idea is to enumerate which rows , cols are needed in each dimension, and use ix_ (I think that's the right function) to add newaxis so they broadcast together.

Your idea of generalizing the 2d case is right. The trick is figure out when you need np.array([0,1,2]), and when you need to rotate it with ix_ or [:,None],etc. I end up testing various ideas.

It may need to be

  Ind = np.ix_([0,1,2], [0,1,1,2,0,1], [0,1,2,3,2,3])

No this isn't quite right. It will produce a 3x6x6; you want 3x2x2.

Reshaping to 3x3x2x2 may make indexing one the last dimension easier.

The striding trick mentioned in another answer may help turn the overlapping selection on 2nd into a similar 2x2 block. But I'd have to play around with this.

What I'm imaging is an index tuple of the form ([1,2,3],[?,?,?],:,[?,?,?],:)

score 0 · Answer 3 · answered Apr 17 '16 at 06:41

0

You can use indexing to get the expected out put separately:

>>> a = np.array([[[ 0, 1, 2, 3],
...   [ 4, 5, 6, 7],
...   [ 8, 9,10,11]],
... 
...  [[12,13,14,15],
...   [16,17,18,19],
...   [20,21,22,23]],
... 
...  [[24,25,26,27],
...   [28,29,30,31],
...   [32,33,34,35]]])

>>> a[0,:2,:2]
array([[0, 1],
       [4, 5]])
>>> a[1,1:,2:]
array([[18, 19],
       [22, 23]])
>>> a[2,:2,2:]
array([[26, 27],
       [30, 31]])
>>>

answered Apr 17 '16 at 06:41

Mazdak

105,000
18
159
188

This is correct but it's just a for loop. What I'm looking for is a single expression a[something] so that when I want to assign values to these elements I can just use a[something] = b. – Jeff Dong Apr 17 '16 at 07:01

score 0 · Answer 4 · edited Jun 20 '20 at 09:12

When ask this question I didn't expect it would use tricky methods like ix_ and as_strided , mentioned by hpaulj and B. M., which was quite hard to understand by beginners like me..

I come up with a more understandable but not so efficient way inspired by B. M.'s

a[[0,1,2],[0:2,1:3,0:2],[0:2,2:4,2:4]]

combining with advanced indexing.

What we need here is just translating a[[0,1,2],[0:2,1:3,0:2],[0:2,2:4,2:4]] to an advanced indexing of a. In our case is:

a[[0 0 0 0 1 1 1 1 2 2 2 2],
  [0 0 1 1 1 1 2 2 0 0 1 1],
  [0 1 0 1 2 3 2 3 2 3 2 3]]

A simple translate function can be like this:

(cartesian function in the code is from pv.'s answer for this question)

# a[[0,1,2],[0:2,1:3,0:2],[0:2,2:4,2:4]]
def translate_idx(idx1=[0, 1, 2], idx2=[(0,2),(1,3),(0,2)], idx3=[(0,2),(2,4),(2,4)]):
    # first we need to get the combinations for correponding intervals
    # e.g for (1,3) and (2,4) we get [[1, 2], [1, 3], [2, 2], [2, 3]]
    idx23 = []
    for i in range(len(idx2)):
        # the function cartesian here just generates all conbinations from some arrays
        idx23.append(cartesian((np.arange(*idx2[i]), np.arange(*idx3[i]))))
    idx23 = np.concatenate(idx23).T
    # now we get index for 2nd and 3rd axis
    # [[0 0 1 1 1 1 2 2 0 0 1 1]
    #  [0 1 0 1 2 3 2 3 2 3 2 3]]
    # we can repeat 4 times for idx1 and append idx23
    step = (idx2[0][1] - idx2[0][0]) * (idx3[0][1] - idx3[0][0])
    idx123 = np.append(np.array(idx1).repeat(step).reshape((1, -1)), idx23, axis=0)
    return idx123

Then I can use

idx = translate_idx()
a[idx[0], idx[1], idx[2]]

to get what I want.

Though this method is not quite efficient, I think it reveals some correlation between this kind of problems and advanced indexing. In practice as_strided is definitely a better choice.

Thank you all guys for the detailed answers :)

Numpy extract arbitrary subarray along one axis

4 Answers4