15

I have a ndarray of shape(z,y,x) containing values. I am trying to index this array with another ndarray of shape(y,x) that contains the z-index of the value I am interested in.

import numpy as np
val_arr = np.arange(27).reshape(3,3,3)
z_indices = np.array([[1,0,2],
                      [0,0,1],
                      [2,0,1]])

Since my arrays are rather large I tried to use np.take to avoid unnecessary copies of the array but just can't wrap my head around indexing 3-dimensional arrays with it.

How do I have to index val_arr with z_indices to get the values at the desired z-axis position? The expected outcome would be:

result_arr = np.array([[9,1,20],
                       [3,4,14],
                       [24,7,17]])
Alex Riley
  • 169,130
  • 45
  • 262
  • 238
Kersten
  • 874
  • 1
  • 13
  • 25

4 Answers4

14

You can use choose to make the selection:

>>> z_indices.choose(val_arr)
array([[ 9,  1, 20],
       [ 3,  4, 14],
       [24,  7, 17]])

The function choose is incredibly useful, but can be somewhat tricky to make sense of. Essentially, given an array (val_arr) we can make a series of choices (z_indices) from each n-dimensional slice along the first axis.

Also: any fancy indexing operation will create a new array rather than a view of the original data. It is not possible to index val_arr with z_indices without creating a brand new array.

Alex Riley
  • 169,130
  • 45
  • 262
  • 238
5

With readability, np.choose definitely looks great.

If performance is of essence, you can calculate the linear indices and then use np.take or use a flattened version with .ravel() and extract those specific elements from val_arr. The implementation would look something like this -

def linidx_take(val_arr,z_indices):

    # Get number of columns and rows in values array
     _,nC,nR = val_arr.shape

     # Get linear indices and thus extract elements with np.take
    idx = nC*nR*z_indices + nR*np.arange(nR)[:,None] + np.arange(nC)
    return np.take(val_arr,idx) # Or val_arr.ravel()[idx]

Runtime tests and verify results -

Ogrid based solution from here is made into a generic version for these tests, like so :

In [182]: def ogrid_based(val_arr,z_indices):
     ...:   v_shp = val_arr.shape
     ...:   y,x = np.ogrid[0:v_shp[1], 0:v_shp[2]]
     ...:   return val_arr[z_indices, y, x]
     ...: 

Case #1: Smaller datasize

In [183]: val_arr = np.random.rand(30,30,30)
     ...: z_indices = np.random.randint(0,30,(30,30))
     ...: 

In [184]: np.allclose(z_indices.choose(val_arr),ogrid_based(val_arr,z_indices))
Out[184]: True

In [185]: np.allclose(z_indices.choose(val_arr),linidx_take(val_arr,z_indices))
Out[185]: True

In [187]: %timeit z_indices.choose(val_arr)
1000 loops, best of 3: 230 µs per loop

In [188]: %timeit ogrid_based(val_arr,z_indices)
10000 loops, best of 3: 54.1 µs per loop

In [189]: %timeit linidx_take(val_arr,z_indices)
10000 loops, best of 3: 30.3 µs per loop

Case #2: Bigger datasize

In [191]: val_arr = np.random.rand(300,300,300)
     ...: z_indices = np.random.randint(0,300,(300,300))
     ...: 

In [192]: z_indices.choose(val_arr) # Seems like there is some limitation here with bigger arrays.
Traceback (most recent call last):

  File "<ipython-input-192-10c3bb600361>", line 1, in <module>
    z_indices.choose(val_arr)

ValueError: Need between 2 and (32) array objects (inclusive).


In [194]: np.allclose(linidx_take(val_arr,z_indices),ogrid_based(val_arr,z_indices))
Out[194]: True

In [195]: %timeit ogrid_based(val_arr,z_indices)
100 loops, best of 3: 3.67 ms per loop

In [196]: %timeit linidx_take(val_arr,z_indices)
100 loops, best of 3: 2.04 ms per loop
Community
  • 1
  • 1
Divakar
  • 218,885
  • 19
  • 262
  • 358
  • I just encountered the ValueError with some of my datasets. You are a savior for providing a solution before I even encountered the problem. Unfortunately I can't accept two answers but I'd like to thank you a lot for pointing out the issue with np.choose(). – Kersten Aug 19 '15 at 11:54
  • @Kersten No problem really! It's been interesting to find out about that in-built function and also to learn about it's limitations. SO is a great learning place! – Divakar Aug 19 '15 at 11:57
  • @Divakar this doesn't work when y (rows) and x (cols) are different lengths, for example: `linidx_take(np.random.rand(50,250,150),np.random.randint(0,50,(250,150)))` results in `ValueError: operands could not be broadcast together with shapes (250,150) (150,1)` – user2856 May 04 '16 at 23:42
  • @Divakar this is a great answer! One question . . . are nR and nC swapped in your code? Doesn't numpy.shape return in the order Rows then Columns? – khafen Feb 06 '18 at 21:58
  • FWIW, the `linidx_take(val_arr,z_indices)` function from @divakar, above, only works correctly for me (in connection with [this percentiles code](https://krstn.eu/np.nanpercentile()-there-has-to-be-a-faster-way/)) if the 6th line is changed to `idx = nC*nR*z_indices + nR*np.arange(nC)[:,None] + np.arange(nC)`. Just FYI. – ksed Jul 16 '18 at 15:20
5

If you have numpy >= 1.15.0 you could use numpy.take_along_axis. In your case:

result_array = numpy.take_along_axis(val_arr, z_indices.reshape((3,3,1)), axis=2)

That should give you the result you want in one neat line of code. Note the size of the indices array. It needs to have the same number of dimensions as your val_arr (and the same size in the first two dimensions).

Roberto
  • 267
  • 4
  • 10
4

Inspired by this thread, using np.ogrid:

y,x = np.ogrid[0:3, 0:3]
print [z_indices, y, x]
[array([[1, 0, 2],
        [0, 0, 1],
        [2, 0, 1]]),
 array([[0],
        [1],
        [2]]),
 array([[0, 1, 2]])]

print val_arr[z_indices, y, x]
[[ 9  1 20]
 [ 3  4 14]
 [24  7 17]]

I have to admit that multidimensional fancy indexing can be messy and confusing :)

Community
  • 1
  • 1
Daniel Lenz
  • 3,334
  • 17
  • 36
  • I won't delete this for now to illustrate that there is always a complicated way and a simple, elegant one in python @ajcr – Daniel Lenz Aug 19 '15 at 08:50