
I have a 3D NumPy array of shape (n_samples, num_components, 2). In the example below, n_samples = 5 and num_components = 7.

I have another array, indices, of shape (n_samples,), which holds the selected component for each sample.

I want to use indices to select from the data array so that the resulting array has shape (n_samples, 2).

The code is below:

import numpy as np
np.random.seed(77)
data=np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])
#how can I select indices from the data array?

For example, for sample 0 the selected component should be component 0, and for sample 1 it should be component 1.
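Written out by hand for the example indices [0, 1, 6, 4, 5], the desired result just stacks data[i, indices[i]] for each sample i. The sketch below spells this out without a loop, purely to pin down the expected values and shape; the name `expected` is illustrative and not part of the question's code.

```python
import numpy as np

np.random.seed(77)
data = np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

# Row i of the result is the 2-element component data[i, indices[i]]
expected = np.stack([
    data[0, 0],  # sample 0 -> component 0
    data[1, 1],  # sample 1 -> component 1
    data[2, 6],  # sample 2 -> component 6
    data[3, 4],  # sample 3 -> component 4
    data[4, 5],  # sample 4 -> component 5
])
print(expected.shape)  # (5, 2)
```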

Note that I can't use for loops, because I'm using this in Theano; the solution should be based solely on NumPy.

Ash

3 Answers


Is this what you are looking for?

In [36]: data[np.arange(data.shape[0]),indices,:]
Out[36]: 
array([[7, 4],
       [7, 3],
       [4, 5],
       [8, 2],
       [5, 8]])
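This works through NumPy's integer-array (advanced) indexing: the two index arrays, np.arange(n_samples) and indices, have the same shape, so they are paired elementwise and row i of the result is data[i, indices[i], :]. No n_samples × n_samples intermediate is built. A minimal sketch using the question's data; `rows`, `selected` and `same` are illustrative names:

```python
import numpy as np

np.random.seed(77)
data = np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

# Row positions 0..n_samples-1, one per sample
rows = np.arange(data.shape[0])

# The two integer arrays are broadcast together and paired elementwise:
# selected[i] = data[rows[i], indices[i], :]
selected = data[rows, indices, :]

# The trailing ':' is optional; data[rows, indices] gives the same result
same = data[rows, indices]

print(selected.shape)  # (5, 2)
```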
hpaulj

To get component #0, use

data[:, 0]

That is, we take every entry on axis 0 (samples), only entry #0 on axis 1 (components), and implicitly everything on the remaining axes.

This can be easily generalized to

data[:, indices]

to select all relevant components.
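The shapes make the difference concrete: combining the slice on axis 0 with the integer array on axis 1 pairs every sample with every selected component, so entry [i, j] of data[:, indices] is data[i, indices[j]], and the wanted rows are those with i == j. A small sketch on the question's example (variable names are illustrative):

```python
import numpy as np

np.random.seed(77)
data = np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

one = data[:, 0]          # component 0 for every sample -> shape (5, 2)
every = data[:, indices]  # every sample x every selected component -> (5, 5, 2)

# every[i, j] holds data[i, indices[j]]; the wanted rows are where i == j
print(one.shape, every.shape)  # (5, 2) (5, 5, 2)
```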


But what the OP really wants is just the diagonal of this array, i.e. (data[0, indices[0]], data[1, indices[1]], ...). The diagonal of a higher-dimensional array can be extracted with the diagonal function:

>>> np.diagonal(data[:, indices])
array([[7, 7, 4, 8, 5],
       [4, 3, 5, 2, 8]])

(np.diagonal puts the diagonal on the last axis, so the result has shape (2, n_samples); transpose it with .T to get (n_samples, 2).)
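A short sketch confirming the shapes and that the transposed diagonal matches direct integer-array indexing, assuming the question's seeded example data; `diag`, `result` and `direct` are illustrative names:

```python
import numpy as np

np.random.seed(77)
data = np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

# np.diagonal (default axis1=0, axis2=1) drops those two axes and
# appends the diagonal as the last axis, giving shape (2, n_samples)
diag = np.diagonal(data[:, indices])

# Transpose to get one row per sample: shape (n_samples, 2)
result = diag.T

# Same values as pairing the two index arrays directly
direct = data[np.arange(data.shape[0]), indices]
print(diag.shape, result.shape)  # (2, 5) (5, 2)
```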

kennytm
  • data[:, indices] results in shape (5, 5, 2) while I need a (5, 2) shape. – Ash Mar 15 '17 at 06:16
  • @Afshin `data[:, 0]` is `(5, 2)` for a single component. What `(5, 2)` array do you want when you have five components? – kennytm Mar 15 '17 at 06:18
  • For each sample (first dimension) we have seven 2D components, and we know the index of one of those seven components for each sample. For each sample, we want to select the corresponding 2D component. data[:, 0] has the correct shape, but it just returns component 0 for all samples, while it should return the 7th component (index 6) for the third sample. – Ash Mar 15 '17 at 06:21
  • `data[:,indices][np.arange(n_samples),np.arange(n_samples)]` works but is obtuse. – jyalim Mar 15 '17 at 06:22
  • @Afshin What about `np.diagonal(data[:, indices]).T` – kennytm Mar 15 '17 at 06:28
  • This, I think, is the one I need, though I thought there would be a simple indexing for it. If you update the answer, I'll be able to accept it. – Ash Mar 15 '17 at 06:33
  • Both of the above expand the matrix a lot and then select the diagonal. I should test which one is faster, and also whether there is a better solution that doesn't expand the matrix, because in my case the number of indices equals batch_size and can be large. – Ash Mar 15 '17 at 06:36
  • @Afshin Updated. I guess hpaulj's one is faster since it doesn't need to build the intermediate #Samples×#Indices×2 array. – kennytm Mar 15 '17 at 06:38
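The size concern can be illustrated without any timing: both diagonal-based variants (np.diagonal and the double indexing from the comments) first build data[:, indices], which has n_samples × n_samples × 2 elements, while hpaulj's direct pairing produces only n_samples × 2. The sketch below assumes a hypothetical batch of 1000 samples with random data; it makes no claim about actual speed.

```python
import numpy as np

# Hypothetical batch size, chosen only to make the size difference visible
n_samples, num_components = 1000, 7
rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(n_samples, num_components, 2))
indices = rng.integers(0, num_components, size=n_samples)

rows = np.arange(n_samples)

# Intermediate built by both diagonal-based approaches
intermediate = data[:, indices]               # (n_samples, n_samples, 2)
via_diagonal = np.diagonal(intermediate).T    # (n_samples, 2)
via_double_index = intermediate[rows, rows]   # (n_samples, 2)

# Direct pairing builds only the (n_samples, 2) result
direct = data[rows, indices]

print(intermediate.size, direct.size)  # 2000000 2000
```

The intermediate grows quadratically with the batch size, whereas the direct result grows linearly.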

You have a variety of ways to do so, but this is my loop recommendation:

selection = np.array([datum[indices[k]] for k, datum in enumerate(data)])

The resulting array, selection, has the desired shape.
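As a quick consistency check, a sketch reusing the question's seeded example shows that the list-comprehension result matches the loop-free integer-array indexing from hpaulj's answer (`vectorized` is an illustrative name):

```python
import numpy as np

np.random.seed(77)
data = np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

# Loop version: pick component indices[k] from each sample k
selection = np.array([datum[indices[k]] for k, datum in enumerate(data)])

# Loop-free version: pair row k with component indices[k]
vectorized = data[np.arange(len(data)), indices]

print(selection.shape, np.array_equal(selection, vectorized))  # (5, 2) True
```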

jyalim