How can I select values along an axis of an nD array with an (n-1)D array of indices of that axis?

Question

This is motivated by my answer here.

Given array A with shape (n0,n1), and array J with shape (n0), I'd like to create an array B with shape (n0) such that

B[i] = A[i,J[i]]

I'd also like to be able to generalize this to higher-dimensional arrays, where A has shape (n0,n1,...,nk) and J has shape (n0,n1,...,n(k-1)), so that B has the shape of J and B[i0,...,i(k-1)] = A[i0,...,i(k-1),J[i0,...,i(k-1)]].

There are messy, flattening ways of doing this that make assumptions about index order:

import numpy as np
B = A.ravel()[J + A.shape[-1]*np.arange(0, np.prod(J.shape)).reshape(J.shape)]
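
To make the index-order assumption explicit, here is a minimal sketch of the same flat-index arithmetic on a small made-up example. The arrays and the names r and flat_index are illustrative only, not part of the original question.

```python
import numpy as np

# Illustrative data: A holds its own flat positions, so B is easy to check.
A = np.arange(24).reshape(2, 3, 4)
J = np.array([[0, 1, 2],
              [3, 2, 1]])

# In row-major (C) order, which ravel() uses by default, element
# A[idx + (j,)] sits at flat position r * A.shape[-1] + j, where r is the
# row-major rank of idx among the positions of J.
r = np.arange(J.size).reshape(J.shape)
flat_index = J + A.shape[-1] * r
B = A.ravel()[flat_index]
```

Because A here is just arange, B equals flat_index, which makes the offset arithmetic easy to inspect by eye.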

The question is, is there a way to do this that doesn't rely on flattening arrays and dealing with indexes manually?

For very large arrays your flattening approach is fastest. – hpaulj Mar 09 '15 at 04:08

score 2 · Accepted Answer · edited May 23 '17 at 11:50

For the case of a 2d A and a 1d J, this indexing works:

A[np.arange(J.shape[0]), J]
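
As a quick check of the 2d case, a small sketch with made-up values, compared against a loop written straight from the definition B[i] = A[i,J[i]] (B_loop is an illustrative name):

```python
import numpy as np

A = np.arange(12).reshape(3, 4)   # shape (n0, n1) = (3, 4)
J = np.array([2, 0, 3])           # one column index per row

# Integer-array indexing pairs row i with column J[i].
B = A[np.arange(J.shape[0]), J]

# Reference result built directly from the definition.
B_loop = np.array([A[i, J[i]] for i in range(A.shape[0])])
```

Here B is array([ 2,  4, 11]), the same as B_loop.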

Which can be applied to more dimensions by reshaping to 2d (and back):

A.reshape(-1, A.shape[-1])[np.arange(np.prod(A.shape[:-1])).reshape(J.shape), J]
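
A sketch of the reshape version on random 3d data (the seed, shapes, and names A2, rows, B_loop are arbitrary choices for illustration). Since the row index array is reshaped to J.shape, the result should already come out with J's shape:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 100, size=(2, 3, 4))
J = rng.integers(0, A.shape[-1], size=A.shape[:-1])

# Collapse all leading axes into one, giving a 2d view of shape (6, 4).
A2 = A.reshape(-1, A.shape[-1])
# Row numbers 0..5 laid out in the shape of J.
rows = np.arange(A2.shape[0]).reshape(J.shape)
B = A2[rows, J]

# Reference: apply the definition one position at a time.
B_loop = np.empty(J.shape, dtype=A.dtype)
for idx in np.ndindex(*J.shape):
    B_loop[idx] = A[idx + (J[idx],)]
```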

For 3d A this works:

A[np.arange(J.shape[0])[:,None], np.arange(J.shape[1])[None,:], J]

where the first two arange indices broadcast to the same shape as J.
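
To see the broadcasting at work, a small sketch with made-up values (i0 and i1 are just names for the two arange indices):

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)
J = np.array([[0, 1, 2],
              [3, 2, 1]])

i0 = np.arange(J.shape[0])[:, None]   # shape (2, 1)
i1 = np.arange(J.shape[1])[None, :]   # shape (1, 3)

# (2, 1), (1, 3) and (2, 3) broadcast together to (2, 3), the shape of J,
# so B[i, j] = A[i, j, J[i, j]].
B = A[i0, i1, J]
```

With these values B is [[0, 5, 10], [15, 18, 21]].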

With functions in lib.index_tricks, this can be expressed as:

A[tuple(np.ogrid[0:J.shape[0],0:J.shape[1]])+(J,)]
A[tuple(np.ogrid[slice(J.shape[0]),slice(J.shape[1])])+(J,)]

or for multiple dimensions:

A[np.ix_(*[np.arange(x) for x in J.shape])+(J,)]
A[tuple(np.ogrid[tuple(slice(k) for k in J.shape)])+(J,)]
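
The np.ix_ form can be wrapped in a small function. This is only a sketch: select_last_axis is a hypothetical name, and the cross-check uses np.take_along_axis, which is not part of the answer above and is available only in NumPy versions that provide it (1.15 or later, as far as I know).

```python
import numpy as np

def select_last_axis(A, J):
    """Hypothetical helper: B[idx] = A[idx + (J[idx],)] for every idx in J."""
    # One open-mesh index array per axis of J; together they broadcast
    # to J.shape.
    grids = np.ix_(*[np.arange(n) for n in J.shape])
    # np.ix_ returns a tuple, so appending (J,) gives a full index tuple.
    return A[grids + (J,)]

rng = np.random.default_rng(1)
A = rng.integers(0, 100, size=(2, 3, 4, 5))
J = rng.integers(0, A.shape[-1], size=A.shape[:-1])

B = select_last_axis(A, J)

# Cross-check with a different technique: take_along_axis needs indices with
# the same number of dimensions as A, so add a trailing axis and drop it again.
B_alt = np.take_along_axis(A, J[..., None], axis=-1)[..., 0]
```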

For small A and J (e.g. 2*3*4), J.choose(np.rollaxis(A,-1)) is faster. All of the extra time in the indexing versions goes into preparing the index tuple; np.ix_ is faster than np.ogrid.

np.choose has a size limit. At its upper end it is slower than ix_:

In [610]: Abig=np.arange(31*31).reshape(31,31)
In [611]: Jbig=np.arange(31)
In [612]: Jbig.choose(np.rollaxis(Abig,-1))
Out[612]: 
array([  0,  32,  64,  96, 128, 160, ... 960])

In [613]: timeit Jbig.choose(np.rollaxis(Abig,-1))
10000 loops, best of 3: 73.1 µs per loop
In [614]: timeit Abig[np.ix_(*[np.arange(x) for x in Jbig.shape])+(Jbig,)]
10000 loops, best of 3: 22.7 µs per loop
In [635]: timeit Abig.ravel()[Jbig+Abig.shape[-1]*np.arange(0,np.prod(Jbig.shape)).reshape(Jbig.shape) ]
10000 loops, best of 3: 44.8 µs per loop

I did similar indexing tests at https://stackoverflow.com/a/28007256/901925, and found that flat indexing was faster for much larger arrays (e.g. n0=1000). That's where I learned about the 32 limit for choose.

Phillip · Answer 2 · 2015-03-08T21:48:27.760

1

It doesn't solve your problem exactly, but choose() should nevertheless help:

>>> from numpy import array
>>> A = array(range(1, 28)).reshape(3, 3, 3)
>>> B = array([0, 0, 0, 1, 1, 1, 2, 2, 2]).reshape(3, 3)
>>> B.choose(A)
array([[ 1,  2,  3],
       [13, 14, 15],
       [25, 26, 27]])

It selects along the first axis instead of the last.
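
To select along the last axis instead, the last axis can be moved to the front first, for instance with np.rollaxis. A minimal sketch with made-up indices (J and B_ix are illustrative names), compared against np.ix_ indexing:

```python
import numpy as np

A = np.arange(1, 28).reshape(3, 3, 3)
J = np.array([[0, 1, 2],
              [2, 1, 0],
              [1, 1, 1]])

# rollaxis(A, -1) moves the last axis to position 0, so choose() treats
# A[:, :, k] as choice number k and B[i, j] = A[i, j, J[i, j]].
B = J.choose(np.rollaxis(A, -1))

# Same selection with plain integer-array indexing, for comparison.
B_ix = A[np.ix_(*[np.arange(n) for n in J.shape]) + (J,)]
```

This only works while the number of choices, here A.shape[-1], stays within the limit that choose() imposes.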

edited Mar 08 '15 at 21:48

answered Mar 08 '15 at 21:32

Phillip


Thanks: for some reason, I couldn't get np.choose working when I was messing with it, and I was turned off by the documentation that specifically said not to use choose for choosing from arrays rather than lists of arrays. But in this case, `J.choose(np.rollaxis(A,-1))` works. – cge Mar 08 '15 at 21:51
