Index numpy array with multiple ranges

Question

Imagine an array a which has to be indexed by multiple ranges in idx:

In [1]: a = np.array([7,9,1,2,3,5,6,8,1,0,])
        idx = np.array([[0,3],[5,7],[8,9]])

        a, idx

Out[1]: (array([7, 9, 1, 2, 3, 5, 6, 8, 1, 0]),
         array([[0, 3],
                [5, 7],
                [8, 9]]))

Of course I could write a simple for loop, which results in the desired output:

In [2]: np.hstack([a[i[0]:i[1]] for i in idx])

Out[2]: array([7, 9, 1, 5, 6, 1])

But I would like a fully vectorized approach. I was hoping that np.r_, for example, would provide a solution, but the code below does not give the desired output:

In [3]: a[np.r_[idx]]

Out[3]: array([[7, 2],
               [5, 8],
               [1, 0]])
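Why this fails can be checked directly. This is a minimal sketch using the `a` and `idx` from above. Given a single array, `np.r_` has nothing to join, so it returns a 3x2 array. Indexing `a` with a 2-D integer array then picks one element per entry. It does not expand each row into a range.

```python
import numpy as np

a = np.array([7, 9, 1, 2, 3, 5, 6, 8, 1, 0])
idx = np.array([[0, 3], [5, 7], [8, 9]])

# A single array passed to np.r_ comes back with the same shape
# and values; it is not read as a list of start/stop pairs.
joined = np.r_[idx]
print(joined.shape)  # (3, 2)

# Indexing with a 2-D integer array ("fancy" indexing) treats every
# entry as one position: a[0], a[3], a[5], a[7], a[8], a[9].
print(a[joined])
```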

Writing the ranges out explicitly does give the desired output, but the real-life idx is too large to write out:

In [4]: a[np.r_[0:3,5:7,8:9]]

Out[4]: array([7, 9, 1, 5, 6, 1])
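The written-out version works because Python packs the subscript `0:3,5:7,8:9` into a tuple of `slice` objects before `np.r_` receives it. `np.r_` then expands each slice with `np.arange`. Here is a small check of that, again using the question's `a`:

```python
import numpy as np

a = np.array([7, 9, 1, 2, 3, 5, 6, 8, 1, 0])

# Python turns the subscript 0:3, 5:7, 8:9 into a tuple of slice
# objects before np.r_ sees it, so these two expressions are equivalent.
written_out = np.r_[0:3, 5:7, 8:9]
explicit = np.r_[(slice(0, 3), slice(5, 7), slice(8, 9))]

print(written_out)                            # [0 1 2 5 6 8]
print(np.array_equal(written_out, explicit))  # True
print(a[explicit])                            # [7 9 1 5 6 1]
```

Any code that builds such a tuple of slices from `idx` can therefore be passed to `np.r_` in place of the hand-written ranges.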

Perhaps this is helpful: https://stackoverflow.com/questions/43413582/selecting-multiple-slices-from-a-numpy-array-at-once – Dennis, Jun 13 '21 at 08:21
Thanks for helping! The problem is, in real life `idx` can become really large, so writing it out would not be an option. – pr94, Jun 13 '21 at 08:25
Don't use functions like `np.r_` on a **hope**. Read the docs. I tried to cover all your options in the linked SO. Either you concatenate the values after slicing, or concatenate the slices before indexing. – hpaulj, Jun 13 '21 at 16:11
@hpaulj My hope for `np.r_` was based on the fact that writing out `idx`, as in `a[np.r_[0:3,5:7,8:9]]`, does give the desired output; I just didn't know how to use the 2D `idx` array directly with `np.r_`. Anyway, thanks for helping! However, I am not sure where to find the 'linked SO' you describe? – pr94, Jun 14 '21 at 07:43

score 0 · Answer 1 · answered Jun 13 '21 at 08:42


You can try vectorizing slice itself:

>>> slice_np = np.vectorize(slice)
>>> slice_idx = tuple(slice_np(idx[:, 0], idx[:, 1]))
>>> a[np.r_[slice_idx]]
 array([7, 9, 1, 5, 6, 1])
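Here is what this does step by step. `np.vectorize(slice)` calls `slice` once per row and collects the results in an object array. `tuple(...)` then turns that array into the same kind of tuple that `np.r_[0:3,5:7,8:9]` receives. The NumPy documentation notes that `np.vectorize` is provided for convenience rather than performance, since it is essentially a Python loop. A plain generator expression, shown here as an alternative, builds the same tuple:

```python
import numpy as np

a = np.array([7, 9, 1, 2, 3, 5, 6, 8, 1, 0])
idx = np.array([[0, 3], [5, 7], [8, 9]])

# np.vectorize calls slice(start, stop) once per row; the results are
# Python slice objects, so they are stored in an object array.
slice_np = np.vectorize(slice)
slices = slice_np(idx[:, 0], idx[:, 1])
print(slices.dtype)  # object

# Alternative without np.vectorize: build the tuple of slices directly.
slice_idx = tuple(slice(start, stop) for start, stop in idx)
print(a[np.r_[slice_idx]])  # [7 9 1 5 6 1]
```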

Sayandip Dutta

`np.hstack([np.arange(i,j) for i,j in idx])` is a faster way of generating the indices. – hpaulj Jun 13 '21 at 16:16
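Here is the index-first approach from this comment, written out with the question's `a` and `idx`. One caveat: `np.hstack` raises a `ValueError` when given an empty list, so an empty `idx` would need special handling.

```python
import numpy as np

a = np.array([7, 9, 1, 2, 3, 5, 6, 8, 1, 0])
idx = np.array([[0, 3], [5, 7], [8, 9]])

# Build one flat index array from the half-open ranges [i, j) ...
flat_idx = np.hstack([np.arange(i, j) for i, j in idx])
print(flat_idx)     # [0 1 2 5 6 8]

# ... then index a a single time.
print(a[flat_idx])  # [7 9 1 5 6 1]
```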
@hpaulj does it scale? For example `idx = np.vstack([idx]*1000)` – Sayandip Dutta Jun 13 '21 at 18:03
You could do some timing tests of your own, but there are a lot of variations to explore. For the larger `idx`, `slice_np` does scale somewhat better. But if `idx` is a list instead of an array, the list comprehension does better, even with a large `idx`. The other trade-off is between the OP's approach (concatenating after indexing) and yours (concatenating the indices before indexing). As I noted in the linked SO, those alternatives tend to time about the same. There isn't a true "vectorized" option, one that moves all the iteration to compiled code. – hpaulj Jun 13 '21 at 19:08
I just read your answer. Your point is taken. – Sayandip Dutta Jun 13 '21 at 19:13
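For anyone who wants to run the timing comparison suggested above, here is a minimal `timeit` sketch. It first checks that the approaches from this thread agree on the larger `idx` from the comments, then times each one.

It also includes one idiom that does not appear in the thread. The hypothetical helper `multi_range_indices` builds the flat index array with `np.repeat`, `np.cumsum` and `np.arange`, with no Python-level loop over the ranges. It assumes every row satisfies `start <= stop`, since `np.repeat` rejects negative counts.

No timings are claimed here. They depend on the NumPy version, the machine and the shape of `idx`.

```python
import timeit

import numpy as np

a = np.array([7, 9, 1, 2, 3, 5, 6, 8, 1, 0])
idx = np.array([[0, 3], [5, 7], [8, 9]])
big_idx = np.vstack([idx] * 1000)  # the larger idx suggested in the comments

slice_np = np.vectorize(slice)


def concat_after_slicing(a, idx):
    # Question's loop: slice each range, then join the pieces.
    return np.hstack([a[i:j] for i, j in idx])


def concat_aranges(a, idx):
    # Join the index ranges first, then index a once.
    return a[np.hstack([np.arange(i, j) for i, j in idx])]


def vectorized_slices(a, idx):
    # Answer above: np.vectorize(slice) plus np.r_.
    return a[np.r_[tuple(slice_np(idx[:, 0], idx[:, 1]))]]


def multi_range_indices(idx):
    # Hypothetical helper (not from the thread): flat indices for the
    # half-open ranges [start, stop) in idx, assuming start <= stop.
    starts = idx[:, 0]
    lengths = idx[:, 1] - starts
    # Where each range begins in the output array.
    out_starts = np.cumsum(lengths) - lengths
    # Per output slot: its range's start minus that range's output offset,
    # plus the slot's own output position, gives the source index.
    return np.repeat(starts - out_starts, lengths) + np.arange(lengths.sum())


def repeat_cumsum(a, idx):
    return a[multi_range_indices(idx)]


approaches = [concat_after_slicing, concat_aranges, vectorized_slices, repeat_cumsum]

# All approaches should produce the same values before timing them.
expected = concat_after_slicing(a, big_idx)
for f in approaches:
    assert np.array_equal(f(a, big_idx), expected), f.__name__

n = 20
for f in approaches:
    seconds = timeit.timeit(lambda: f(a, big_idx), number=n)
    print(f"{f.__name__}: {seconds / n * 1e3:.3f} ms per call")
```

On the question's small `idx`, `multi_range_indices(idx)` gives `[0 1 2 5 6 8]`, the same indices that `np.r_[0:3,5:7,8:9]` produces.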
