Sampling a fixed length sequence from a numpy array

Question

I have a data array a and a list of indices stored in the array idx. I would like to get a length-10 slice of data starting at each index in idx. Right now I use a for loop to do this, but it is extremely slow because I have to do this data fetch about 1000 times per iteration. Below is a minimal working example.

import numpy as np
a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])

# I want "data" array to have np.array([a[1:11], a[5:15], a[89:99], a[54:64]])
# I use for loop below but it is slow
data = []

for start in idx:  # "start" avoids shadowing the built-in id()
    data.append(a[start:start+10])
data = np.array(data)

Is there any way to speed up this process? Thanks.

EDIT: My question is different from the question asked here. In that question, the chunk size is random, whereas in mine it is fixed. There are other differences: I do not have to use up the entire array a, and an element can occur in more than one chunk. My question does not necessarily "split" the array.

I am assuming `a` is *ones* for the purpose of the question, is that right? — Ivan, Dec 12 '20 at 08:44
this is trickier, because there are overlapping sections! If you do `np.split(a, idx)` you will split the array on indices `1`, `5` leaving you with `[array of size 1, array of size 4, ...` which is not the desired result. — Ivan, Dec 12 '20 at 08:58

fountainhead · Accepted Answer · 2021-01-31T08:16:07.030

(Thanks to a suggestion from @MadPhysicist)

This should work:

a[idx.reshape(-1, 1) + np.arange(10)]

Output: an array of shape (L, 10), where L is the length of idx.
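
To illustrate why the one-liner works, here is a minimal sketch. The names offsets, index_grid, windows and expected are my own, chosen for illustration. The column of start indices broadcasts against the row of offsets 0..9 to form an (L, 10) index array, and indexing a with that array returns one row per start index. The result is compared against the loop from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random(1000)
idx = np.array([1, 5, 89, 54])

# Column of start indices (L, 1) plus row of offsets (10,) broadcasts to (L, 10).
offsets = np.arange(10)
index_grid = idx.reshape(-1, 1) + offsets

# Integer-array indexing: the output has the same shape as the index array.
windows = a[index_grid]

# Reference result built with the loop from the question.
expected = np.array([a[i:i + 10] for i in idx])

print(windows.shape)
print(np.array_equal(windows, expected))
```

Each row of index_grid is just a start index followed by the next nine positions, so row 0 holds the indices 1 through 10.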

Notes:

This does not handle out-of-bounds indices: if a window runs past the end of a, the indexing raises an IndexError. It should be easy to ensure beforehand that idx contains no such values.
Using np.take(a, idx.reshape(-1, 1) + np.arange(10), mode='wrap') is an alternative that handles out-of-bounds indices by wrapping them around a. Passing mode='clip' instead of mode='wrap' clips the out-of-range indices to the last index of a. However, np.take() may have quite different performance and scaling characteristics.
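
The notes above can be sketched on a small array where the results are easy to read. This is only an illustration: the array contents, the names grid, wrapped, clipped and valid, and the filtering step at the end are my own assumptions about how one might guard against out-of-range windows, not part of the original answer.

```python
import numpy as np

a = np.arange(20)          # values equal their positions, so results are easy to read
idx = np.array([3, 15])    # the window starting at 15 runs past the last index (19)
grid = idx.reshape(-1, 1) + np.arange(10)

# Plain integer-array indexing raises IndexError for indices past the end.
try:
    a[grid]
    raised = False
except IndexError:
    raised = True

# mode='wrap' wraps out-of-range indices around to the start of a.
wrapped = np.take(a, grid, mode='wrap')

# mode='clip' pins out-of-range indices to the last valid index.
clipped = np.take(a, grid, mode='clip')

# One possible guard: keep only start indices whose full window fits inside a.
valid = idx[(idx >= 0) & (idx + 10 <= len(a))]
safe_windows = a[valid.reshape(-1, 1) + np.arange(10)]

print(raised, wrapped[1], clipped[1], valid)
```

Whether to wrap, clip, or drop the offending start indices depends on what the windows are used for. Dropping them is the only option here that never returns repeated or unrelated elements.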

edited Jan 31 '21 at 08:16

answered Dec 12 '20 at 09:30


I'm thinking that it might be faster to sort the index, especially for short arrays. +1 either way – Mad Physicist Dec 12 '20 at 09:34

Also, you really don't need to reshape and transpose. The output array is the shape of the index. `idx.reshape(-1, 1) + np.arange(10)` is sufficient – Mad Physicist Dec 12 '20 at 09:36
@MadPhysicist -- Thanks, edited with the simplification. – fountainhead Dec 12 '20 at 09:44

We can see edits in the edit history. No need to mark "edit" and "update" in questions and answers. It's a common misconception among beginning authors that people want to see anything besides your polished product. – Mad Physicist Dec 12 '20 at 09:47
