I have a data array a and a list of indices stored in the array idx. I would like to extract a length-10 chunk of a starting at each of the indices in idx. Right now I use a for loop to achieve this, but it is extremely slow because I have to do this data fetch about 1000 times per iteration. Below is a minimal working example.

import numpy as np
a = np.random.random(1000)
idx = np.array([1, 5, 89, 54])

# I want "data" array to have np.array([a[1:11], a[5:15], a[89:99], a[54:64]])
# I use for loop below but it is slow
data = []

for start in idx:  # "start" avoids shadowing the built-in id()
    data.append(a[start:start+10])
data = np.array(data)

Is there any way to speed up this process? Thanks.

EDIT: My question is different from the question asked here. In that question, the chunk sizes are random, whereas the chunk size in my question is fixed. There are other differences too: I do not have to use up the entire array a, and an element can occur in more than one chunk. My question does not necessarily "split" the array.

learner
  • I am assuming `a` is *ones* for the purpose of the question, is that right? – Ivan Dec 12 '20 at 08:44
  • @Ivan haha yes. I have edited it to now have random values. – learner Dec 12 '20 at 08:44
  • This is trickier, because there are overlapping sections! If you do `np.split(a, idx)` you will split the array at indices `1`, `5`, leaving you with `[array of size 1, array of size 4, ...]`, which is not the desired result. – Ivan Dec 12 '20 at 08:58

1 Answer

(Thanks to a suggestion from @MadPhysicist.)

This should work:

a[idx.reshape(-1, 1) + np.arange(10)]

Output: Shape (L,10), where L is the length of idx
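To see why this works, here is a minimal sketch that reuses the a and idx from the question (with a seeded generator, which is my own assumption, added only for reproducibility). The column of start positions, shape (4, 1), is broadcast against the row of offsets, shape (10,), giving a (4, 10) index array, and indexing a with it returns an array of that same shape. The loop from the question is kept only as a reference to compare against.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily, for reproducibility
a = rng.random(1000)
idx = np.array([1, 5, 89, 54])

# (4, 1) start positions + (10,) offsets broadcast to a (4, 10) index array:
# row k holds idx[k], idx[k] + 1, ..., idx[k] + 9
windows = idx.reshape(-1, 1) + np.arange(10)

# Integer-array indexing returns an array with the shape of the index array
data = a[windows]

# Reference result from the original loop, for comparison only
expected = np.array([a[i:i + 10] for i in idx])

print(data.shape)
print(np.array_equal(data, expected))
```

Overlapping chunks (such as the ones starting at 1 and 5) are not a problem here, because each row is gathered independently and the result is a copy, not a view of a.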

Notes:

  1. This does not check for out-of-bounds indices. It should be easy to ensure beforehand that idx contains no start index greater than len(a) - 10.

  2. np.take(a, idx.reshape(-1, 1) + np.arange(10), mode='wrap') is an alternative that handles out-of-bounds indices by wrapping them around a. Passing mode='clip' instead of mode='wrap' clips the out-of-range indices to the last index of a. However, np.take() may well have different performance and scaling characteristics.
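The two notes above can be illustrated with a small sketch. The array here is deliberately tiny (np.arange(20), my own choice, not the data from the question) so that the effect of each mode is easy to read off, and the names window, in_bounds, wrapped and clipped are hypothetical helpers introduced for this example only.

```python
import numpy as np

a = np.arange(20)                # small array so the results are easy to read
idx = np.array([3, 15])          # the chunk starting at 15 runs past the end of a
window = 10

# Note 1: a start is safe for plain indexing only if 0 <= start <= len(a) - window
in_bounds = (idx >= 0) & (idx <= len(a) - window)

indices = idx.reshape(-1, 1) + np.arange(window)

# Note 2: np.take with an explicit out-of-bounds policy
wrapped = np.take(a, indices, mode='wrap')   # indices 20..24 wrap around to 0..4
clipped = np.take(a, indices, mode='clip')   # indices 20..24 are clipped to 19

print(in_bounds)
print(wrapped[1])
print(clipped[1])
```

With this input, a[indices] would raise an IndexError, so one option is to filter with idx[in_bounds] first when wrapped or clipped values are not meaningful for the data.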

fountainhead
  • I'm thinking that it might be faster to sort the index, especially for short arrays. +1 either way – Mad Physicist Dec 12 '20 at 09:34
  • Also, you really don't need to reshape and transpose. The output array is the shape of the index. `idx.reshape(-1, 1) + np.arange(10)` is sufficient – Mad Physicist Dec 12 '20 at 09:36
  • @MadPhysicist -- Thanks, edited with the simplification. – fountainhead Dec 12 '20 at 09:44
  • We can see edits in the edit history. No need to mark "edit" and "update" in questions and answers. It's a common misconception among beginning authors that people want to see anything besides your polished product. – Mad Physicist Dec 12 '20 at 09:47