Given an ndarray `ar` of shape `(n, m)`, I want to "extract" subsequences of length `k` along axis 1, with `k < m`. If the subsequences all share a known starting index `start`, this can be solved with `new_ar = ar[:, start:end]` (or just `start:start+k`).
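For reference, the shared-start case looks like this on a small array. The shape and values are made up for illustration; only the slicing expression comes from the description above.

```python
import numpy as np

# Toy stand-in for the real data: n = 3 rows, m = 6 columns (illustrative values).
ar = np.arange(18).reshape(3, 6)

k = 4      # subsequence length, k < m
start = 1  # one starting index shared by every row

# Basic slicing along axis 1 yields an array of shape (n, k).
new_ar = ar[:, start:start + k]
print(new_ar.shape)
```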
However, what if I have a list `start_list` and an `end_list` of length `n` (or just the `start_list`, since the length of the subsequence is known anyway), which contain the starting indices (and ending indices) of the subsequences I want to extract? Intuitively I tried `ar[:, start_list:end_list]`, but this throws `TypeError: slice indices must be integers or None or have an __index__ method`.
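The error can be reproduced on a toy array: a slice takes a single start and a single stop, so it has no way to carry one pair per row. The array and index lists below are invented for the reproduction.

```python
import numpy as np

# Small illustrative array and per-row bounds (made-up values).
ar = np.arange(18).reshape(3, 6)
start_list = [0, 2, 1]
end_list = [4, 6, 5]

error_message = None
try:
    # Lists are not valid slice bounds, so this raises TypeError.
    ar[:, start_list:end_list]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```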
What would be a solution to this problem that avoids loops and leverages NumPy's methods? For my problem the for loop took 30 minutes, but there has to be a NumPy-style solution in the range of 5 ms, since it's just indexing.
[edit]: Since the problem is probably better understood with code (thank you for the hints), I'll try to state more compactly what I want and show how I solved it with a loop.
I have an ndarray of shape `(40450, 200000)`, representing `40450` signals of length `200000` each. The signals are shifted and I want to align them. So I want to extract subsequences of length, say, `190000` from each of the `40450` sequences. For this, I have a list `start_list` of length `40450`, containing the starting indices for the subsequences (each of the `40450` subsequences I want to extract has a different starting point in the original sequence of length `200000`).
I can solve this with a for loop (`ar` contains the original sequences, `start_list` the starting indices):

    import numpy as np

    k = 190000
    ar_new = np.zeros((40450, k))
    for i in range(ar_new.shape[0]):
        # copy the length-k window of row i that begins at start_list[i]
        ar_new[i] = ar[i, start_list[i]:start_list[i] + k]

If e.g. `start_list[0]` is `0`, this means that I need `ar[0, 0:190000]`; if `start_list[10000]` is `1337`, this means I need `ar[10000, 1337:1337+190000]`, etc.
But this takes more than 30 minutes in my case, and I am sure it can somehow be solved with NumPy built-in methods or some slicing magic.
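As a minimal sketch of what such a loop-free version might look like, here are two candidate approaches checked against the loop on a small random array. The sizes, the random data, and the names `cols`, `windows`, `via_index` and `via_windows` are assumptions for illustration, and the sketch assumes every `start_list[i] + k` stays within the row length. Whether either is actually faster at the full `(40450, 200000)` size is untested here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Small illustrative problem; the real case is n = 40450, m = 200000, k = 190000.
rng = np.random.default_rng(0)
n, m, k = 5, 20, 12
ar = rng.standard_normal((n, m))
# Hypothetical start indices, chosen so that start + k <= m for every row.
start_list = rng.integers(0, m - k + 1, size=n)

# Reference result: the explicit loop from the question.
expected = np.empty((n, k))
for i in range(n):
    expected[i] = ar[i, start_list[i]:start_list[i] + k]

# Option 1: build an (n, k) array of column indices by broadcasting,
# then gather along axis 1.
cols = start_list[:, None] + np.arange(k)
via_index = np.take_along_axis(ar, cols, axis=1)

# Option 2: view all length-k windows of each row without copying
# (shape (n, m - k + 1, k)), then pick one window per row.
windows = sliding_window_view(ar, k, axis=1)
via_windows = windows[np.arange(n), start_list]

print(np.array_equal(via_index, expected), np.array_equal(via_windows, expected))
```

Note that `cols` in the first option has as many entries as the output itself, so at full size it may need to be built in chunks of rows; the second option avoids that index array but requires a NumPy version that provides `sliding_window_view`.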