How can I divide a numpy array into n sub-arrays using a sliding window of size m?

Question

I have a large NumPy array that I want to divide into many subarrays by sliding a window of a fixed size over it. Here is my code for subarrays of size 11:

import numpy as np

x = np.arange(10000)
T = np.array([])

for i in range(len(x)-11):
    s = x[i:i+11]
    T = np.concatenate((T, s), axis=0)

But it is very slow for arrays with more than 1 million entries. Is there any way to make it faster?

I don't know what your overall objective is. But you should probably start with `numpy.asarray` and from there if you can `numpy.split` if you want sub-arrays or `numpy.reshape` instead of whatever concatenation you're doing. — waffles, Dec 11 '19 at 02:34

score 3 · Answer 1 · answered Dec 11 '19 at 02:54

Actually, this is a case for as_strided:

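import numpy as np
from numpy.lib.stride_tricks import as_strided

# set up
x = np.arange(1000000)
windows = 11

# byte step between consecutive elements of x (x.strides is a 1-tuple)
stride = x.strides[0]

# each row starts one element after the previous one, so both axes use the same step
T = as_strided(x, shape=(len(x)-windows+1, windows), strides=(stride, stride))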

Output:

array([[     0,      1,      2, ...,      8,      9,     10],
       [     1,      2,      3, ...,      9,     10,     11],
       [     2,      3,      4, ...,     10,     11,     12],
       ...,
       [999987, 999988, 999989, ..., 999995, 999996, 999997],
       [999988, 999989, 999990, ..., 999996, 999997, 999998],
       [999989, 999990, 999991, ..., 999997, 999998, 999999]])

Performance:

5.88 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
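
On NumPy 1.20 or later, the same kind of view can likely be obtained with `sliding_window_view`, which works out the shape and strides itself and returns a read-only view by default. The following is a minimal sketch under that version assumption; the small array and the names `window` and `T` are chosen only for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(20)
window = 11

# Each row is one window; no data is copied, T is a view into x.
T = sliding_window_view(x, window)

print(T.shape)                 # (len(x) - window + 1, window)
print(np.shares_memory(T, x))  # the rows alias the memory of x
print(T.flags.writeable)       # read-only unless writeable=True is passed
```

Because the result is a view, `T.copy()` is the safer route if the windows need to be modified independently of `x`.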

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

I think your current method does not produce what you are describing. Here is a faster method that splits a long array into many sub-arrays using a list comprehension:

Code Fix:

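import numpy as np

x = np.arange(10000)

# One row per window start. range(len(x)-11) mirrors the question's loop,
# so the final window (starting at len(x)-11) is not included.
T = np.array([x[i:i+11] for i in range(len(x)-11)])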

Speed Comparison:

sample_1 = '''
import numpy as np 

x = np.arange(10000)
T = np.array([])

for i in range(len(x)-11):
    s = x[i:i+11]
    T = np.concatenate((T, s),axis=0)

'''    

sample_2 = '''
import numpy as np 

x = np.arange(10000)
T = np.array([])

T = np.array([np.array(x[i:i+11]) for i in range(len(x)-11)])
'''

# Testing the times
import timeit
print(timeit.timeit(sample_1, number=1))
print(timeit.timeit(sample_2, number=1))

Speed Comparison Output:

5.839815437000652   # Your method
0.11047088200211874 # List Comprehension

I timed only a single run, since the difference is large enough that more iterations would not change the overall outcome.
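
For a less noisy measurement, `timeit.repeat` can run each variant several times and the minimum can be reported. This is a minimal sketch, assuming a smaller array so the slow loop finishes quickly; the function names `concat_loop` and `list_comp` are made up for this illustration, and absolute timings will differ by machine.

```python
import timeit
import numpy as np

x = np.arange(2000)
w = 11

def concat_loop():
    # Question's approach: grows a 1D array one window at a time.
    T = np.array([])
    for i in range(len(x) - w):
        T = np.concatenate((T, x[i:i + w]), axis=0)
    return T

def list_comp():
    # Answer's approach: builds all windows, then converts once to a 2D array.
    return np.array([x[i:i + w] for i in range(len(x) - w)])

# Best of 3 runs, one call per run.
t_loop = min(timeit.repeat(concat_loop, number=1, repeat=3))
t_comp = min(timeit.repeat(list_comp, number=1, repeat=3))
print(f"concatenate loop:   {t_loop:.4f} s")
print(f"list comprehension: {t_comp:.4f} s")
```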

Output Comparison:

# Your method:
[  0.00000000e+00   1.00000000e+00   2.00000000e+00 ...,   9.99600000e+03
   9.99700000e+03   9.99800000e+03]

# Using List Comprehension:
[[   0    1    2 ...,    8    9   10]
 [   1    2    3 ...,    9   10   11]
 [   2    3    4 ...,   10   11   12]
 ..., 
 [9986 9987 9988 ..., 9994 9995 9996]
 [9987 9988 9989 ..., 9995 9996 9997]
 [9988 9989 9990 ..., 9996 9997 9998]]

You can see that my method actually produces sub-arrays, unlike the code you provided, which yields a single flat array.
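
The two outputs hold the same numbers in the same order, which is why reshaping the flat result (as the asker mentions doing) recovers the 2D form. Here is a small sketch of that relationship, using a short array for readability; `flat` and `rows` are illustrative names, and `flat` is built in one `np.concatenate` call rather than in a loop.

```python
import numpy as np

x = np.arange(15)
w = 11

# Flat result: the windows laid end to end, as repeated concatenation produces.
flat = np.concatenate([x[i:i + w] for i in range(len(x) - w)])

# 2D result from the list comprehension.
rows = np.array([x[i:i + w] for i in range(len(x) - w)])

# Reshaping the flat array into rows of length w gives the same 2D array.
print(np.array_equal(flat.reshape(-1, w), rows))
```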

Note:

These tests were carried out on x, an array of consecutive integers from 0 to 9999 (`np.arange(10000)`).

Also see that `range()` automatically starts at 0 so there is no need to specify that. Furthermore, your code produced a 1D array due to the concatenation rather than a 2D array (array of arrays). — lbragile, Dec 11 '19 at 02:41
Thank you it worked well and faster, I was reshaping my output to get the same result as you. — Fourat Thamri, Dec 11 '19 at 02:50