
I have an array with shape `(100000,)` over which I want to apply a sliding window of length 200 with a step size of 1. This means the output array will have shape `(99801, 200)`, i.e., all 100000 - 200 + 1 = 99801 overlapping chunks of length 200. I cannot find an efficient function in NumPy that achieves this. I have tried:

import numpy as np

windows = np.array([])
for i in range(data.shape[0] - 200 + 1):
    windows = np.append(windows, data[i:i+200])

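This not only produces the wrong shape (1-D, because `np.append` flattens its inputs when no `axis` is given), but is also incredibly slow, because every `np.append` call copies the whole array built so far. Is there a fast function in NumPy to do this?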
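For reference, here is a minimal sketch of a correct (but still copying) version of the loop above, assuming `data` is a 1-D NumPy array; `windows_by_copy` is a hypothetical helper name. It slices every window and stacks them once, so the result is 2-D with one row per window, but it still allocates a full `(len(data) - size + 1, size)` copy.

```python
import numpy as np

def windows_by_copy(data, size):
    """Hypothetical helper: copy every length-`size` window of a 1-D array into one 2-D array."""
    n_windows = data.shape[0] - size + 1
    # Collect the slices first and stack once, instead of growing an array with np.append
    return np.stack([data[i:i + size] for i in range(n_windows)])

data = np.arange(100000)
windows = windows_by_copy(data, 200)
print(windows.shape)  # (99801, 200)
```

This is useful mainly as a slow baseline to check faster, view-based versions against.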

Thomas Wagenaar
  • Does this answer your question? [Using numpy `as_strided` function to create patches, tiles, rolling or sliding windows of arbitrary dimension](https://stackoverflow.com/questions/45960192/using-numpy-as-strided-function-to-create-patches-tiles-rolling-or-sliding-w) – Daniel F Nov 10 '20 at 08:54
  • If you're not `numpy`-only, then `skimage.util.view_as_windows()` is user-friendly as well. – Daniel F Nov 10 '20 at 08:55
  • Importantly, `as_strided`- or `view`-based window functions don't take up any more memory than the original array, while any method that copies the data (as in your code and @AlexP's answer) can easily cause `MemoryError`s. The trade-off is that you shouldn't write *back* into the windows in any vectorized way: the windows overlap in memory, so a single write can change many windows at once. – Daniel F Nov 10 '20 at 09:07
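To illustrate the trade-off described in the last comment, here is a small sketch (toy sizes chosen for readability) showing that an `as_strided` window array shares memory with its base array, so one assignment shows up in several windows at once.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
s = a.strides[0]
# 8 overlapping windows of length 3, all viewing the same buffer as `a`
w = as_strided(a, shape=(a.size - 3 + 1, 3), strides=(s, s))

print(np.shares_memory(a, w))  # True: no data was copied
w[0, 1] = -1                    # this element is really a[1] ...
print(w[1, 0], w[0, 1], a[1])   # ... so it also appears as w[1, 0]: -1 -1 -1
```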

3 Answers


Try `np.lib.stride_tricks.as_strided`. It uses no extra memory beyond the original array `a`: it creates a strided view of `a` in which each row is one sliding window.

import numpy as np

def slide(a, size):
    # Bytes between consecutive elements of the 1-D array `a`
    stride = a.strides[0]
    # Number of complete windows: len(a) - size + 1
    n = a.size - size + 1
    # Each row starts one element after the previous row; columns also step by one element
    return np.lib.stride_tricks.as_strided(a, shape=(n, size), strides=(stride, stride))

>>> a = np.arange(100000)
>>> slide(a, size=200)
array([[    0,     1,     2, ...,   197,   198,   199],
       [    1,     2,     3, ...,   198,   199,   200],
       [    2,     3,     4, ...,   199,   200,   201],
       ...,
       [99798, 99799, 99800, ..., 99995, 99996, 99997],
       [99799, 99800, 99801, ..., 99996, 99997, 99998],
       [99800, 99801, 99802, ..., 99997, 99998, 99999]])
swag2198
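The question asks for a step of 1, but the same trick generalizes to any step by scaling the row stride. The sketch below is a hypothetical variant of `slide` (the name `slide_step` and its `step` parameter are illustrative, not part of NumPy); it also passes `writeable=False` to `as_strided` so the overlapping windows cannot be modified by accident.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def slide_step(a, size, step=1):
    """Hypothetical helper: read-only windows of length `size` over a 1-D array, `step` elements apart."""
    if a.ndim != 1:
        raise ValueError("expected a 1-D array")
    if not 1 <= size <= a.size or step < 1:
        raise ValueError("need 1 <= size <= len(a) and step >= 1")
    s = a.strides[0]                      # bytes between consecutive elements
    n = (a.size - size) // step + 1       # number of windows that fit completely
    # Consecutive rows start `step` elements apart; columns advance one element
    return as_strided(a, shape=(n, size), strides=(step * s, s), writeable=False)

print(slide_step(np.arange(10), 4, step=3))
# [[0 1 2 3]
#  [3 4 5 6]
#  [6 7 8 9]]
```

With `step=1` this yields the same windows as `slide`, just read-only.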

Here's a NumPy-only answer based on fancy indexing:

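import numpy as np

window_size = 10
n_windows = data.size - window_size + 1
# Broadcasting a column of start offsets against a row of 0..window_size-1
# gives an (n_windows, window_size) array whose row i is i, i+1, ..., i+window_size-1
indices = np.arange(n_windows)[:, None] + np.arange(window_size)

windows = data[indices]  # fancy indexing copies the data into a new 2-D array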
Alex P
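To make the memory point from the comments concrete, here is a sketch that builds the question's 200-wide windows both ways: with a broadcast index array plus fancy indexing, and with an `as_strided` view. It assumes a 64-bit integer array (8 bytes per element) and temporarily allocates roughly 320 MB on a 64-bit platform (the index array plus the copy).

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

data = np.arange(100_000, dtype=np.int64)
w = 200
n = data.size - w + 1                              # 99801 windows

idx = np.arange(n)[:, None] + np.arange(w)         # (n, w) index array
copied = data[idx]                                 # fancy indexing allocates a new array
view = as_strided(data, shape=(n, w), strides=(data.strides[0],) * 2, writeable=False)

print(copied.nbytes)                    # 159681600 bytes (about 160 MB) of new data
print(np.shares_memory(copied, data))   # False: a real copy
print(np.shares_memory(view, data))     # True: no new data buffer
print(np.array_equal(copied, view))     # True: same values either way
```

Note that `view.nbytes` would also report 159681600, because `nbytes` is computed from the shape and item size, not from the memory actually allocated.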

The best function I've seen for this (outside NumPy) is `skimage.util.view_as_windows()`:

from skimage.util import view_as_windows

windows = view_as_windows(data, 200)

If you want NumPy only, the recipe in the duplicate target is the most general answer, although @swag2198 suggests a more lightweight version in another answer here.

Daniel F
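For a NumPy-only option with an interface close to `view_as_windows`, NumPy 1.20 and later include `numpy.lib.stride_tricks.sliding_window_view`. A minimal sketch, assuming NumPy >= 1.20: it returns a read-only view by default (so the write hazard from the comments does not apply), and a step other than 1, similar to the `step` argument of `view_as_windows`, can be had by slicing the result, which is still a view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

data = np.arange(100000)
windows = sliding_window_view(data, 200)   # read-only view, no data copied

print(windows.shape)                     # (99801, 200)
print(windows.flags.writeable)           # False
print(windows[-1, 0], windows[-1, -1])   # 99800 99999

every_5th = windows[::5]                 # step of 5 by slicing the view
print(every_5th.shape)                   # (19961, 200)
```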