8

In pandas, there are several methods to manipulate data in a given window (e.g. pd.rolling_mean or pd.rolling_std). However, I would like to set a window overlap, which, I think, is a pretty standard requirement. For example, in the following image, you can see a window spanning 256 samples and overlapping 128 samples.

http://health.tau.ac.il/Communication%20Disorders/noam/speech/mistorin/images/hamming_overlap1.JPG

How can I do that using the optimized methods included in Pandas or Numpy?

r_31415

3 Answers

9

Using as_strided you would do something like this:

import numpy as np
from numpy.lib.stride_tricks import as_strided

def windowed_view(arr, window, overlap):
    arr = np.asarray(arr)
    window_step = window - overlap
    new_shape = arr.shape[:-1] + ((arr.shape[-1] - overlap) // window_step,
                                  window)
    new_strides = (arr.strides[:-1] + (window_step * arr.strides[-1],) +
                   arr.strides[-1:])
    return as_strided(arr, shape=new_shape, strides=new_strides)

If you pass a 1D array to the above function, it will return a 2D view into that array, with shape (number_of_windows, window_size), so you could calculate, e.g. the windowed mean as:

win_avg = np.mean(windowed_view(arr, win_size, win_overlap), axis=-1)

For example:

>>> a = np.arange(16)
>>> windowed_view(a, 4, 2)
array([[ 0,  1,  2,  3],
       [ 2,  3,  4,  5],
       [ 4,  5,  6,  7],
       [ 6,  7,  8,  9],
       [ 8,  9, 10, 11],
       [10, 11, 12, 13],
       [12, 13, 14, 15]])
>>> windowed_view(a, 4, 1)
array([[ 0,  1,  2,  3],
       [ 3,  4,  5,  6],
       [ 6,  7,  8,  9],
       [ 9, 10, 11, 12],
       [12, 13, 14, 15]])
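
The comments below mention a `ValueError: negative dimensions are not allowed` and empty results for some inputs. As a minimal sketch (not part of the original answer), the same windows can be built behind explicit argument checks. The name `checked_windowed_view` and the error messages are hypothetical, and `writeable=False` is an added precaution against writing through overlapping windows.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def checked_windowed_view(arr, window, overlap):
    """Hypothetical wrapper: same windows as windowed_view, plus argument checks."""
    arr = np.asarray(arr)
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    if window > arr.shape[-1]:
        raise ValueError("window is longer than the last axis")
    step = window - overlap
    n_windows = (arr.shape[-1] - overlap) // step
    shape = arr.shape[:-1] + (n_windows, window)
    strides = arr.strides[:-1] + (step * arr.strides[-1], arr.strides[-1])
    # Read-only view: overlapping windows alias the same memory,
    # so an accidental in-place write would touch several windows at once.
    return as_strided(arr, shape=shape, strides=strides, writeable=False)

a = np.arange(16)
means = checked_windowed_view(a, 4, 2).mean(axis=-1)
print(means.shape)  # (7,)
```

With these checks, an overlap equal to or larger than the window, or a window longer than the data, raises immediately instead of producing an empty or invalid view.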
Jaime
  • Thanks! Are you sure about `new_strides`? It gives me a type error: TypeError: can only concatenate tuple (not "int") to tuple. – r_31415 Aug 15 '13 at 15:58
  • I just edited it, there was a missing `,` right after `arr.strides[-1]`. – Jaime Aug 15 '13 at 18:39
  • I'm not sure why but I still get an error. This time `ValueError: negative dimensions are not allowed`. Sometimes I also get an empty array. I think it works for arrays of shape (1,) – r_31415 Aug 16 '13 at 01:16
  • There was a glitch in the shape calculation that is now corrected, but it mostly worked fine, especially if your overlap was half the window size. I've added a couple of examples. – Jaime Aug 16 '13 at 04:08
  • Now it works great. I really need to get the hang of those `as_strided` things. – r_31415 Aug 17 '13 at 22:10
2

I am not familiar with pandas, but in numpy you would do something like this (untested):

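from numpy import hanning as hann  # Hann window (numpy.hanning)

def overlapped_windows(x, nwin, noverlap=None):
    if noverlap is None:
        noverlap = nwin // 2  # default to half-overlapping windows
    step = nwin - noverlap
    for i in range(0, len(x) - nwin + 1, step):
        window = x[i:i+nwin]  # this is a view, not a copy
        y = window * hann(nwin)  # new array, so x is left untouched
        # your code here with y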

This is ripped from some old code to calculate an averaged PSD, which you typically process with half-overlapping windows. Note that window is a 'view' into array x, which means that it does not copy any data (very fast, so probably good) and that if you modify window in place you also modify x (so don't do window *= hann(nwin)).
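
The view-versus-copy behaviour described above can be checked directly. This is a small illustrative sketch, not part of the original code; the array `x` and the slice bounds are made up for the demonstration, and `np.shares_memory` is used only to show which arrays alias each other.

```python
import numpy as np

x = np.arange(8, dtype=float)
window = x[2:6]                      # basic slicing returns a view into x
print(np.shares_memory(window, x))   # True

y = window * np.hanning(4)           # multiplication allocates a new array
print(np.shares_memory(y, x))        # False

window *= 2.0                        # in-place operation writes through to x
print(x[2:6])                        # the four affected samples of x are now doubled
```

Multiplying into a new name (`y`) leaves the original samples intact, which is what you want when consecutive windows overlap.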

Bas Swinckels
  • Thanks a lot. By the way, what is `hann(nwin)`? – r_31415 Aug 15 '13 at 16:02
  • Sorry, that should probably be (the wrongly named) `numpy.hanning()`, or `scipy.signal.hann()`, which is the [hann-window](http://en.wikipedia.org/wiki/Hann_window). This is a function that goes smoothly from 0 to 1 and back, so that with half-overlapping windows, you use all your points more or less equally. The smoothness is important when calculating FFTs. You suggested something like that with the blue line in your graph, not sure if it is really needed in your case. – Bas Swinckels Aug 15 '13 at 16:25
1

As of NumPy 1.20 (released in January 2021), there is a new, much more robust implementation of this:

https://numpy.org/doc/stable/reference/generated/numpy.lib.stride_tricks.sliding_window_view.html#numpy.lib.stride_tricks.sliding_window_view

To do a moving window with window size 3 and stride of 2, just do this (from the documentation):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
x = np.arange(7)
sliding_window_view(x, 3)[::2, :]

I was looking at the responses here and tried to use as_strided. It seemed to work fine with a float array I had, but when I tried it on a boolean array, I got garbage out. Even after converting to ints or floats, I got the same thing (different garbage). Using sliding_window_view works. You do first generate every window and then subset them, but both steps return views rather than copies, so this costs no extra memory, and it works for what I need.
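
Applied to the question's case (a 256-sample window overlapping by 128), the same idea could be wrapped as follows. This is a sketch assuming NumPy 1.20 or later; the helper name `overlapping_windows` is hypothetical and the test arrays are invented for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def overlapping_windows(x, window, overlap):
    """Hypothetical helper: every (window - overlap)-th sliding window of x."""
    step = window - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    # Both the sliding view and the strided slice are views, so no data is copied.
    return sliding_window_view(x, window)[::step]

x = np.arange(1024)
w = overlapping_windows(x, 256, 128)
print(w.shape)                  # (7, 256)
print(np.shares_memory(w, x))   # True

# Boolean input needs no special handling.
b = np.array([True, False] * 4)
print(overlapping_windows(b, 4, 2).shape)  # (3, 4)
```

A windowed statistic then follows from a reduction over the last axis, for example `w.mean(axis=-1)`.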

MaMaG