209

I need a rolling window (aka sliding window) iterable over a sequence/iterator/generator. (Default Python iteration could be considered a special case, where the window length is 1.) I'm currently using the following code. How can I do it more elegantly and/or efficiently?

def rolling_window(seq, window_size):
    it = iter(seq)
    win = [it.next() for cnt in xrange(window_size)] # First window
    yield win
    for e in it: # Subsequent windows
        win[:-1] = win[1:]
        win[-1] = e
        yield win

if __name__=="__main__":
    for w in rolling_window(xrange(6), 3):
        print w

"""Example output:   
   [0, 1, 2]
   [1, 2, 3]
   [2, 3, 4]
   [3, 4, 5]
"""

For the specific case of window_size == 2 (i.e., iterating over adjacent, overlapping pairs in a sequence), see also How can I iterate over overlapping (current, next) pairs of values from a list?.

Karl Knechtel
  • 62,466
  • 11
  • 102
  • 153
David B.
  • 5,700
  • 5
  • 31
  • 52
  • 5
    If you're looking to perform some kind of operation on each window as you iterate (e.g. `sum()` or `max()`) it is worth bearing in mind that there are efficient algorithms to compute the new value for each window in *constant* time (irrespective of window size). I have collected some of these algorithms together in a Python library: [rolling](https://github.com/ajcr/rolling). – Alex Riley Apr 21 '18 at 12:07

30 Answers30

175

There's one in an old version of the Python docs with itertools examples:

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

The one from the docs is a little more succinct and uses itertools to greater effect I imagine.


If your iterator is a simple list/tuple a simple way to slide through it with a specified window size would be:

seq = [0, 1, 2, 3, 4, 5]
window_size = 3

for i in range(len(seq) - window_size + 1):
    print(seq[i: i + window_size])

Output:

[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
Red
  • 26,798
  • 7
  • 36
  • 58
Daniel DiPaolo
  • 55,313
  • 14
  • 116
  • 115
  • 2
    Nice answer, but (and I know you're just reproducing the recipe as linked), I wonder why the default window size should be 2? Should it have a default at all? – SingleNegationElimination Jul 25 '11 at 22:02
  • 19
    @TakenMacGuy: I dunno what the author of that recipe's reasoning is, but I'd also choose 2. 2 is the smallest useful window size (otherwise you're just iterating and don't need the window), and it is also common to need to know the previous (or next) item, arguably more so than any other specific n. – kindall Jul 26 '11 at 23:41
  • 33
    Does anyone know why this example was removed from the docs? Was there something wrong with it, or there is an easier alternative now? – wim Apr 15 '13 at 15:10
  • 14
    got curious about the example removal and found [rhettinger committed on Oct 26, 2003: Replace the window() example with pairwise() which demonstrates tee().](https://github.com/python/cpython/commit/83d953dc955ef9e948812fefb3f489728f219470#diff-7e59aecf5eb9159dee8fe29ba5d8683f) – second Jun 17 '16 at 16:03
  • @second: That said, the `pairwise` recipe could be extended with the `consume` recipe to reproduce the `window` recipe. [MrDrFenner's answer](http://stackoverflow.com/a/11249883/364696) is doing roughly that in the first `tee` using version, though it's slightly less efficient, since it leaves the iterators wrapped in `islice` objects instead of using `islice` to consume values, then using the original partially consumed iterators (which has the mild cost of making each iteration have to pass through `islice`'s wrapper). – ShadowRanger Dec 02 '16 at 16:34
  • 1
    @second: And just for demonstration, I added [an answer that's explicitly reusing the `consume` recipe](http://stackoverflow.com/a/40937024/364696). – ShadowRanger Dec 02 '16 at 16:44
  • @second Raymond seems to have replaced the more general recipe `window` with one of its special cases `pairwise`. Questionable. It's a shame the commit message didn't say anything about the motivation for the decision. – wim Aug 24 '17 at 05:22
  • 2
    When would one enter the `for elem in it` loop? – Kashif Jan 12 '18 at 18:12
  • 1
    @Glassjawed, this loop runs every time the `seq` is longer than `n` Since `yield` is not like `return` and will continue moving the window up to the end of `it`. – ykaner Jul 02 '18 at 09:27
58

This seems tailor-made for a collections.deque since you essentially have a FIFO (add to one end, remove from the other). However, even if you use a list you shouldn't be slicing twice; instead, you should probably just pop(0) from the list and append() the new item.

Here is an optimized deque-based implementation patterned after your original:

from collections import deque

def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(n)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win

In my tests it handily beats everything else posted here most of the time, though pillmuncher's tee version beats it for large iterables and small windows. On larger windows, the deque pulls ahead again in raw speed.

Access to individual items in the deque may be faster or slower than with lists or tuples. (Items near the beginning are faster, or items near the end if you use a negative index.) I put a sum(w) in the body of my loop; this plays to the deque's strength (iterating from one item to the next is fast, so this loop ran a a full 20% faster than the next fastest method, pillmuncher's). When I changed it to individually look up and add items in a window of ten, the tables turned and the tee method was 20% faster. I was able to recover some speed by using negative indexes for the last five terms in the addition, but tee was still a little faster. Overall I would estimate that either one is plenty fast for most uses and if you need a little more performance, profile and pick the one that works best.

kindall
  • 178,883
  • 35
  • 278
  • 309
  • I almost included the `collections.deque` idea in the original question as an example of a possible improvement. Using pop(0) and append(...) (or '+') is, of course, much better. The version in my question seems to indicate that I think lists are arrays... – David B. Jul 25 '11 at 21:56
  • Actually, on further reflection, it's *possible* that the method call overhead from `append()` and `pop()` would be slower than the slicing, especially for small values. This overhead can be reduced by assigning a local variable to point to these methods. – kindall Jul 25 '11 at 22:05
  • Wrote you a deque-based implementation and did a little profiling of the various solutions posted here. – kindall Jul 27 '11 at 21:37
  • Also, looks as though `append()` and `pop()` are marginally faster than the slicing, at least in some cases, for lists. – kindall Jul 27 '11 at 21:40
  • Premature optimization is the root of evil, but when I'm building functions for my personal library I don't mind prematurely optimizing when I'm having fun doing so! Getting `collections.deque`-like O(1) inserts with array-like O(1) lookups isn't too hard using a circular array--- store the window in a array, but keep a 'head' index that points to the front of the sequence. Reading out the stored sequence requires wrapping around from the back to the front of the array. Sequence indices are translated to array indices by addition of the 'head' index modulo the array length. – David B. Jul 27 '11 at 22:32
  • I'm surprised the `tee` version was faster (for small windows and large iterators) in your tests. It should require constructing a new tuple for each value returned and calling next() on multiple (albeit small for tiny windows) `tee`-d iterators. The yield version does one next() and one deque append (with internal delete from the front of the deque). – David B. Jul 27 '11 at 22:36
  • I thought this was fun, too, and I love learning more about how Python works internally. If I get a chance I might try the circular list idea -- I have a feeling that having to reorder it will be expensive, though. – kindall Jul 28 '11 at 04:33
  • 14
    `yield win` should be `yield tuple(win)` or `yield list(win)` to prevent returning an iterator of references to the same `deque` object. – Joel Cornett Mar 06 '13 at 19:40
  • That's not necessarily a bug. The deque always has the most recent `n` items in it, after all. If you want a permanent copy, you can do that outside of the iterator. – kindall Mar 06 '13 at 23:00
  • I like it. I would have upvoted, but I had the same concern as @JoelCornett. – Paul Draper Dec 16 '13 at 07:48
  • 1
    I submitted this [to PyPI](https://pypi.python.org/pypi/sliding_window). Install with `pip install sliding_window`, and run with `from sliding_window import window`. – Avery Richardson Feb 24 '14 at 07:36
  • 2
    You're in for a shock if you think `list(window(range(10)))` should produce something like [[0,1],[1,2],[2,3],...] – Paul Feb 18 '16 at 20:47
  • 1
    It obviously won't; you'd need to do something like `list(list(x) for x in window(range(10)))` or else add that to the iterator. For some applications this will matter, for others not, and since I was going for speed I elected *not* and put the onus on the caller to copy the window if needed. – kindall Feb 18 '16 at 21:56
  • Understood. The previous comment is meant as a simple, explicit cautionary note for casual users who are less familiar with the Python object system. – Paul Feb 19 '16 at 14:11
  • 1
    If you add back the needed `tuple()` before yield, this method does not have any advantage over the others. – kawing-chiu Mar 04 '17 at 06:30
41

I like tee():

from itertools import tee, izip

def window(iterable, size):
    iters = tee(iterable, size)
    for i in xrange(1, size):
        for each in iters[i:]:
            next(each, None)
    return izip(*iters)

for each in window(xrange(6), 3):
    print list(each)

gives:

[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
pillmuncher
  • 10,094
  • 2
  • 35
  • 33
  • From my quick `timeit` tests, this is much slower than Daniel DePaolo's (by about a 2:1 ratio) and doesn't feel much "nicer". – David B. Jul 25 '11 at 23:15
  • @David B.: On my box it's only about 8% slower than Daniel DePaolo's. – pillmuncher Jul 25 '11 at 23:42
  • @pillmuncher: Python 2.7 or 3.x? I was using 2.7. The ratio is also fairly sensitive to the value of `size`. If you increase it (e.g., if the iterable is 100000 elements long, make the window size 1000), you may see an increase. – David B. Jul 26 '11 at 00:29
  • 2
    @David B.: What you say makes sense. In my code the setup time for `iters` is O(size!), and calling `next()` many times (in `izip()`) is probably a lot more time consuming than copying a tuple twice. I was using Python 2.6.5, BTW. – pillmuncher Jul 26 '11 at 00:46
  • @pillmuncher: You mean, setup time for `iters`is O(size^2), right? – David B. Jul 27 '11 at 20:51
  • @David B.: Yeah, O(size^2) it is. – pillmuncher Jul 27 '11 at 21:18
  • Python docs say: "This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use list() instead of tee()." – Danilo Bargen Jan 17 '13 at 21:23
  • @DaniloBargen, but then using `list` would not work for infinite iterables. – Paul Draper Dec 16 '13 at 07:45
  • You can use `islice()` for an **O(size)** setup. – Apalala May 26 '17 at 23:11
  • @Apalala No, it would still be quadratic. – Kelly Bundy Dec 21 '22 at 17:25
29

There is a library which does exactly what you need:

import more_itertools
list(more_itertools.windowed([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],n=3, step=3))

Out: [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]
Nikolay Frick
  • 2,145
  • 22
  • 17
  • 3
    `step=3` should actually be removed to match the OP's request: `list(more_itertools.windowed(range(6), 3))` – teichert Jul 06 '19 at 20:02
  • But it returned a list of tuples. – ABCD Mar 03 '22 at 17:18
  • 2
    Another way of doing what this answer does (getting non-overlapping chunks) is more_itertools.chunked. `list(more_itertools.chunked([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],n=3))` – Christian Long Oct 11 '22 at 15:24
20

Here's a generalization that adds support for step, fillvalue parameters:

from collections import deque
from itertools import islice

def sliding_window(iterable, size=2, step=1, fillvalue=None):
    if size < 0 or step < 1:
        raise ValueError
    it = iter(iterable)
    q = deque(islice(it, size), maxlen=size)
    if not q:
        return  # empty iterable or size == 0
    q.extend(fillvalue for _ in range(size - len(q)))  # pad to size
    while True:
        yield iter(q)  # iter() to avoid accidental outside modifications
        try:
            q.append(next(it))
        except StopIteration: # Python 3.5 pep 479 support
            return
        q.extend(next(it, fillvalue) for _ in range(step - 1))

It yields in chunks size items at a time rolling step positions per iteration padding each chunk with fillvalue if necessary. Example for size=4, step=3, fillvalue='*':

 [a b c d]e f g h i j k l m n o p q r s t u v w x y z
  a b c[d e f g]h i j k l m n o p q r s t u v w x y z
  a b c d e f[g h i j]k l m n o p q r s t u v w x y z
  a b c d e f g h i[j k l m]n o p q r s t u v w x y z
  a b c d e f g h i j k l[m n o p]q r s t u v w x y z
  a b c d e f g h i j k l m n o[p q r s]t u v w x y z
  a b c d e f g h i j k l m n o p q r[s t u v]w x y z
  a b c d e f g h i j k l m n o p q r s t u[v w x y]z
  a b c d e f g h i j k l m n o p q r s t u v w x[y z * *]

For an example of use case for the step parameter, see Processing a large .txt file in python efficiently.

Community
  • 1
  • 1
jfs
  • 399,953
  • 195
  • 994
  • 1,670
12

Just a quick contribution.

Since the current python docs don't have "window" in the itertool examples (i.e., at the bottom of http://docs.python.org/library/itertools.html), here's an snippet based on the code for grouper which is one of the examples given:

import itertools as it
def window(iterable, size):
    shiftedStarts = [it.islice(iterable, s, None) for s in xrange(size)]
    return it.izip(*shiftedStarts)

Basically, we create a series of sliced iterators, each with a starting point one spot further forward. Then, we zip these together. Note, this function returns a generator (it is not directly a generator itself).

Much like the appending-element and advancing-iterator versions above, the performance (i.e., which is best) varies with list size and window size. I like this one because it is a two-liner (it could be a one-liner, but I prefer naming concepts).

It turns out that the above code is wrong. It works if the parameter passed to iterable is a sequence but not if it is an iterator. If it is an iterator, the same iterator is shared (but not tee'd) among the islice calls and this breaks things badly.

Here is some fixed code:

import itertools as it
def window(iterable, size):
    itrs = it.tee(iterable, size)
    shiftedStarts = [it.islice(anItr, s, None) for s, anItr in enumerate(itrs)]
    return it.izip(*shiftedStarts)

Also, one more version for the books. Instead of copying an iterator and then advancing copies many times, this version makes pairwise copies of each iterator as we move the starting position forward. Thus, iterator t provides both the "complete" iterator with starting point at t and also the basis for creating iterator t + 1:

import itertools as it
def window4(iterable, size):
    complete_itr, incomplete_itr = it.tee(iterable, 2)
    iters = [complete_itr]
    for i in xrange(1, size):
        incomplete_itr.next()
        complete_itr, incomplete_itr = it.tee(incomplete_itr, 2)
        iters.append(complete_itr)
    return it.izip(*iters)
Danica
  • 28,423
  • 6
  • 90
  • 122
MrDrFenner
  • 1,090
  • 11
  • 19
10
def GetShiftingWindows(thelist, size):
    return [ thelist[x:x+size] for x in range( len(thelist) - size + 1 ) ]

>> a = [1, 2, 3, 4, 5]
>> GetShiftingWindows(a, 3)
[ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]
heyyou482
  • 101
  • 1
  • 5
9

Just to show how you can combine itertools recipes, I'm extending the pairwise recipe as directly as possible back into the window recipe using the consume recipe:

def consume(iterator, n):
    "Advance the iterator n-steps ahead. If n is none, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)

def window(iterable, n=2):
    "s -> (s0, ...,s(n-1)), (s1, ...,sn), (s2, ..., s(n+1)), ..."
    iters = tee(iterable, n)
    # Could use enumerate(islice(iters, 1, None), 1) to avoid consume(it, 0), but that's
    # slower for larger window sizes, while saving only small fixed "noop" cost
    for i, it in enumerate(iters):
        consume(it, i)
    return zip(*iters)

The window recipe is the same as for pairwise, it just replaces the single element "consume" on the second tee-ed iterator with progressively increasing consumes on n - 1 iterators. Using consume instead of wrapping each iterator in islice is marginally faster (for sufficiently large iterables) since you only pay the islice wrapping overhead during the consume phase, not during the process of extracting each window-ed value (so it's bounded by n, not the number of items in iterable).

Performance-wise, compared to some other solutions, this is pretty good (and better than any of the other solutions I tested as it scales). Tested on Python 3.5.0, Linux x86-64, using ipython %timeit magic.

kindall's the deque solution, tweaked for performance/correctness by using islice instead of a home-rolled generator expression and testing the resulting length so it doesn't yield results when the iterable is shorter than the window, as well as passing the maxlen of the deque positionally instead of by keyword (makes a surprising difference for smaller inputs):

>>> %timeit -r5 deque(windowkindall(range(10), 3), 0)
100000 loops, best of 5: 1.87 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 3), 0)
10000 loops, best of 5: 72.6 μs per loop
>>> %timeit -r5 deque(windowkindall(range(1000), 30), 0)
1000 loops, best of 5: 71.6 μs per loop

Same as previous adapted kindall solution, but with each yield win changed to yield tuple(win) so storing results from the generator works without all stored results really being a view of the most recent result (all other reasonable solutions are safe in this scenario), and adding tuple=tuple to the function definition to move use of tuple from the B in LEGB to the L:

>>> %timeit -r5 deque(windowkindalltupled(range(10), 3), 0)
100000 loops, best of 5: 3.05 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 3), 0)
10000 loops, best of 5: 207 μs per loop
>>> %timeit -r5 deque(windowkindalltupled(range(1000), 30), 0)
1000 loops, best of 5: 348 μs per loop

consume-based solution shown above:

>>> %timeit -r5 deque(windowconsume(range(10), 3), 0)
100000 loops, best of 5: 3.92 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 3), 0)
10000 loops, best of 5: 42.8 μs per loop
>>> %timeit -r5 deque(windowconsume(range(1000), 30), 0)
1000 loops, best of 5: 232 μs per loop

Same as consume, but inlining else case of consume to avoid function call and n is None test to reduce runtime, particularly for small inputs where the setup overhead is a meaningful part of the work:

>>> %timeit -r5 deque(windowinlineconsume(range(10), 3), 0)
100000 loops, best of 5: 3.57 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 3), 0)
10000 loops, best of 5: 40.9 μs per loop
>>> %timeit -r5 deque(windowinlineconsume(range(1000), 30), 0)
1000 loops, best of 5: 211 μs per loop

(Side-note: A variant on pairwise that uses tee with the default argument of 2 repeatedly to make nested tee objects, so any given iterator is only advanced once, not independently consumed an increasing number of times, similar to MrDrFenner's answer is similar to non-inlined consume and slower than the inlined consume on all tests, so I've omitted it those results for brevity).

As you can see, if you don't care about the possibility of the caller needing to store results, my optimized version of kindall's solution wins most of the time, except in the "large iterable, small window size case" (where inlined consume wins); it degrades quickly as the iterable size increases, while not degrading at all as the window size increases (every other solution degrades more slowly for iterable size increases, but also degrades for window size increases). It can even be adapted for the "need tuples" case by wrapping in map(tuple, ...), which runs ever so slightly slower than putting the tupling in the function, but it's trivial (takes 1-5% longer) and lets you keep the flexibility of running faster when you can tolerate repeatedly returning the same value.

If you need safety against returns being stored, inlined consume wins on all but the smallest input sizes (with non-inlined consume being slightly slower but scaling similarly). The deque & tupling based solution wins only for the smallest inputs, due to smaller setup costs, and the gain is small; it degrades badly as the iterable gets longer.

For the record, the adapted version of kindall's solution that yields tuples I used was:

def windowkindalltupled(iterable, n=2, tuple=tuple):
    it = iter(iterable)
    win = deque(islice(it, n), n)
    if len(win) < n:
        return
    append = win.append
    yield tuple(win)
    for e in it:
        append(e)
        yield tuple(win)

Drop the caching of tuple in the function definition line and the use of tuple in each yield to get the faster but less safe version.

Community
  • 1
  • 1
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • Obviously, this is less efficient than it could be; `consume` is general purpose (including the ability to do a complete `consume`) and thus needs an extra import and a per-use test for `n is None`. In real code, if and only if I'd determined performance was a problem, or I really needed more concise code, I'd consider inlining the `else` case of `consume` into `window`, assuming I wasn't using `consume` for anything else. But if performance hasn't been shown to be an issue, I'd keep the separate definitions; the named `consume` function makes the operation less magical/self-documenting. – ShadowRanger Dec 02 '16 at 18:10
7

I use the following code as a simple sliding window that uses generators to drastically increase readability. Its speed has so far been sufficient for use in bioinformatics sequence analysis in my experience.

I include it here because I didn't see this method used yet. Again, I make no claims about its compared performance.

def slidingWindow(sequence,winSize,step=1):
"""Returns a generator that will iterate through
the defined chunks of input sequence. Input sequence
must be sliceable."""

    # Verify the inputs
    if not ((type(winSize) == type(0)) and (type(step) == type(0))):
        raise Exception("**ERROR** type(winSize) and type(step) must be int.")
    if step > winSize:
        raise Exception("**ERROR** step must not be larger than winSize.")
    if winSize > len(sequence):
        raise Exception("**ERROR** winSize must not be larger than sequence length.")

    # Pre-compute number of chunks to emit
    numOfChunks = ((len(sequence)-winSize)/step)+1

    # Do the work
    for i in range(0,numOfChunks*step,step):
        yield sequence[i:i+winSize]
Gus
  • 691
  • 1
  • 8
  • 12
  • 4
    The main drawback here is the `len(sequence)` call. This won't work if `sequence` is an iterator or generator. When the input does fit in memory, this does offer a more readable solution than with iterators. – David B. Mar 26 '12 at 19:23
  • Yes, you're right. This particular case was originally meant for scanning DNA sequences which are usually represented as strings. It certainly DOES have the limitation you mention. If you wanted you could simply test each slice to make sure its still the right length and then forget about having to know the length of the whole sequence. But it would add a bit more overhead (a len() test every iteration). – Gus Mar 26 '12 at 20:19
5

a slightly modified version of the deque window, to make it a true rolling window. So that it starts being populated with just one element, then grows to it's maximum window size, and then shrinks as it's left edge comes near the end:

from collections import deque
def window(seq, n=2):
    it = iter(seq)
    win = deque((next(it, None) for _ in xrange(1)), maxlen=n)
    yield win
    append = win.append
    for e in it:
        append(e)
        yield win
    for _ in xrange(len(win)-1):
        win.popleft()
        yield win

for wnd in window(range(5), n=3):
    print(list(wnd))

this gives

[0]
[0, 1]
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4]
[4]
Dmitry Avtonomov
  • 8,747
  • 4
  • 32
  • 45
4

why not

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

It is documented in Python doc . You can easily extend it to wider window.

WeiChing 林煒清
  • 4,452
  • 3
  • 30
  • 65
4

Let's make it lazy!

from itertools import islice, tee

def window(iterable, size): 
    iterators = tee(iterable, size) 
    iterators = [islice(iterator, i, None) for i, iterator in enumerate(iterators)]  
    yield from zip(*iterators)

list(window(range(5), 3))
# [(0, 1, 2), (1, 2, 3), (2, 3, 4)]
Gram
  • 341
  • 2
  • 10
3
def rolling_window(list, degree):
    for i in range(len(list)-degree+1):
        yield [list[i+o] for o in range(degree)]

Made this for a rolling average function

yazdmich
  • 31
  • 2
3

I tested a few solutions along with the one I came up with. I found the one I came up with to be the fastest so I thought I would share my python3 implementation.

import itertools
import sys

def windowed(l, stride):
    return zip(*[itertools.islice(l, i, sys.maxsize) for i in range(stride)])
Ryan Codrai
  • 172
  • 8
  • 1
    Looks similar to the first solution from this answer: https://stackoverflow.com/a/11249883/7851470 – Georgy May 23 '20 at 19:56
  • @georgy I think I skipped over that answer because it was written in Python2 but I agree, it's essentially the same! – Ryan Codrai May 24 '20 at 20:11
2

Multiple iterators!

def window(seq, size, step=1):
    # initialize iterators
    iters = [iter(seq) for i in range(size)]
    # stagger iterators (without yielding)
    [next(iters[i]) for j in range(size) for i in range(-1, -j-1, -1)]
    while(True):
        yield [next(i) for i in iters]
        # next line does nothing for step = 1 (skips iterations for step > 1)
        [next(i) for i in iters for j in range(step-1)]

next(it) raises StopIteration when the sequence is finished, and for some cool reason that's beyond me, the yield statement here excepts it and the function returns, ignoring the leftover values that don't form a full window.

Anyway, this is the least-lines solution yet whose only requirement is that seq implement either __iter__ or __getitem__ and doesn't rely on itertools or collections besides @dansalmo's solution :)

Community
  • 1
  • 1
jameh
  • 1,149
  • 3
  • 12
  • 20
  • note: the stagger step is O(n^2) where n is the size of the window, and only happens on the first call. It could be optimized down to O(n), but it would make the code a little messier :P – jameh Oct 28 '13 at 05:46
2
#Importing the numpy library
import numpy as np
arr = np.arange(6) #Sequence
window_size = 3
np.lib.stride_tricks.as_strided(arr, shape= (len(arr) - window_size +1, window_size), 
strides = arr.strides*2)

"""Example output:

  [0, 1, 2]
  [1, 2, 3]
  [2, 3, 4]
  [3, 4, 5]

"""

FAYAZ
  • 21
  • 3
1

The toolz/cytoolz package has a sliding_window function.

>>> from cytoolz import sliding_window
>>> list(sliding_window(3, range(6))) # returns [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
W.P. McNeill
  • 16,336
  • 12
  • 75
  • 111
1

In Python 3.10, we have the itertools.pairwise(iterable) function to slide a window with two elements:

Here's the doc :

Return successive overlapping pairs taken from the input iterable.

The number of 2-tuples in the output iterator will be one fewer than the number of inputs. It will be empty if the input iterable has fewer than two values.

Roughly equivalent to:

def pairwise(iterable):
    # pairwise('ABCDEFG') --> AB BC CD DE EF FG
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)
Jw C
  • 171
  • 2
  • 6
  • This was covered by existing answers (e.g. [WeiChing's](https://stackoverflow.com/a/27665967)), and also this special case is addressed in more detail at [How can I iterate over overlapping (current, next) pairs of values from a list?](https://stackoverflow.com/questions/5434891/). The code shown here doesn't meet the complete specification, since it is limited to a fixed window size of 2. – Karl Knechtel Jan 09 '23 at 02:52
0
>>> n, m = 6, 3
>>> k = n - m+1
>>> print ('{}\n'*(k)).format(*[range(i, i+m) for i in xrange(k)])
[0, 1, 2]
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
dansalmo
  • 11,506
  • 5
  • 58
  • 53
0

How about using the following:

mylist = [1, 2, 3, 4, 5, 6, 7]

def sliding_window(l, window_size=2):
    if window_size > len(l):
        raise ValueError("Window size must be smaller or equal to the number of elements in the list.")

    t = []
    for i in xrange(0, window_size):
        t.append(l[i:])

    return zip(*t)

print sliding_window(mylist, 3)

Output:

[(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7)]
keocra
  • 613
  • 5
  • 10
  • @ keocra what does zip(*t) mean? Where can I find some documentation on that kind of statement? – Shejo284 Apr 26 '17 at 07:23
  • 1
    Python 2.7: https://docs.python.org/2/library/functions.html#zip, the star unpacks the list and provide the individual elements as input to zip ([unpacking arguments](https://docs.python.org/3/tutorial/controlflow.html#unpacking-argument-lists)) – keocra Apr 26 '17 at 07:30
0

This is an old question but for those still interested there is a great implementation of a window slider using generators in this page (by Adrian Rosebrock).

It is an implementation for OpenCV however you can easily use it for any other purpose. For the eager ones i'll paste the code here but to understand it better I recommend visiting the original page.

def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            # yield the current window
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])

Tip: You can check the .shape of the window when iterating the generator to discard those that do not meet your requirements

Cheers

DarkCygnus
  • 7,420
  • 4
  • 36
  • 59
0

Modified DiPaolo's answer to allow arbitrary fill and variable step size

import itertools
def window(seq, n=2,step=1,fill=None,keep=0):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(itertools.islice(it, n))    
    if len(result) == n:
        yield result
    while True:        
#         for elem in it:        
        elem = tuple( next(it, fill) for _ in range(step))
        result = result[step:] + elem        
        if elem[-1] is fill:
            if keep:
                yield result
            break
        yield result
shouldsee
  • 434
  • 7
  • 7
0

here is a one liner. I timed it and it's comprable to the performance of the top answer and gets progressively better with larger seq from 20% slower with len(seq) = 20 and 7% slower with len(seq) = 10000

zip(*[seq[i:(len(seq) - n - 1 + i)] for i in range(n)])
kkawabat
  • 1,530
  • 1
  • 14
  • 37
0

Trying my part, simple, one liner, pythonic way using islice. But, may not be optimally efficient.

from itertools import islice
array = range(0, 10)
window_size = 4
map(lambda i: list(islice(array, i, i + window_size)), range(0, len(array) - window_size + 1))
# output = [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9]]

Explanation: Create window by using islice of window_size and iterate this operation using map over all array.

Paras Mishra
  • 336
  • 1
  • 9
0

Optimized Function for sliding window data in Deep learning

def SlidingWindow(X, window_length, stride):
    indexer = np.arange(window_length)[None, :] + stride*np.arange(int(len(X)/stride)-window_length+4)[:, None]
    return X.take(indexer)

to apply on multidimensional array

import numpy as np
def SlidingWindow(X, window_length, stride1):
    stride=  X.shape[1]*stride1
    window_length = window_length*X.shape[1]
    indexer = np.arange(window_length)[None, :] + stride1*np.arange(int(len(X)/stride1)-window_length-1)[:, None]
    return X.take(indexer)
Naga kiran
  • 4,528
  • 1
  • 17
  • 31
0

my two versions of window implementation

from typing import Sized, Iterable

def window(seq: Sized, n: int, strid: int = 1, drop_last: bool = False):
    for i in range(0, len(seq), strid):
        res = seq[i:i + n]
        if drop_last and len(res) < n:
            break
        yield res


def window2(seq: Iterable, n: int, strid: int = 1, drop_last: bool = False):
    it = iter(seq)
    result = []
    step = 0
    for i, ele in enumerate(it):
        result.append(ele)
        result = result[-n:]
        if len(result) == n:
            if step % strid == 0:
                yield result
            step += 1
    if not drop_last:
        yield result

Sailist
  • 134
  • 8
0

Another simple way to generate window of fixed length from a list

from collections import deque

def window(ls,window_size=3):
    window = deque(maxlen=window_size)

    for element in ls:
        
        if len(window)==window_size:
            yield list(window)
        window.append(element)

ls = [0,1,2,3,4,5]

for w in window(ls):
    print(w)
0

My (keep it simple) solution that I ended up using:

def sliding_window(items, size):
    return [items[start:end] for start, end
            in zip(range(0, len(items) - size + 1), range(size, len(items) + 1))]

Needless to say, the items sequence needs to be sliceable. Working with indices is not ideal, but it seems to be the least bad option given the alternatives... This can also easily be changed to a generator: just replace [...] with (...).

campkeith
  • 612
  • 8
  • 9
0

I find this solution more elegant than using built-in functions.

words = ["this", "is", "an", "example"]

def get_sliding_windows(doc, sliding_window, padded=False):
    all_windows = []
    for i in range(sliding_window):
        front = sliding_window-i
        all_windows.append(front*['']+doc+i*[''])
    if padded:
        return np.array(all_windows).transpose()[1:]
    else:
        return np.array(all_windows).transpose()[sliding_window:-1]

>>> get_sliding_windows(words,3)
>>> array([['this', 'is', 'an'],
       ['is', 'an', 'example'],
       ['an', 'example', '']], dtype='<U7')

>>> get_padded_sliding_windows(words,3, True)
>>> array([['', '', 'this'],
       ['', 'this', 'is'],
       ['this', 'is', 'an'],
       ['is', 'an', 'example'],
       ['an', 'example', ''],
       ['example', '', '']], dtype='<U7')
Emil
  • 1,531
  • 3
  • 22
  • 47
-1

Update

This is a duplicated answer here, as Kelly find out. But I am leaving this here as a counter-example since I include a pointless min.

So if you feel tempted to use min to avoid IndexError don't, there is no need, range will handle that case for you.


Old answer

Curiously the following handle automatically when n > len(l) returning [] which is semantically correct.

>>> l = [0, 1, 2, 3, 4]

>>> n = 2
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1], [1, 2], [2, 3], [3, 4]]
>>>
>>> n = 3
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
>>>
>>> n = 4
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2, 3], [1, 2, 3, 4]]
>>>
>>> n = 5
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> [[0, 1, 2, 3, 4]]
>>>
>>> n = 10 # n > len(l)
>>> [l[i: i + min(n, len(l)-i)] for i in range(len(l)-n+1)]
>>> []
Raydel Miranda
  • 13,825
  • 3
  • 38
  • 60
  • [This](https://stackoverflow.com/a/38626832/12671057) is a list comprehension posted years before yours, identical to yours except without your pointless `min` part. – Kelly Bundy Dec 21 '22 at 17:45
  • You wouldn't get an `IndexError` anyway, for example `l[3:666]` gives you `[3, 4]` without error. – Kelly Bundy Dec 22 '22 at 12:15