15

This is more a question of elegance and performance rather than “how to do at all”, so I'll just show the code:

def iterate_adjacencies(gen, fill=0, size=2, do_fill_left=True,
  do_fill_right=False):
    """ Iterates over a 'window' of `size` adjacent elements in the supploed
    `gen` generator, using `fill` to fill edge if `do_fill_left` is True
    (default), and fill the right edge (i.e.  last element and `size-1` of
    `fill` elements as the last item) if `do_fill_right` is True.  """
    fill_size = size - 1
    prev = [fill] * fill_size
    i = 1
    for item in gen:  # iterate over the supplied `whatever`.
        if not do_fill_left and i < size:
            i += 1
        else:
            yield prev + [item]
        prev = prev[1:] + [item]
    if do_fill_right:
        for i in range(fill_size):
            yield prev + [fill]
            prev = prev[1:] + [fill]

and then ask: is there already a function for that? And, if not, can you do the same thing in a better (i.e. more neat and/or more fast) way?

Edit:

with ideas from answers of @agf, @FogleBird, @senderle, a resulting somewhat-neat-looking piece of code is:

def window(seq, size=2, fill=0, fill_left=True, fill_right=False):
    """ Returns a sliding window (of width n) over data from the iterable:
      s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    ssize = size - 1
    it = chain(
      repeat(fill, ssize * fill_left),
      iter(seq),
      repeat(fill, ssize * fill_right))
    result = tuple(islice(it, size))
    if len(result) == size:  # `<=` if okay to return seq if len(seq) < size
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result
HoverHell
  • 4,739
  • 3
  • 21
  • 23
  • 1
    You might want to explain in more detail what this does with words. – agf Aug 09 '11 at 14:59
  • I should note that I didn't use `iter(gen)` (to fill `prev` initially when `do_fill_left` is False) because I'm not completely certain (even though still quite certain) that it will behave in the same way for any possible type of `gen`. Also, it would be possible then to chain `gen` with `[fill]*fill_size if do_fill_right` instead of last 4 lines. – HoverHell Aug 09 '11 at 14:59
  • 2
    Could you provide a sample input and a samplle of the expected output please? it would be easier to understand what your function is doing. – mdeous Aug 09 '11 at 15:03
  • Here is something similar (a sliding window over a iterator) http://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator-in-python/6822773#6822773 - you can extend it with you fill-options. – Martin Thurau Aug 09 '11 at 15:05
  • If these are numbers (which it seems they are), use numpy? – djhoese Aug 09 '11 at 15:05
  • @agf: the comment of the function does just that, no? – Claudiu Aug 09 '11 at 15:45
  • 2
    It was faster for me to run the code and see than figure it out from the docstring because the wording wasn't that clear, so I suggested he write it out more clearly. – agf Aug 09 '11 at 15:48
  • @agf: how would you word the docstring of such function (with all the arguments)? I am indeed aware that my explanations are usually more like notes to myself rather than clear documentation. – HoverHell Aug 10 '11 at 06:53
  • @daveydave400: possibly interesting, but how to do that in numpy? – HoverHell Aug 10 '11 at 07:06
  • 1
    @HoverHell - Very, very nice. You should post it as an answer and accept it. How would I phrase the docstring differently? I would break it up into several short declarative sentences describing what it does and each argument. – agf Aug 10 '11 at 18:57
  • 1
    @HoverHell: 1) Yes, do post your new code as an answer. 2) [This question and its answer](http://stackoverflow.com/questions/6811183/rolling-window-for-1d-arrays-in-numpy) explain how to create a rolling window in numpy. – senderle Aug 10 '11 at 22:13

5 Answers5

7

This page shows how to implement a sliding window with itertools. http://docs.python.org/release/2.3.5/lib/itertools-example.html

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + (elem,)
        yield result

Example output:

>>> list(window(range(10)))
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]

You'd need to change it to fill left and right if you need.

FogleBird
  • 74,300
  • 25
  • 125
  • 131
2

This is my version that fills, keeping the signature the same. I have previously seen the itertools recipe, but did not look at it before writing this.

from itertools import chain
from collections import deque

def ia(gen, fill=0, size=2, fill_left=True, fill_right=False):
    gen, ssize = iter(gen), size - 1
    deq = deque(chain([fill] * ssize * fill_left,
                      (next(gen) for _ in xrange((not fill_left) * ssize))),
                maxlen = size)
    for item in chain(gen, [fill] * ssize * fill_right):
        deq.append(item)
        yield deq

Edit: I also didn't see your comments on your question before posting this.

Edit 2: Fixed. I had tried to do it with one chain but this design needs two.

Edit 3: As @senderle noted, only use it this as a generator, don't wrap it with list or accumulate the output, as it yields the same mutable item repeatedly.

agf
  • 171,228
  • 44
  • 289
  • 238
  • Wouldn't be better to return a tuple? The result of `list(ia(range(10), 'a', fill_left=True, fill_right=True))` is a list of references to the same deque, all in the same state. Also, you could use `islice` to construct the deque and then iterate over the remainder of gen, instead of calling chain twice. – senderle Aug 10 '11 at 03:31
  • That's just the solution given in itertools. I wasn't interested in replicating it. – agf Aug 10 '11 at 18:52
  • It is? I don't see that in `itertools` anywhere. Let me rephrase. Repeatedly `yield`ing the same mutable object generates unexpected results. For example, the output of `list(ia(range(2), 'x', 2, True, True))` is `[deque([1, 'x'], maxlen=2), deque([1, 'x'], maxlen=2), deque([1, 'x'], maxlen=2)]`. I think it would make sense to return a copy of the deque's contents instead of just returning the deque. The fastest way to do that is probably with `tuple`. – senderle Aug 10 '11 at 19:02
  • I understand. But if you're going to yield `tuple`s and use `islice` anyway, then the solution given by @HoverHell based on the one in `itertools` with `chain` and `repeat` to fill is better. It's not worth adapting this one to be more like that; better to leave it for when you really want to use it just as a generator. – agf Aug 10 '11 at 19:10
  • Yeah, ok, I see what you're saying now. And indeed, after doing [some timings](http://stackoverflow.com/questions/6998245/iterate-over-a-window-of-adjacent-elements-in-python/7004905#7004905), I have to admit that you are totally correct on all points. – senderle Aug 10 '11 at 22:06
2

Ok, after coming to my senses, here's a non-ridiculous version of window_iter_fill. My previous version (visible in edits) was terrible because I forgot to use izip. Not sure what I was thinking. Using izip, this works, and, in fact, is the fastest option for small inputs!

def window_iter_fill(gen, size=2, fill=None):
    gens = (chain(repeat(fill, size - i - 1), gen, repeat(fill, i))
            for i, gen in enumerate(tee(gen, size)))
    return izip(*gens)

This one is also fine for tuple-yielding, but not quite as fast.

def window_iter_deque(it, size=2, fill=None, fill_left=False, fill_right=False):
    lfill = repeat(fill, size - 1 if fill_left else 0)
    rfill = repeat(fill, size - 1 if fill_right else 0)
    it = chain(lfill, it, rfill)
    d = deque(islice(it, 0, size - 1), maxlen=size)
    for item in it:
        d.append(item)
        yield tuple(d)

HoverHell's newest solution is still the best tuple-yielding solution for high inputs.

Some timings:

Arguments: [xrange(1000), 5, 'x', True, True]

==============================================================================
  window               HoverHell's frankeniter           :  0.2670ms [1.91x]
  window_itertools     from old itertools docs           :  0.2811ms [2.02x]
  window_iter_fill     extended `pairwise` with izip     :  0.1394ms [1.00x]
  window_iter_deque    deque-based, copying              :  0.4910ms [3.52x]
  ia_with_copy         deque-based, copying v2           :  0.4892ms [3.51x]
  ia                   deque-based, no copy              :  0.2224ms [1.60x]
==============================================================================

Scaling behavior:

Arguments: [xrange(10000), 50, 'x', True, True]

==============================================================================
  window               HoverHell's frankeniter           :  9.4897ms [4.61x]
  window_itertools     from old itertools docs           :  9.4406ms [4.59x]
  window_iter_fill     extended `pairwise` with izip     :  11.5223ms [5.60x]
  window_iter_deque    deque-based, copying              :  12.7657ms [6.21x]
  ia_with_copy         deque-based, copying v2           :  13.0213ms [6.33x]
  ia                   deque-based, no copy              :  2.0566ms [1.00x]
==============================================================================

The deque-yielding solution by agf is super fast for large inputs -- seemingly O(n) instead of O(n, m) like the others, where n is the length of the iter and m is the size of the window -- because it doesn't have to iterate over every window. But I still think it makes more sense to yield a tuple in the general case, because the calling function is probably just going to iterate over the deque anyway; it's just a shift of the computational burden. The asymptotic behavior of the larger program should remain the same.

Still, in some special cases, the deque-yielding version will probably be faster.

Some more timings based on HoverHell's test structure.

>>> import testmodule
>>> kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)
>>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.window(**kwa)]
1000 loops, best of 3: 462 us per loop
>>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.ia(**kwa)]
1000 loops, best of 3: 463 us per loop
>>> %timeit -n 1000 [a + b + c + d for a, b, c, d in testmodule.window_iter_fill(**kwa)]
1000 loops, best of 3: 251 us per loop
>>> %timeit -n 1000 [sum(x) for x in testmodule.window(**kwa)]
1000 loops, best of 3: 525 us per loop
>>> %timeit -n 1000 [sum(x) for x in testmodule.ia(**kwa)]
1000 loops, best of 3: 462 us per loop
>>> %timeit -n 1000 [sum(x) for x in testmodule.window_iter_fill(**kwa)]
1000 loops, best of 3: 333 us per loop

Overall, once you use izip, window_iter_fill is quite fast, as it turns out -- especially for small windows.

senderle
  • 145,869
  • 36
  • 209
  • 233
  • Iterating over a deque is faster than iterating over a tuple, though, so it makes sense to return a deque if you're going to iterate. [Here's a recent answer](http://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator-in-python/6822761#6822761) in which I timed a few windowing iterators and considered the deque vs. tuple/list question. – kindall Aug 10 '11 at 22:23
  • Funny that window_itertools is a little bit slower. Also, deque is faster if you're going to iterate, but tuple seems faster if you're unpacking. – HoverHell Aug 11 '11 at 07:32
  • Er, actually tuple just *seemed* faster when unpacking - deque is approximately as fast in such case. See my answer for some timings. – HoverHell Aug 11 '11 at 07:38
  • @kindall, moreover, you'll probably iterate over the values twice if you return a tuple. But I wasn't being clear -- my point was just that although the asymptotic behavior of `ia` _looks_ better in the above timings (only ~10x instead of ~50x slower like all the rest) the asymptotic behavior of a program using it is likely to be no better. So you're really just getting a speedup by a constant factor. I don't think that's an unacceptable price to pay for a more general-use iterator. Certainly for special purposes, though, returning a deque is a good strategy. – senderle Aug 11 '11 at 22:31
  • @kindall, @HoverHell, it seems I was being stupid! I forgot to use `izip` -- in fact, with `izip`, yielding tuples is automatic, and produces the fastest result (on my computer) for small inputs. – senderle Aug 13 '11 at 19:51
1

Resulting function (from the edit of the question),

frankeniter with ideas from answers of @agf, @FogleBird, @senderle, a resulting somewhat-neat-looking piece of code is:

from itertools import chain, repeat, islice

def window(seq, size=2, fill=0, fill_left=True, fill_right=False):
    """ Returns a sliding window (of width n) over data from the iterable:
      s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    ssize = size - 1
    it = chain(
      repeat(fill, ssize * fill_left),
      iter(seq),
      repeat(fill, ssize * fill_right))
    result = tuple(islice(it, size))
    if len(result) == size:  # `<=` if okay to return seq if len(seq) < size
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

and, for some performance information regarding deque/tuple:

In [32]: kwa = dict(gen=xrange(1000), size=4, fill=-1, fill_left=True, fill_right=True)
In [33]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.ia(**kwa)]
10000 loops, best of 3: 358 us per loop
In [34]: %timeit -n 10000 [a+b+c+d for a,b,c,d in tmpf5.window(**kwa)]
10000 loops, best of 3: 368 us per loop
In [36]: %timeit -n 10000 [sum(x) for x in tmpf5.ia(**kwa)]
10000 loops, best of 3: 340 us per loop
In [37]: %timeit -n 10000 [sum(x) for x in tmpf5.window(**kwa)]
10000 loops, best of 3: 432 us per loop

but anyway, if it's numbers then numpy is likely preferable.

HoverHell
  • 4,739
  • 3
  • 21
  • 23
0

I'm surprised nobody took a simple coroutine approach.

from collections import deque


def window(n, initial_data=None):
    if initial_data:
        win = deque(initial_data, n)
    else:
        win = deque(((yield) for _ in range(n)), n)
    while 1:
        side, val = (yield win)
        if side == 'left':
            win.appendleft(val)
        else:
            win.append(val)

win = window(4)
win.next()

print(win.send(('left', 1)))
print(win.send(('left', 2)))
print(win.send(('left', 3)))
print(win.send(('left', 4)))
print(win.send(('right', 5)))

## -- Results of print statements --
deque([1, None, None, None], maxlen=4)
deque([2, 1, None, None], maxlen=4)
deque([3, 2, 1, None], maxlen=4)
deque([4, 3, 2, 1], maxlen=4)
deque([3, 2, 1, 5], maxlen=4)
Alex Gaudio
  • 1,894
  • 1
  • 16
  • 13