40

I want an algorithm to iterate over list slices. Slices size is set outside the function and can differ.

In my mind it is something like:

for list_of_x_items in fatherList:
    foo(list_of_x_items)

Is there a way to properly define list_of_x_items or some other way of doing this using python 2.5?


edit1: Clarification Both "partitioning" and "sliding window" terms sound applicable to my task, but I am no expert. So I will explain the problem a bit deeper and add to the question:

The fatherList is a multilevel numpy.array I am getting from a file. Function has to find averages of series (user provides the length of series) For averaging I am using the mean() function. Now for question expansion:

edit2: How to modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?

for example if the list is lenght 10 and size of a chunk is 3, then the 10th member of the list is stored and appended to the beginning of the next list.


Related:

Community
  • 1
  • 1
Rince
  • 2,195
  • 3
  • 21
  • 35
  • 1
    You seem to be describing an operation that works on subsets of the original list. "slices" can refer to the individual contiguous ranges, but the operation you're looking for is often called "partitioning". You might find this helpful: http://stackoverflow.com/questions/1198512/split-a-list-into-parts-based-on-a-set-of-indexes-in-python – jmanning2k Aug 26 '09 at 15:17
  • I was actually thinking of what is sometimes refered to as a "sliding window" of width n over a sequence, but looking at Nadia's answer there is this other interpretation of the question, which is probably closer to "partitioning". Maybe the OP will clarify this. – ThomasH Aug 26 '09 at 15:54

10 Answers10

74

If you want to divide a list into slices you can use this trick:

list_of_slices = zip(*(iter(the_list),) * slice_size)

For example

>>> zip(*(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]

If the number of items is not dividable by the slice size and you want to pad the list with None you can do this:

>>> map(None, *(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]

It is a dirty little trick


OK, I'll explain how it works. It'll be tricky to explain but I'll try my best.

First a little background:

In Python you can multiply a list by a number like this:

[1, 2, 3] * 3 -> [1, 2, 3, 1, 2, 3, 1, 2, 3]
([1, 2, 3],) * 3 -> ([1, 2, 3], [1, 2, 3], [1, 2, 3])

And an iterator object can be consumed once like this:

>>> l=iter([1, 2, 3])
>>> l.next()
1
>>> l.next()
2
>>> l.next()
3

The zip function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For example:

zip([1, 2, 3], [20, 30, 40]) -> [(1, 20), (2, 30), (3, 40)]
zip(*[(1, 20), (2, 30), (3, 40)]) -> [[1, 2, 3], [20, 30, 40]]

The * in front of zip used to unpack arguments. You can find more details here. So

zip(*[(1, 20), (2, 30), (3, 40)])

is actually equivalent to

zip((1, 20), (2, 30), (3, 40))

but works with a variable number of arguments

Now back to the trick:

list_of_slices = zip(*(iter(the_list),) * slice_size)

iter(the_list) -> convert the list into an iterator

(iter(the_list),) * N -> will generate an N reference to the_list iterator.

zip(*(iter(the_list),) * N) -> will feed those list of iterators into zip. Which in turn will group them into N sized tuples. But since all N items are in fact references to the same iterator iter(the_list) the result will be repeated calls to next() on the original iterator

I hope that explains it. I advice you to go with an easier to understand solution. I was only tempted to mention this trick because I like it.

Nadia Alramli
  • 111,714
  • 37
  • 173
  • 152
  • 4
    One interesting thing is also that you're silently making use of the ``*`` unpacking operator's low precedence, which allows funny constructs like `` f(*L*N)``, and which I find nowhere documented. – ThomasH Apr 12 '11 at 21:16
  • 2
    @ThomasH: `f(*args)` is an extended call syntax in Python. You can put anything that evaluates to an iterable on the right side of asterisk (`*`) e.g., [`f(*[1]+g())`](http://ideone.com/2yP70) – jfs Feb 16 '12 at 23:45
  • @J.F.Sebastian Yes, thanks. What struck me was just the operator precedence for this. Your example resolves to `f(*([1]+g()))`. – ThomasH Feb 18 '12 at 14:01
22

If you want to be able to consume any iterable you can use these functions:

from itertools import chain, islice

def ichunked(seq, chunksize):
    """Yields items from an iterator in iterable chunks."""
    it = iter(seq)
    while True:
        yield chain([it.next()], islice(it, chunksize-1))

def chunked(seq, chunksize):
    """Yields items from an iterator in list chunks."""
    for chunk in ichunked(seq, chunksize):
        yield list(chunk)
Ants Aasma
  • 53,288
  • 15
  • 90
  • 97
  • 1
    Very beautiful. And you can actually make it smarter by adding a parameter to choose the type of each chunk: https://gist.github.com/1275417. – Bite code Oct 10 '11 at 14:14
  • In Python 3 it is `next(it)` instead of `it.next()`. For those wondering: `islice` doesn't raise a StopIteration, that's why we call `next`. – Tim Mar 15 '18 at 13:20
  • @Tim-Erwin I tried it in python 3.6 by changing to `next(it)` however I get the warning `DeprecationWarning: generator 'ichunked' raised StopIteration`. I don't understand where this is being raised, as it's not explicit. – davidA Mar 20 '18 at 22:26
  • 1
    @meowsqueak It is raised in `next()`. That's expected in Python 3.5 and below, deprecated in 3.6, broken in 3.7. I added a version compatible with 3.7+ in the answer. – Tim Mar 21 '18 at 09:17
12

Use a generator:

big_list = [1,2,3,4,5,6,7,8,9]
slice_length = 3
def sliceIterator(lst, sliceLen):
    for i in range(len(lst) - sliceLen + 1):
        yield lst[i:i + sliceLen]

for slice in sliceIterator(big_list, slice_length):
    foo(slice)

sliceIterator implements a "sliding window" of width sliceLen over the squence lst, i.e. it produces overlapping slices: [1,2,3], [2,3,4], [3,4,5], ... Not sure if that is the OP's intention, though.

ThomasH
  • 22,276
  • 13
  • 61
  • 62
9

Do you mean something like:

def callonslices(size, fatherList, foo):
  for i in xrange(0, len(fatherList), size):
    foo(fatherList[i:i+size])

If this is roughly the functionality you want you might, if you desire, dress it up a bit in a generator:

def sliceup(size, fatherList):
  for i in xrange(0, len(fatherList), size):
    yield fatherList[i:i+size]

and then:

def callonslices(size, fatherList, foo):
  for sli in sliceup(size, fatherList):
    foo(sli)
Alex Martelli
  • 854,459
  • 170
  • 1,222
  • 1,395
  • I think you'll get lists of irregular size with that. No telling exactly what the OP wanted, but this might be closer: for i in xrange(0, len(fatherList)-size, 1): – hughdbrown Aug 26 '09 at 15:20
  • Correction: for i in xrange(len(fatherList) - size + 1): – hughdbrown Aug 26 '09 at 15:22
  • "irregular size"? they're all of len `size` except possibly for the last one -- and nonoverlapping. Consider what happens with your code when the len is 6 and the size is 7, say... 5 slices of decreasing length... – Alex Martelli Aug 26 '09 at 15:25
  • Better;-) but with len 6 and size 8 you still have the same problem. – Alex Martelli Aug 26 '09 at 15:25
  • I was searing for something nice that would work out of the box as it looks as TOTALY general problem. My note: Nice&simple implementation but not to use with iterators... :( No len(list(fatherList)) is not it. – lukmdo Oct 24 '11 at 14:29
8

Answer to the last part of the question:

question update: How to modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?

If you need to store state then you can use an object for that.

class Chunker(object):
    """Split `iterable` on evenly sized chunks.

    Leftovers are remembered and yielded at the next call.
    """
    def __init__(self, chunksize):
        assert chunksize > 0
        self.chunksize = chunksize        
        self.chunk = []

    def __call__(self, iterable):
        """Yield items from `iterable` `self.chunksize` at the time."""
        assert len(self.chunk) < self.chunksize
        for item in iterable:
            self.chunk.append(item)
            if len(self.chunk) == self.chunksize:
                # yield collected full chunk
                yield self.chunk
                self.chunk = [] 

Example:

chunker = Chunker(3)
for s in "abcd", "efgh":
    for chunk in chunker(s):
        print ''.join(chunk)

if chunker.chunk: # is there anything left?
    print ''.join(chunker.chunk)

Output:

abc
def
gh
jfs
  • 399,953
  • 195
  • 994
  • 1,670
  • There needs to be another yield self.chunk after the for loop to account for chunks that aren't the full chunksize. Otherwise this won't work on arbitrary list sizes. – aryeh Sep 11 '11 at 15:47
  • @aryeh: `__call__` may be called more than once, therefore you can't just yield leftovers after the for loop. That's the point of the class. It answers *"to store the extra items and use them when the next fatherList is fed to the function"* part of the question as explicitly stated at the top of the answer. – jfs Sep 11 '11 at 16:47
  • 1
    My mistake, I didn't read the question/example carefully enough. I thought it was just for grouping. I personally dislike the extra check at the end and would vote to add an option to __call__ whether to save extra items or flush them at the end, so you wouldn't need to possibly duplicate the body of the for loop in a following if statement. – aryeh Sep 16 '11 at 22:19
6

I am not sure, but it seems you want to do what is called a moving average. numpy provides facilities for this (the convolve function).

>>> x = numpy.array(range(20))
>>> x
    array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])    
>>> n = 2 # moving average window
>>> numpy.convolve(numpy.ones(n)/n, x)[n-1:-n+1]
array([  0.5,   1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,   8.5,
         9.5,  10.5,  11.5,  12.5,  13.5,  14.5,  15.5,  16.5,  17.5,  18.5])

The nice thing is that it accomodates different weighting schemes nicely (just change numpy.ones(n) / n to something else).

You can find a complete material here: http://www.scipy.org/Cookbook/SignalSmooth

LeMiz
  • 5,554
  • 5
  • 28
  • 23
3

Expanding on the answer of @Ants Aasma: In Python 3.7 the handling of the StopIteration exception changed (according to PEP-479). A compatible version would be:

from itertools import chain, islice

def ichunked(seq, chunksize):
    it = iter(seq)
    while True:
        try:
            yield chain([next(it)], islice(it, chunksize - 1))
        except StopIteration:
            return
Tim
  • 1,315
  • 1
  • 14
  • 35
2

Your question could use some more detail, but how about:

def iterate_over_slices(the_list, slice_size):
    for start in range(0, len(the_list)-slice_size):
        slice = the_list[start:start+slice_size]
        foo(slice)
Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
2

For a near-one liner (after itertools import) in the vein of Nadia's answer dealing with non-chunk divisible sizes without padding:

>>> import itertools as itt
>>> chunksize = 5
>>> myseq = range(18)
>>> cnt = itt.count()
>>> print [ tuple(grp) for k,grp in itt.groupby(myseq, key=lambda x: cnt.next()//chunksize%2)]
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 12, 13, 14), (15, 16, 17)]

If you want, you can get rid of the itertools.count() requirement using enumerate(), with a rather uglier:

[ [e[1] for e in grp] for k,grp in itt.groupby(enumerate(myseq), key=lambda x: x[0]//chunksize%2) ]

(In this example the enumerate() would be superfluous, but not all sequences are neat ranges like this, obviously)

Nowhere near as neat as some other answers, but useful in a pinch, especially if already importing itertools.

Dologan
  • 4,554
  • 2
  • 31
  • 33
0

A function that slices a list or an iterator into chunks of a given size. Also handles the case correctly if the last chunk is smaller:

def slice_iterator(data, slice_len):
    it = iter(data)
    while True:
        items = []
        for index in range(slice_len):
            try:
                item = next(it)
            except StopIteration:
                if items == []:
                    return # we are done
                else:
                    break # exits the "for" loop
            items.append(item)
        yield items

Usage example:

for slice in slice_iterator([1,2,3,4,5,6,7,8,9,10],3):
    print(slice)

Result:

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
mit
  • 11,083
  • 11
  • 50
  • 74
  • Does what some of the other solutions do in more lines, maybe a little slower, depending on the use case. This could be improved using islice. – mit Oct 30 '21 at 09:31