27

How do I process the elements of a sequence in batches, idiomatically?

For example, with the sequence "abcdef" and a batch size of 2, I would like to do something like the following:

for x, y in "abcdef":
    print "%s%s\n" % (x, y)
ab
cd
ef

Of course, this doesn't work because it is expecting a single element from the list which itself contains 2 elements.

What is a nice, short, clean, pythonic way to process the next n elements of a list in a batch, or sub-strings of length n from a larger string (two similar problems)?

SilentGhost
  • Related: http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks – jfs Feb 27 '10 at 18:59

17 Answers

47

A generator function would be neat:

def batch_gen(data, batch_size):
    for i in range(0, len(data), batch_size):
        yield data[i:i+batch_size]

Example use:

a = "abcdef"
for i in batch_gen(a, 2): print i

prints:

ab
cd
ef
rpr
15

I've got an alternative approach that works for iterables that don't have a known length.

def groupsgen(seq, size):
    it = iter(seq)
    while True:
        values = ()
        for n in xrange(size):
            values += (it.next(),)
        yield values

It works by iterating over the sequence (or other iterable) in groups of size, collecting the values in a tuple. At the end of each group, it yields the tuple.

When the iterator runs out of values, it raises a StopIteration exception, which is then propagated up, indicating that groupsgen is out of values.

It assumes that the values come in groups of size (groups of 2, 3, etc.). If not, any values left over are discarded.
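If you would rather keep the leftover items instead of discarding them, a small Python 3 variant (the function name is mine; `islice` replaces the manual tuple-building) yields the shorter final group:

```python
from itertools import islice

def groupsgen_partial(seq, size):
    # Like groupsgen above, but yields any leftover tail as a
    # shorter final tuple instead of discarding it.
    it = iter(seq)
    while True:
        values = tuple(islice(it, size))
        if not values:
            return
        yield values
```

For "abcde" with size 2 this yields ('a', 'b'), ('c', 'd') and finally ('e',).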

Silverfish
  • Nice! Very general, very pythonic, and a useful example for wrapping a generator. – hobs Sep 20 '13 at 21:57
  • For large group sizes, `values` should be a list rather than a tuple. Tuples are immutable, so `+=` requires allocating a new tuple of length `n` on each iteration. It is much faster to append to a list (which is mutable) than to a tuple. – hobs Sep 20 '13 at 22:25
12

Don't forget about the zip() function:

a = 'abcdef'
for x,y in zip(a[::2], a[1::2]):
  print '%s%s' % (x,y)
Jason Coon
10

I am sure someone is going to come up with something more "Pythonic", but how about:

for y in range(0, len(x), 2):
    print "%s%s" % (x[y], x[y+1])

Note that this only works if you know that len(x) % 2 == 0.

Paolo Bergantino
  • start the range at 1 and then use x[y-1]; that works for len(x) % 2 == 1 – Jason Coon Apr 17 '09 at 15:25
  • This answer seems simplest to me, accepted! -- with this slight modification, which makes it shorter when handling batches > 2: for i in range(0, len(s), 2): print s[i:i+2] – Apr 17 '09 at 15:57
  • though this answer is neither quite pythonic nor generic – rpr Apr 18 '09 at 09:41
  • It solved the OP's problem in a short and simple way. Your answer may be the most pythonic (and I even noted in mine that it isn't) but that's hardly a reason for a downvote... – Paolo Bergantino Apr 18 '09 at 09:49
6

A more general way (inspired by this answer) would be:

for i in zip(*(seq[i::size] for i in range(size))):
    print(i)                            # tuple of individual values
SilentGhost
  • Note to new viewers: this now has the correct number of ')'s, but also note that this doesn't work when len(seq) % size != 0 – Nate Parsons Feb 07 '11 at 07:54
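If the remainder matters, a Python 3 variant of the same slice trick can pad the short tail with `itertools.zip_longest` instead of truncating (the empty-string fill value is my choice):

```python
from itertools import zip_longest

seq, size = "abcdefg", 2
# seq[i::size] takes every size-th element starting at offset i;
# zip_longest pads the shorter slices rather than dropping the tail.
groups = [''.join(g) for g in
          zip_longest(*(seq[i::size] for i in range(size)), fillvalue='')]
print(groups)  # ['ab', 'cd', 'ef', 'g']
```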
6

And then there's always the documentation.

from itertools import chain, izip, repeat, tee

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    try:
        b.next()
    except StopIteration:
        pass
    return izip(a, b)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

Note: these produce tuples instead of substrings, when given a string sequence as input.
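To get substrings back, each tuple can be joined; here is a Python 3 sketch wrapping the grouper recipe (`zip` replaces `izip`, and the function name is mine):

```python
from itertools import chain, repeat

def grouper_str(s, n, padvalue=''):
    # Same trick as grouper above: n references to one chained iterator,
    # so zip pulls n consecutive items per tuple; join turns each tuple
    # back into a substring, and the empty-string pad vanishes on join.
    groups = zip(*[chain(iter(s), repeat(padvalue, n - 1))] * n)
    return [''.join(g) for g in groups]
```

For example, grouper_str('abcdefg', 3) gives ['abc', 'def', 'g'].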

tzot
4
>>> a = "abcdef"
>>> size = 2
>>> [a[x:x+size] for x in range(0, len(a), size)]
['ab', 'cd', 'ef']

..or, the same thing without a list comprehension:

a = "abcdef"
size = 2
output = []
for x in range(0, len(a), size):
    output.append(a[x:x+size])

Or, as a generator, which would be best if used multiple times (for a one-use thing, the list comprehension is probably "best"):

def chunker(thelist, segsize):
    for x in range(0, len(thelist), segsize):
        yield thelist[x:x+segsize]

..and its usage:

>>> for seg in chunker(a, 2):
...     print seg
... 
ab
cd
ef
dbr
3

You can create the following generator:

def chunks(seq, size):
    a = range(0, len(seq), size)
    b = range(size, len(seq) + 1, size)
    for i, j in zip(a, b):
        yield seq[i:j]

and use it like this (note that any leftover tail shorter than size is silently dropped):

for i in chunks('abcdef', 2):
    print(i)
SilentGhost
2

From the docs of more_itertools: more_itertools.chunked()

more_itertools.chunked(iterable, n)

Break an iterable into lists of a given length:

>>> list(chunked([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2, 3], [4, 5, 6], [7]]

If the length of iterable is not evenly divisible by n, the last returned list will be shorter.

Craig McQueen
Gregor Melhorn
1

s = 'abcdefgh'
for e in (s[i:i+2] for i in range(0,len(s),2)):
  print(e)
1

The itertools doc has a recipe for this:

from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

Usage:

>>> l = [1,2,3,4,5,6,7,8,9]
>>> [z for z in grouper(l, 3)]
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
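When the length is not a multiple of n, the last tuple is padded with fillvalue; stripping the pads afterwards is straightforward. A Python 3 spelling of the same recipe (`zip_longest` in place of `izip_longest`):

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

l = [1, 2, 3, 4, 5, 6, 7]
# Drop the None padding from the final chunk
unpadded = [tuple(x for x in g if x is not None) for g in grouper(l, 3)]
print(unpadded)  # [(1, 2, 3), (4, 5, 6), (7,)]
```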
dano
1

Except for two answers, I saw a lot of premature materialization of the batches, and subscripting (which does not work for all iterators). Hence this alternative:

def iter_x_and_n(iterable, x, n):
    yield x
    try:
        for _ in range(n):
            yield next(iterable)
    except StopIteration:
        pass

def batched(iterable, n):
    if n < 1:
        raise ValueError("Can not create batches of size %d, number must be strictly positive" % n)
    iterable = iter(iterable)
    try:
        for x in iterable:
            yield iter_x_and_n(iterable, x, n-1)
    except StopIteration:
        pass

It beats me that there is no one-liner or few-liner solution for this (to the best of my knowledge). The key issue is that both the outer generator and the inner generator need to handle StopIteration correctly. The outer generator should only yield something if there is at least one element left. The intuitive way to check this is to execute a next(...) and catch a StopIteration.
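For what it's worth, later Python versions did grow a one-liner: `itertools.batched` was added to the standard library in Python 3.12. A sketch with a small backport guard for older interpreters (the backport body is mine):

```python
try:
    from itertools import batched  # standard library since Python 3.12
except ImportError:
    from itertools import islice

    def batched(iterable, n):
        # Minimal backport sketch; yields tuples like the stdlib version
        it = iter(iterable)
        while chunk := tuple(islice(it, n)):
            yield chunk

print([''.join(c) for c in batched("abcdefg", 3)])  # ['abc', 'def', 'g']
```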

Herbert
1

Adapted from this answer for Python 3:

def groupsgen(seq, size):
    it = iter(seq)
    iterating = True
    while iterating:
        values = ()
        try:
            for n in range(size):
                values += (next(it),)
        except StopIteration:
            iterating = False
            if not len(values):
                return None
        yield values

It will safely terminate and won't discard values if their number is not divisible by size.
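A quick check of that behavior (the function is restated from the answer above so the snippet runs on its own):

```python
def groupsgen(seq, size):
    it = iter(seq)
    iterating = True
    while iterating:
        values = ()
        try:
            for n in range(size):
                values += (next(it),)
        except StopIteration:
            iterating = False
            if not len(values):
                return None
        yield values

# An odd-length input keeps the final one-element tuple
print(list(groupsgen("abcde", 2)))  # [('a', 'b'), ('c', 'd'), ('e',)]
```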

0

Given

from __future__ import print_function                      # python 2.x

seq = "abcdef"
n = 2

Code

while seq:
    print(seq[:n])
    seq = seq[n:]

Output

ab
cd
ef
pylang
0

Here is a solution, which yields a series of iterators, each of which iterates over n items.

def groupiter(thing, n):
    def countiter(nextthing, thingiter, n):
        yield nextthing
        for _ in range(n - 1):
            try:
                nextitem = next(thingiter)
            except StopIteration:
                return
            yield nextitem
    thingiter = iter(thing)
    while True:
        try:
            nextthing = next(thingiter)
        except StopIteration:
            return
        yield countiter(nextthing, thingiter, n)

I use it as follows:

table = list(range(250))
for group in groupiter(table, 16):
    print(' '.join('0x{:02X},'.format(x) for x in group))

Note that it can handle the length of the object not being a multiple of n. Also note that each yielded group should be consumed before the next one is requested; items not pulled from a group remain in the underlying iterator and shift into the following groups.

Craig McQueen
0

How about itertools?

from itertools import islice, groupby

def chunks_islice(seq, size):
    it = iter(seq)                 # islice must consume one shared iterator,
    while True:                    # or a sequence would restart on every pass
        aux = list(islice(it, size))
        if not aux: break
        yield "".join(aux)

def chunks_groupby(seq, size):
    # floor division so each run of `size` indices shares one group key
    for k, chunk in groupby(enumerate(seq), lambda x: x[0] // size):
        yield "".join(i[1] for i in chunk)
dbr
-1

One solution, although I challenge someone to do better ;-)

a = 'abcdef'
b = [[a[i-1], a[i]] for i in range(1, len(a), 2)]

for x, y in b:
  print "%s%s\n" % (x, y)
Jason Coon