140

I am surprised I could not find a "batch" function that would take as input an iterable and return an iterable of iterables.

For example:

for i in batch(range(0,10), 1): print i
[0]
[1]
...
[9]

or:

for i in batch(range(0,10), 3): print i
[0,1,2]
[3,4,5]
[6,7,8]
[9]

Now, I wrote what I thought was a pretty simple generator:

def batch(iterable, n = 1):
   current_batch = []
   for item in iterable:
       current_batch.append(item)
       if len(current_batch) == n:
           yield current_batch
           current_batch = []
   if current_batch:
       yield current_batch

But the above does not give me what I would have expected:

for x in batch(range(0,10),3): print x
[0]
[0, 1]
[0, 1, 2]
[3]
[3, 4]
[3, 4, 5]
[6]
[6, 7]
[6, 7, 8]
[9]

So, I have missed something, and this probably shows my complete lack of understanding of Python generators. Would anyone care to point me in the right direction?

[Edit: I eventually realized that the above behavior happens only when I run this within ipython rather than python itself]

Karl Knechtel
mathieu
  • Good question, well written, but it already exists and will solve your problem. – Josh Smeaton Nov 28 '11 at 01:07
  • 10
    IMO this isn't really a duplicate. The other question focuses on lists instead of iterators, and most of those answers require len() which is undesirable for iterators. But eh, the currently accepted answer here also requires len(), so... – dequis Dec 13 '16 at 14:47
  • 9
    This is clearly not a duplicate. The other Q&A _only works for lists_, and this question is about generalizing to all iterables, which is exactly the question I had in mind when I came here. – Mark E. Haase Mar 16 '17 at 16:02
  • 2
    @JoshSmeaton @casperOne this is not a duplicate and the accepted answer is not correct. The linked duplicate question is for list and this is for iterable. list provides len() method but iterable does not provide a len() method and the answer would be different without using len() This is the correct answer: `batch = (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *[iter(iterable)] * n))` – Trideep Rath Jan 28 '19 at 20:14
  • @TrideepRath yep, I've voted to reopen. – Josh Smeaton Jan 29 '19 at 23:53

20 Answers

190

This is probably more efficient (faster):

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

for x in batch(range(0, 10), 3):
    print(x)

Example using list

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] # list of data 

for x in batch(data, 3):
    print(x)

# Output

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9, 10]

It avoids building new lists.
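As the comments below point out, this relies on len() and slicing, so it works for sequences (lists, strings, ranges) but not for one-shot iterators. A quick check, restating the function above so the snippet is self-contained:

```python
def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

# Sequences work fine
print(list(batch([0, 1, 2, 3, 4], 2)))  # [[0, 1], [2, 3], [4]]

# A plain iterator has no len(), so this raises TypeError
try:
    list(batch(iter(range(5)), 2))
except TypeError as e:
    print("iterators fail:", e)
```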

Kmaschta
Carl F.
  • 4
    For the record, this is the fastest solution I found: mine = 4.5s, yours=0.43s, Donkopotamus = 14.8s – mathieu Nov 29 '11 at 09:18
  • To be honest, I expected the itertools solutions to be faster. Glad I could help! – Carl F. Dec 01 '11 at 02:07
  • 100
    your batch in fact accepts a list (with len()), not iterable (without len()) – tdihp Jan 10 '14 at 07:47
  • 46
    This is faster because it isn't a solution to the problem. The grouper recipe by Raymond Hettinger - currently below this - is what you are looking for for a general solution that doesn't require the input object to have a __len__ method. – Robert E Mealey Oct 15 '14 at 22:52
  • 2
    It may be not the most general solution. But it is fast, and it does not return `None`. The example above produces [0, 1, 2] [3, 4, 5] [6, 7, 8] [9] – MasterControlProgram Feb 01 '17 at 13:02
  • 8
    Why you use min()? Without `min()` code is completely correct! – Pavel Patrin Jun 01 '17 at 19:05
  • 29
    [Iterables](https://docs.python.org/3/glossary.html#term-iterable) don't have `len()`, [sequences](https://docs.python.org/3/glossary.html#term-sequence) have `len()` – Kos Oct 26 '17 at 12:50
  • what i want to make all the batches of same size. E.g if last list is like[9], then I want to add first two-element of first list [9,0,1]. if I do batch =30, I want to apply same logic, it is possible? – Coder Oct 21 '21 at 20:20
87

The recipes in the itertools module provide two ways to do this depending on how you want to handle a final odd-sized lot (keep it, pad it with a fillvalue, ignore it, or raise an exception):

from itertools import islice, zip_longest

def batched(iterable, n):
    "Batch data into lists of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')
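A quick demonstration of the difference between the two recipes (restating `batched`, and exercising the `zip_longest` core of `grouper` directly; note that `incomplete='strict'` additionally relies on `zip(strict=True)`, available from Python 3.10):

```python
from itertools import islice, zip_longest

def batched(iterable, n):
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

# Keeps the short final batch
print(list(batched('ABCDEFG', 3)))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]

# Pads the final batch with a fillvalue
print(list(zip_longest(*[iter('ABCDEFG')] * 3, fillvalue='x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```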
Raymond Hettinger
  • 14
    This is not exactly what I needed since it pads the last element with a set of None. i.e., None is a valid value in the data I actually use with my function so what I need instead is something that does not pad the last entry. – mathieu Nov 29 '11 at 09:05
  • 13
    @mathieu Replace `izip_longest` with `izip`, which will not pad the last entries, but instead cut off entries when some of the elements start running out. – GoogieK Dec 17 '16 at 15:46
  • 4
    Should be zip_longest/zip in python 3 – Peter Gerdes Jan 11 '17 at 07:29
  • 1
    Is this really a good solution when you want chunks with length in the 100, 1000s or more? – Peter Gerdes Jan 11 '17 at 07:30
  • 2
    @PeterGerdes Unless the input iterable already has some exploitable structure (i.e. reshaping a numpy array), this solution should be near optimal even for big chunk sizes. It runs at C-speed calling the iterator to fill-in tuple elements as fast as possible, and it reuses the output tuple whenever possible. – Raymond Hettinger Jan 11 '17 at 08:28
  • 6
    @GoogieK `for x, y in enumerate(grouper(3, xrange(10))): print(x,y)` does indeed not fill values, it just drops the incomplete segment altogether. – kadrach Mar 20 '17 at 03:24
  • 6
    As a one liner that drops the last element if incomplete: `list(zip(*[iter(iterable)] * n))`. This has to be the neatest bit of python code I've ever seen. – Le Frite Oct 02 '19 at 06:18
58

More-itertools includes two functions that do what you need: `chunked` (which yields lists) and `ichunked` (which yields sub-iterables).

Jean-François Corbett
Yongwei Wu
  • 2
    This is indeed the most fitting answer (even though it requires installation of one more package), and there's also `ichunked` that yields iterables. – viddik13 Jan 30 '20 at 14:39
  • 2
    As of python 3.12, the standard `itertools` package implements the batched function https://docs.python.org/3.12/library/itertools.html#itertools.batched – cmdoret May 01 '23 at 17:06
37

As others have noted, the code you have given does exactly what you want. For another approach using itertools.islice, see an example of the following recipe:

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([batchiter.next()], batchiter)
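Folding in the Python 3 fixes discussed in the comments below (`next(batchiter)` instead of `batchiter.next()`, and catching the `StopIteration` that PEP 479 no longer lets escape a generator), a sketch might look like:

```python
from itertools import chain, islice

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            # next() raises StopIteration once sourceiter is exhausted
            yield chain([next(batchiter)], batchiter)
        except StopIteration:
            return

# Each batch must be fully consumed before advancing to the next
print([list(b) for b in batch(range(10), 3)])
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```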
plaes
donkopotamus
  • Can we yield `batchiter` itself instead of a chain of it ? – abhilash Apr 20 '17 at 05:12
  • 2
    @abhilash No ... this code uses the call to `next()` to cause a `StopIteration` once `sourceiter` is exhausted, thus terminating the iterator. Without the call to `next` it would continue to return empty iterators indefinitely. – donkopotamus Apr 20 '17 at 05:45
  • 11
    I had to replace `batchiter.next()` with `next(batchiter)` to make the above code work in Python 3. – Martin Wiebusch Jul 25 '17 at 14:52
  • 2
    pointing out a comment from the linked article: "You should add a warning that a batch has to be entirely consumed before you can proceed to the next one." The output of this should be consumed with something like: `map(list, batch(xrange(10), 3))`. Doing: `list(batch(xrange(10), 3)` will produce unexpected results. – Nathan Buesgens Sep 22 '17 at 13:46
  • 3
    Does not work on py3. `.next()` must be changed to `next(..)`, and `list(batch(range(0,10),3))` throws `RuntimeError: generator raised StopIteration` – mathieu Jul 19 '19 at 20:45
  • 2
    @mathieu: Wrap the `while` loop in `try:`/`except StopIteration: return` to fix the latter issue. – ShadowRanger Jan 22 '20 at 06:02
20

A solution for Python 3.8+, for iterables that don't define a len() function and that get exhausted once consumed:

from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch

Example usage:

def my_gen():
    yield from range(10)
 
for batch in batcher(my_gen(), 3):
    print(batch)

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

Could of course be implemented without the walrus operator as well.
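For reference, an equivalent sketch without the walrus operator, assigning the batch before the loop and again at the end of each pass:

```python
from itertools import islice

def batcher(iterable, batch_size):
    iterator = iter(iterable)
    batch = list(islice(iterator, batch_size))
    while batch:  # an empty list means the iterator is exhausted
        yield batch
        batch = list(islice(iterator, batch_size))

print(list(batcher(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```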

VisioN
Atra Azami
  • 7
    In the current version, `batcher` accepts an iterator, not an iterable. It would result in an infinite loop with a list, for example. There should probably be a line `iterator = iter(iterable)` before starting the `while` loop. – Daniel Perez Sep 13 '20 at 23:55
  • `from itertools import islice` just to be complete. =) – Kees Jul 27 '21 at 10:22
  • Can you please, let me know the significance of the walrus operator here? just for elaboration – ElSheikh Mar 06 '22 at 16:51
11

This is a very short code snippet that does not use len() and works under both Python 2 and 3 (not my creation):

def chunks(iterable, size):
    from itertools import chain, islice
    iterator = iter(iterable)
    for first in iterator:
        yield list(chain([first], islice(iterator, size - 1)))
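For example, it handles a one-shot iterator fine (restating the function above so the snippet runs standalone):

```python
from itertools import chain, islice

def chunks(iterable, size):
    iterator = iter(iterable)
    for first in iterator:
        # take the first item, then up to size - 1 more from the same iterator
        yield list(chain([first], islice(iterator, size - 1)))

print(list(chunks(iter(range(10)), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```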
Yongwei Wu
10

Weird, seems to work fine for me in Python 2.x

>>> def batch(iterable, n = 1):
...    current_batch = []
...    for item in iterable:
...        current_batch.append(item)
...        if len(current_batch) == n:
...            yield current_batch
...            current_batch = []
...    if current_batch:
...        yield current_batch
...
>>> for x in batch(range(0, 10), 3):
...     print x
...
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
John Doe
5

A workable version that avoids the Python 3.8 walrus operator, adapted from @Atra Azami's answer:

import itertools    

def batch_generator(iterable, batch_size=1):
    iterable = iter(iterable)

    while True:
        batch = list(itertools.islice(iterable, batch_size))
        if len(batch) > 0:
            yield batch
        else:
            break

for x in batch_generator(range(0, 10), 3):
    print(x)

Output:

[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]
5

I like this one,

def batch(x, bs):
    return [x[i:i+bs] for i in range(0, len(x), bs)]

This returns a list of batches of size bs; you can make it a generator by using a generator expression instead: `(x[i:i+bs] for i in range(0, len(x), bs))`.

0-_-0
4
def batch(iterable, n):
    iterable=iter(iterable)
    while True:
        chunk=[]
        for i in range(n):
            try:
                chunk.append(next(iterable))
            except StopIteration:
                yield chunk
                return
        yield chunk

list(batch(range(10), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
Atila Romero
4

Moving as much into CPython as possible, by leveraging islice and iter(callable) behavior:

from itertools import islice

def chunked(generator, size):
    """Read parts of the generator, pause each time after a chunk"""
    # islice returns results until 'size',
    # make_chunk gets repeatedly called by iter(callable).
    gen = iter(generator)
    make_chunk = lambda: list(islice(gen, size))
    return iter(make_chunk, [])

Inspired by more-itertools, and shortened to the essence of that code.
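For example (restating the function so the snippet runs standalone):

```python
from itertools import islice

def chunked(generator, size):
    """Read parts of the generator, pause each time after a chunk"""
    gen = iter(generator)
    make_chunk = lambda: list(islice(gen, size))
    # iter(callable, sentinel) keeps calling make_chunk until it returns []
    return iter(make_chunk, [])

print(list(chunked(range(10), 3)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```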

vdboor
3

This is what I use in my project. It handles iterables or lists as efficiently as possible.

from itertools import islice

def chunker(iterable, size):
    if not hasattr(iterable, "__len__"):
        # generators don't have len, so fall back to slower
        # method that works with generators
        for chunk in chunker_gen(iterable, size):
            yield chunk
        return

    it = iter(iterable)
    for i in range(0, len(iterable), size):
        yield [k for k in islice(it, size)]


def chunker_gen(generator, size):
    iterator = iter(generator)
    for first in iterator:

        def chunk():
            yield first
            for more in islice(iterator, size - 1):
                yield more

        yield [k for k in chunk()]
Josh Smeaton
2

Here is an approach using the reduce function.

Oneliner:

from functools import reduce
reduce(lambda cumulator,item: cumulator[-1].append(item) or cumulator if len(cumulator[-1]) < batch_size else cumulator + [[item]], input_array, [[]])

Or more readable version:

from functools import reduce
def batch(input_list, batch_size):
  def reducer(cumulator, item):
    if len(cumulator[-1]) < batch_size:
      cumulator[-1].append(item)
      return cumulator
    else:
      cumulator.append([item])
    return cumulator
  return reduce(reducer, input_list, [[]])

Test:

>>> batch([1,2,3,4,5,6,7], 3)
[[1, 2, 3], [4, 5, 6], [7]]
>>> batch([1,2,3,4,5,6,7], 8)
[[1, 2, 3, 4, 5, 6, 7]]
>>> batch([1,2,3,None,4], 3)
[[1, 2, 3], [None, 4]]
Lycha
1

This would work for any iterable.

from itertools import zip_longest, filterfalse

def batch_iterable(iterable, batch_size=2): 
    args = [iter(iterable)] * batch_size 
    return (tuple(filterfalse(lambda x: x is None, group)) for group in zip_longest(fillvalue=None, *args))

It would work like this:

>>> list(batch_iterable(range(0, 5), 2))
[(0, 1), (2, 3), (4,)]

PS: It would not work if the iterable contains None values.

Trideep Rath
0

You can just group iterable items by their batch index.

import itertools
from typing import Any, Callable, Iterable

def batch(items: Iterable, batch_size: int) -> Iterable[Iterable]:
    # enumerate items and group them by batch index
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    # extract items from enumeration tuples
    item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches

It is often the case that you want to collect the inner iterables, so here is a more advanced version.

def batch_advanced(items: Iterable, batch_size: int, batches_mapper: Callable[[Iterable], Any] = None) -> Iterable[Iterable]:
    enumerated_item_groups = itertools.groupby(enumerate(items), lambda t: t[0] // batch_size)
    if batches_mapper:
        item_batches = (batches_mapper(t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    else:
        item_batches = ((t[1] for t in enumerated_items) for key, enumerated_items in enumerated_item_groups)
    return item_batches

Examples:

print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, tuple)))
# [(1, 9, 3, 5), (2, 4, 2)]
print(list(batch_advanced([1, 9, 3, 5, 2, 4, 2], 4, list)))
# [[1, 9, 3, 5], [2, 4, 2]]
dimathe47
0

Related functionality you may need:

def batch(size, i):
    """ Get the i'th batch of the given size """
    return slice(size * i, size * i + size)

Usage:

>>> [1,2,3,4,5,6,7,8,9,10][batch(3, 1)]
[4, 5, 6]

It gets the i'th batch from a sequence, and it works with other data structures as well, like pandas dataframes (df.iloc[batch(100,0)]) or numpy arrays (array[batch(100,0)]).

alvitawa
0
from itertools import filterfalse, zip_longest

class SENTINEL: pass

def batch(iterable, n):
    return (tuple(filterfalse(lambda x: x is SENTINEL, group)) for group in zip_longest(fillvalue=SENTINEL, *[iter(iterable)] * n))

print(list(batch(range(10), 3)))
# outputs: [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
print(list(batch([None]*10, 3)))
# outputs: [(None, None, None), (None, None, None), (None, None, None), (None,)]
yacc143
0

I use

import math

def batchify(arr, batch_size):
  num_batches = math.ceil(len(arr) / batch_size)
  return [arr[i*batch_size:(i+1)*batch_size] for i in range(num_batches)]
gazorpazorp
0

Keep taking (at most) n elements until it runs out.

def chop(n, iterable):
    iterator = iter(iterable)
    while chunk := list(take(n, iterator)):
        yield chunk


def take(n, iterable):
    iterator = iter(iterable)
    for i in range(n):
        try:
            yield next(iterator)
        except StopIteration:
            return
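For example (restating both helpers so the snippet runs standalone; the walrus operator requires Python 3.8+):

```python
def take(n, iterable):
    iterator = iter(iterable)
    for i in range(n):
        try:
            yield next(iterator)
        except StopIteration:
            return

def chop(n, iterable):
    iterator = iter(iterable)
    while chunk := list(take(n, iterator)):
        yield chunk

print(list(chop(3, range(10))))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```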
W. Zhu
0

This code has the following features:

  • Can take lists or generators (no len()) as input
  • Does not require imports of other packages
  • No padding added to last batch
def batch_generator(items, batch_size):
    batch = [] # Empty batch
    for item in items:
      batch.append(item) # Append items to batch
      if len(batch)==batch_size:
        yield batch
        batch = []
    if batch: # yield the last, possibly shorter, batch (skip if empty)
      yield batch
Douw Marx