155

Can you think of a nice way (maybe with itertools) to split an iterator into chunks of a given size?

For example, l = [1, 2, 3, 4, 5, 6, 7] with chunks(l, 3) becomes an iterator that yields [1, 2, 3], [4, 5, 6], [7].

I can think of a small program to do that, but no nice way with itertools.

Gere
  • 12,075
  • 18
  • 62
  • 94
  • 3
    @kindall: This is close, but not the same, due to the handling of the last chunk. – Sven Marnach Jan 24 '12 at 17:48
  • 5
    This is slightly different, as that question was about lists, and this one is more general, iterators. Although the answer appears to end up being the same. – recursive Jan 24 '12 at 17:48
  • @recursive: Yes, after reading the linked thread completely, I found that everything in my answer already appears somewhere in the other thread. – Sven Marnach Jan 24 '12 at 17:56
  • https://stackoverflow.com/a/312464/3798964 – johnson Oct 08 '20 at 09:52
  • 1
    VTR since [one of the linked questions](/q/434287) is about lists specifically, not iterables in general. – wjandrea Dec 17 '21 at 18:27
  • Does this answer your question? [Python generator that groups another iterable into groups of N](https://stackoverflow.com/questions/3992735/python-generator-that-groups-another-iterable-into-groups-of-n) – Tomerikoo Dec 20 '21 at 13:16

16 Answers

180

The grouper() recipe from the itertools documentation comes close to what you want:

from itertools import zip_longest

def grouper(iterable, n, *, incomplete='fill', fillvalue=None):
    "Collect data into non-overlapping fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, fillvalue='x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3, incomplete='strict') --> ABC DEF ValueError
    # grouper('ABCDEFG', 3, incomplete='ignore') --> ABC DEF
    args = [iter(iterable)] * n
    if incomplete == 'fill':
        return zip_longest(*args, fillvalue=fillvalue)
    if incomplete == 'strict':
        return zip(*args, strict=True)
    if incomplete == 'ignore':
        return zip(*args)
    else:
        raise ValueError('Expected fill, strict, or ignore')

This won't work well when the last chunk is incomplete, though: depending on the incomplete mode, it will either fill up the last chunk with a fill value, raise an exception, or silently drop the incomplete chunk.
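
For example, a quick demonstration of the three modes ('strict' raises lazily, once the short chunk is reached):

>>> list(grouper('ABCDEFG', 3, fillvalue='x'))
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
>>> list(grouper('ABCDEFG', 3, incomplete='ignore'))
[('A', 'B', 'C'), ('D', 'E', 'F')]
>>> list(grouper('ABCDEFG', 3, incomplete='strict'))  # raises ValueError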

More recent versions of the recipes added a batched recipe that does exactly what you want:

from itertools import islice

def batched(iterable, n):
    "Batch data into tuples of length n. The last batch may be shorter."
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while (batch := tuple(islice(it, n))):
        yield batch
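
Used like this, it handles the short final batch the way the question asks:

>>> list(batched('ABCDEFG', 3))
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]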

Finally, a less general solution that only works on sequences but does handle the last chunk as desired and preserves the type of the original sequence is:

(my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size))
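
For example, with the list from the question:

>>> my_list = [1, 2, 3, 4, 5, 6, 7]
>>> chunk_size = 3
>>> list(my_list[i:i + chunk_size] for i in range(0, len(my_list), chunk_size))
[[1, 2, 3], [4, 5, 6], [7]]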
ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • Thanks for this and all other ideas! Sorry that I missed the numerous threads already discussing this question. I had tried `islice` but somehow I missed that it indeed soaks up the iterator as desired. Now I'm thinking of defining a custom iterator class which provides all sorts of functionality :) – Gere Jan 25 '12 at 09:52
  • Would `if chunk: yield chunk` be acceptable? it shaves a line off and is as semantic as a single `return`. – Capi Etheriel Oct 31 '14 at 16:52
  • 6
    @barraponto: No, it wouldn't be acceptable, since you would be left with an infinite loop. – Sven Marnach Oct 31 '14 at 17:57
  • 14
    I am surprised that this is such a highly-voted answer. The recipe works great for small `n`, but for large groups, is very inefficient. My n, e.g., is 200,000. Creating a temporary list of 200K items is...not ideal. – Jonathan Eunice Apr 24 '15 at 00:02
  • 7
    @JonathanEunice: In almost all cases, this is what people want (which is the reason why it is included in the Python documentation). Optimising for a particular special case is out of scope for this question, and even with the information you included in your comment, I can't tell what the best approach would be for you. If you want to chunk a list of numbers that fits into memory, you are probably best off using NumPy's `.resize()` message. If you want to chunk a general iterator, the second approach is already quite good -- it creates temporary tuples of size 200K, but that's not a big deal. – Sven Marnach Apr 26 '15 at 15:56
  • 6
    @SvenMarnach We'll have to disagree. I believe people want convenience, not gratuitous overhead. They get the overhead because the docs provide a needlessly bloated answer. With large data, temporary tuples/lists/etc. of 200K or 1M items make the program consume gigabytes of excess memory and take much longer to run. Why do that if you don't have to? At 200K, extra temp storage makes the overall program take 3.5x longer to run than with it removed. Just that one change. So it is a pretty big deal. NumPy won't work because the iterator is a database cursor, not a list of numbers. – Jonathan Eunice Apr 27 '15 at 02:24
  • @JonathanEunice: Sorry, when I said "the second approach" I actually meant the third one in my answer. There will only be a single 200K chunk at any given time, unless you store all of them (in which case you can't blame the code in this answer, but should blame your own code instead), and I can't see how this would use gigabytes of memory. That said, you are currently optimising along a very particular dimension, and all these optimisations have to be tailored to special cases. If you have a solution that you think is better for the general case, please enter an answer of your own. – Sven Marnach Apr 28 '15 at 09:19
  • @JonathanEunice: I think I still haven't understood your use case. – Sven Marnach Apr 28 '15 at 09:20
  • @JonathanEunice also, you are incorrect about the scale of the overhead. If you chunk a list using these methods, you are creating new objects for each chunk, the cost of that object already exists, so underneath the hood you just have to account for new points, so 200,000 * 8 * 1e-6 = 1.6 megabytes of overhead for a 200K size list. And about 5 times that for a million. – juanpa.arrivillaga Aug 16 '19 at 19:45
  • @juanpa.arrivillaga Note that I said "200K items." 200K items does of course consume ≫ 200K bytes, especially given Python not being particularly space-efficient. – Jonathan Eunice Aug 16 '19 at 21:08
  • @JonathanEunice yes that's what I accounted for and the memory overhead is about 1.6 megabytes, which is several orders of magnitude less than gigabytes of excess memory – juanpa.arrivillaga Aug 16 '19 at 21:23
  • 2
    izip_longest was renamed to zip_longest in Python 3 – hojin Oct 30 '19 at 12:47
  • A note: [The `itertools` recipes](https://docs.python.org/3/library/itertools.html#itertools-recipes) have been updated, and `grouper` now supports three different modes for how an uneven trailing block should be handled, while a new recipe for `batched` is nigh identical to your final `grouper` recipe (adding an up-front check for a valid `n`). Both of them switched the order of the initial arguments (it's now `iterable`, then `n`, not `n` then `iterable`; that got changed way back in the 3.3 docs). – ShadowRanger Jan 03 '23 at 17:04
92

Although the OP asks for a function that returns chunks as lists or tuples, in case you need to return iterators, Sven Marnach's solution can be modified:

import itertools

def batched_it(iterable, n):
    "Batch data into iterators of length n. The last batch may be shorter."
    # batched_it('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while True:
        chunk_it = itertools.islice(it, n)
        try:
            first_el = next(chunk_it)
        except StopIteration:
            return
        yield itertools.chain((first_el,), chunk_it)

Some benchmarks: http://pastebin.com/YkKFvm8b

It will only be slightly more efficient if your code actually iterates through the elements of every chunk.
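
For example (each chunk is itself an iterator, so materialise it if you need the values more than once):

>>> for chunk in batched_it('ABCDEFG', 3):
...     print(list(chunk))
['A', 'B', 'C']
['D', 'E', 'F']
['G']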

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
reclosedev
  • 9,352
  • 34
  • 51
  • 24
    I arrived at almost exactly this design today, after finding the answer in the documentation (which is the accepted, most-highly-voted answer above) *massively* inefficient. When you're grouping hundreds of thousands or millions of objects at a time--which is when you need segmentation the most--it has to be pretty efficient. THIS is the right answer. – Jonathan Eunice Apr 24 '15 at 01:36
  • This is the best solution. – Lawrence Jan 31 '18 at 09:46
  • 5
    Won't this behave wrongly if the caller doesn't exhaust `chunk_it` (by breaking the inner loop early for example)? – Tavian Barnes Dec 18 '18 at 19:01
  • @TavianBarnes good point, if a first group is not exhausted, a second will start where the first left off. But it may be considered a feature if you want both to be looped over concurrently. Powerful, but handle with care. – loutre Mar 01 '19 at 14:11
  • @TavianBarnes: This can be made to behave correctly in that case by making a cheap iterator consumer (fastest in CPython if you create it outside the loop is `consume = collections.deque(maxlen=0).extend`), then add `consume(chunk_it)` after the `yield` line; if the caller consumed the `yield`ed `chain`, it does nothing, if they didn't, it consumes it on their behalf as efficiently as possible. Put it in the `finally` of a `try` wrapping the `yield` if you need it to advance a caller provided iterator to the end of the chunk if the outer loop is broken early. – ShadowRanger Jan 15 '20 at 03:08
  • 3
    A little late to the party: this excellent answer could be shortened a bit by replacing the while loop with a for loop: `for x in it: yield chain((x,), islice(it, n))`, right? – Claas Feb 11 '22 at 16:58
  • @Claas that worked for me. at least so far. – kdubs May 24 '22 at 01:58
  • @Claas: Well, you'd want `islice(it, n - 1)` (or for performance you'd want to decrement `n` once up-front, and verify it's still `>=0`) to get the counts right, but yes, that's going to be a slightly faster solution (as it pushes a little more per-item work to C layer). – ShadowRanger Jan 03 '23 at 16:45
  • @TavianBarnes: I ended up posting [an answer](https://stackoverflow.com/a/74997058/364696) that combines my three-year-old suggestion with Claas's one-year-old suggestion to improve speed and correctness. The cost of unconditionally consuming the remainder of the `islice` each time should be pretty minimal (and compensated for by the speed gains from letting Python do the work to check for and retrieve the first element for us). – ShadowRanger Jan 03 '23 at 18:12
19

Since Python 3.8, there is a simpler solution using the := (walrus) operator:

import itertools
from typing import Iterator

def grouper(iterator: Iterator, n: int) -> Iterator[list]:
    while chunk := list(itertools.islice(iterator, n)):
        yield chunk

and then call it that way:

>>> list(grouper(iter('ABCDEFG'), 3))
[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]

Note: you can call iter() inside the grouper function so that it accepts an Iterable instead of an Iterator, as sketched below.
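
A minimal sketch of that variant (the iter() call is a no-op for iterators and converts any other iterable):

import itertools
from typing import Iterable, Iterator

def grouper(iterable: Iterable, n: int) -> Iterator[list]:
    iterator = iter(iterable)
    while chunk := list(itertools.islice(iterator, n)):
        yield chunk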

Cédric ROYER
  • 389
  • 2
  • 4
17

This will work on any iterable. It returns a generator of generators (for full flexibility). I now realize that it's basically the same as @reclosedev's solution, but without the fluff. No need for try...except, as the StopIteration propagates up, which is what we want.

The next(iterable) call is needed to raise the StopIteration when the iterable is empty, since islice will continue spawning empty generators forever if you let it.

It's better because it's only two lines long, yet easy to comprehend.

import itertools

def grouper(iterable, n):
    while True:
        yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))

Note that next(iterable) is put into a tuple. Otherwise, if next(iterable) itself were iterable, then itertools.chain would flatten it out. Thanks to Jeremy Brown for pointing out this issue.
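
As the comments below note, letting StopIteration escape a generator stopped working with PEP 479 (an error since Python 3.7), so on modern Python this needs an explicit return; a sketch of the adjusted version:

import itertools

def grouper(iterable, n):
    iterable = iter(iterable)  # ensure we have an iterator, not just an iterable
    while True:
        try:
            first = next(iterable)
        except StopIteration:
            return  # PEP 479: StopIteration must not escape a generator
        yield itertools.chain((first,), itertools.islice(iterable, n - 1))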

OrangeDog
  • 36,653
  • 12
  • 122
  • 207
Svein Lindal
  • 179
  • 1
  • 3
  • 3
    While that may answer the question, including some explanation and description might help others understand your approach and why your answer stands out – deW1 Apr 08 '15 at 20:55
  • Don't just copy your answer to another question. If you need to do that, then it suggests that one is a duplicate of the other, which they are and I voted to close. – Artjom B. Apr 08 '15 at 21:12
  • It's a duplicate. I saw this thread afterwards, and it turns out it has a variation of my answer. – Svein Lindal Apr 09 '15 at 13:03
  • 2
    iterable.next() needs to be contained or yielded by an interator for the chain to work properly - eg. yield itertools.chain([iterable.next()], itertools.islice(iterable, n-1)) – Jeremy Brown Dec 16 '15 at 04:56
  • 3
    `next(iterable)`, not `iterable.next()`. – Antti Haapala -- Слава Україні Apr 28 '17 at 12:05
  • 4
    It might make sense to prefix the while loop with the line `iterable = iter(iterable)` to turn your *iterable* into an *iterator* first. [***Iterables* do not have a `__next__` method.**](https://stackoverflow.com/questions/9884132/what-exactly-are-iterator-iterable-and-iteration) – Mateen Ulhaq Nov 24 '18 at 04:53
  • 3
    Raising StopIteration in a generator function is deprecated since PEP 479. So I prefer the explicit return statement of @reclosedev's solution. – loutre Mar 01 '19 at 14:04
  • 2
    @loutre indeed in python 3.7 it raises an exception... – drevicko Aug 28 '19 at 09:11
  • Does not work in Python 3.8. To fix, put `iterable = iter(iterable)` at the beginning and a `try-except StopIteration: return` around the while loop. These modifications make the solution very similar (but not identical) to reclosedev's version. (It has roughly the same performance, but IMO is a bit cleaner.) – dlazesz Mar 13 '21 at 21:58
16

Python 3.12 adds itertools.batched, which works on all iterables (including lists):

>>> from itertools import batched
>>> for batch in batched('ABCDEFG', 3):
...     print(batch)
('A', 'B', 'C')
('D', 'E', 'F')
('G',)
mike
  • 4,901
  • 2
  • 19
  • 19
8

I was working on something today and came up with what I think is a simple solution. It is similar to jsbueno's answer, but I believe his would yield empty groups when the length of the iterable is divisible by n. My answer does a simple check when the iterable is exhausted.

def chunk(iterable, chunk_size):
    """Generates lists of `chunk_size` elements from `iterable`.

    >>> list(chunk((2, 3, 5, 7), 3))
    [[2, 3, 5], [7]]
    >>> list(chunk((2, 3, 5, 7), 2))
    [[2, 3], [5, 7]]
    """
    iterable = iter(iterable)
    while True:
        chunk = []
        try:
            for _ in range(chunk_size):
                chunk.append(next(iterable))
            yield chunk
        except StopIteration:
            if chunk:
                yield chunk
            break
eidorb
  • 469
  • 1
  • 5
  • 8
3

Here's one that returns lazy chunks; use map(list, chunks(...)) if you want lists. The deque(chunk, 0) call (a deque with maxlen=0 stores nothing) consumes whatever the caller left unread in the chunk just yielded, so the next chunk starts at the correct offset.

from itertools import islice, chain
from collections import deque

def chunks(items, n):
    items = iter(items)
    for first in items:
        chunk = chain((first,), islice(items, n-1))
        yield chunk
        deque(chunk, 0)

if __name__ == "__main__":
    for chunk in map(list, chunks(range(10), 3)):
        print(chunk)

    for i, chunk in enumerate(chunks(range(10), 3)):
        if i % 2 == 1:
            print("chunk #%d: %s" % (i, list(chunk)))
        else:
            print("skipping #%d" % i)
ekhumoro
  • 115,249
  • 20
  • 229
  • 336
  • Care to comment on how this works? – Marcin Jan 24 '12 at 19:44
  • 3
    A caveat: This generator yields iterables that remain valid only until the next iterable is requested. When using e.g. `list(chunks(range(10), 3))`, all iterables will already have been consumed. – Sven Marnach Jan 25 '12 at 14:19
3

A succinct implementation is (this is Python 2: ifilterfalse and izip_longest come from itertools, and are named filterfalse and zip_longest in Python 3):

chunker = lambda iterable, n: (ifilterfalse(lambda x: x == (), chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=())))

This works because [iter(iterable)]*n is a list containing the same iterator n times; zipping over that takes one item from each iterator in the list, which is the same iterator, with the result that each zip-element contains a group of n items.
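
A quick demonstration of the shared-iterator trick (shown with Python 3's zip):

>>> it = iter('ABCDEF')
>>> args = [it] * 3        # three references to the *same* iterator
>>> list(zip(*args))
[('A', 'B', 'C'), ('D', 'E', 'F')]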

izip_longest is needed to fully consume the underlying iterable, rather than iteration stopping when the first exhausted iterator is reached, which chops off any remainder from iterable. This results in the need to filter out the fill-value. A slightly more robust implementation would therefore be:

def chunker(iterable, n):
    class Filler(object): pass
    return (ifilterfalse(lambda x: x is Filler, chunk) for chunk in (izip_longest(*[iter(iterable)]*n, fillvalue=Filler)))

This guarantees that the fill value is never an item in the underlying iterable. Using the definition above:

iterable = range(1,11)

map(tuple,chunker(iterable, 3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10,)]

map(tuple,chunker(iterable, 2))
[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

map(tuple,chunker(iterable, 4))
[(1, 2, 3, 4), (5, 6, 7, 8), (9, 10)]

This implementation almost does what you want, but it has issues:

def chunks(it, step):
  start = 0
  while True:
    end = start+step
    yield islice(it, start, end)
    start = end

(The difference is that, because islice does not raise StopIteration or anything else on calls that go beyond the end of it, this will yield forever; there is also the slightly tricky issue that the islice results must be consumed before this generator is iterated further.)

To generate the moving window functionally:

izip(count(0, step), count(step, step))

So this becomes:

(it[start:end] for (start,end) in izip(count(0, step), count(step, step)))

But, that still creates an infinite iterator. So, you need takewhile (or perhaps something else might be better) to limit it:

chunk = lambda it, step: takewhile((lambda x: len(x) > 0), (it[start:end] for (start,end) in izip(count(0, step), count(step, step))))

g = chunk(range(1,11), 3)

tuple(g)
([1, 2, 3], [4, 5, 6], [7, 8, 9], [10])

Marcin
  • 48,559
  • 18
  • 128
  • 201
  • 1. The first code snippet contains the line `start = end`, which doesn't seem to be doing anything, since the next iteration of the loop will start with `start = 0`. Moreover, the loop is infinite -- it's `while True` without any `break`. 2. What is `len` in the second code snippet? 3. All other implementations only work for sequences, not for general iterators. 4. The check `x is ()` relies on an implementation detail of CPython. As an optimisation, the empty tuple is only created once and reused later. This is not guaranteed by the language specification though, so you should use `x == ()`. – Sven Marnach Jan 25 '12 at 14:11
  • 5. The combination of `count()` and `takewhile()` is much more easily implemented using `range()`. – Sven Marnach Jan 25 '12 at 14:11
  • @SvenMarnach: I've edited the code and text in response to some of your points. Much-needed proofing. – Marcin Jan 25 '12 at 14:20
  • 1
    That was fast. :) I still have an issue with the first code snippet: It only works if the yielded slices are consumed. If the user does not consume them immediately, strange things may happen. That's why Peter Otten used `deque(chunk, 0)` to consume them, but that solution has problems as well -- see my comment to his answer. – Sven Marnach Jan 25 '12 at 14:30
  • 1
    I like the last version of `chunker()`. As a side note, a nice way to create a unique sentinel is `sentinel = object()` -- it is guaranteed to be distinct from any other object. – Sven Marnach Jan 25 '12 at 14:33
  • I have reversed the order of my answers, so read @SvenMarnach's comments with care. – Marcin Jan 25 '12 at 14:35
  • @SvenMarnach: Nice tip on sentinels - that didn't occur to me. – Marcin Jan 25 '12 at 14:36
2

"Simpler is better than complex" - a straightforward generator a few lines long can do the job. Just place it in some utilities module or so:

def grouper (iterable, n):
    iterable = iter(iterable)
    count = 0
    group = []
    while True:
        try:
            group.append(next(iterable))
            count += 1
            if count % n == 0:
                yield group
                group = []
        except StopIteration:
            yield group
            break
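
Sample usage (note the trailing empty group when the length divides evenly, the behaviour eidorb's answer above works around):

>>> list(grouper([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2, 3], [4, 5, 6], [7]]
>>> list(grouper([1, 2, 3, 4, 5, 6], 3))
[[1, 2, 3], [4, 5, 6], []]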
jsbueno
  • 99,910
  • 10
  • 151
  • 209
1

Code golf edition:

def grouper(iterable, n):
    for i in range(0, len(iterable), n):
        yield iterable[i:i+n]

Usage:

>>> list(grouper('ABCDEFG', 3))
['ABC', 'DEF', 'G']
johnson
  • 3,729
  • 3
  • 31
  • 32
  • 1
    Implementation is good, but it doesn't answer the question: "Iterate an iterator by chunks (of n) in Python?" `grouper` should take an Iterator. – Cédric ROYER Sep 09 '22 at 09:19
  • 1
    True. But since this question is the first hit for a google search "python iterate in chunks", I think it belongs here nevertheless. – johnson Sep 09 '22 at 10:37
1

A couple improvements on reclosedev's answer that make it:

  1. Operate more efficiently and with less boilerplate code in the loop by delegating the pulling of the first element to Python itself, rather than manually doing so with a next call in a try/except StopIteration: block

  2. Handle the case where the user discards the rest of the elements in any given chunk (e.g. an inner loop over the chunk breaks under certain conditions); in reclosedev's solution, aside from the very first element (which is definitely consumed), any other "skipped" elements aren't actually skipped (they just become the initial elements of the next chunk, which means you're no longer pulling data from n-aligned offsets, and if the caller breaks a loop over a chunk, they must manually consume the remaining elements even if they don't need them)

Combining those two fixes gets:

import collections  # At top of file
from itertools import chain, islice  # At top of file, denamespaced for slight speed boost

# Pre-create a utility "function" that silently consumes and discards all remaining elements in
# an iterator. This is the fastest way to do so on CPython (deque has a specialized mode
# for maxlen=0 that pulls and discards faster than Python level code can, and by precreating
# the deque and prebinding the extend method, you don't even need to create new deques each time)
_consume = collections.deque(maxlen=0).extend

def batched_it(iterable, n):
    "Batch data into sub-iterators of length n. The last batch may be shorter."
    # batched_it('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    n -= 1  # First element pulled for us, pre-decrement n so we don't redo it every loop
    it = iter(iterable)
    for first_el in it:
        chunk_it = islice(it, n)
        try:
            yield chain((first_el,), chunk_it)
        finally:
            _consume(chunk_it)  # Efficiently consume any elements caller didn't consume

Try it online!
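
For example, breaking out of a chunk early no longer desynchronises later chunks, because the finally block drains whatever the caller left behind:

>>> for chunk in batched_it('ABCDEFG', 3):
...     for ch in chunk:
...         print(ch)
...         break  # discard the rest of this chunk
A
D
G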

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
1

I forget where I found the inspiration for this. I've modified it a little to work with MSI GUIDs in the Windows Registry:

def nslice(s, n, truncate=False, reverse=False):
    """Splits s into n-sized chunks, optionally reversing the chunks."""
    assert n > 0
    while len(s) >= n:
        if reverse: yield s[:n][::-1]
        else: yield s[:n]
        s = s[n:]
    if len(s) and not truncate:
        yield s

reverse doesn't apply to your question, but it's something I use extensively with this function.

>>> [i for i in nslice([1,2,3,4,5,6,7], 3)]
[[1, 2, 3], [4, 5, 6], [7]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True)]
[[1, 2, 3], [4, 5, 6]]
>>> [i for i in nslice([1,2,3,4,5,6,7], 3, truncate=True, reverse=True)]
[[3, 2, 1], [6, 5, 4]]
Zach Young
  • 10,137
  • 4
  • 32
  • 53
  • This answer is close to the one I started with, but not quite: http://stackoverflow.com/a/434349/246801 – Zach Young Jan 24 '12 at 18:17
  • 1
    This only works for sequences, not for general iterables. – Sven Marnach Jan 25 '12 at 14:15
  • @SvenMarnach: Hi Sven, yes, thank you, you are absolutely correct. I saw the OP's example which used a list (sequence) and glossed over the wording of the question, assuming they meant sequence. Thanks for pointing that out, though. I didn't immediately understand the difference when I saw your comment, but have since looked it up. `:)` – Zach Young Jan 25 '12 at 16:02
1

Here you go.

def chunksiter(l, chunks):
    i, n = 0, 0
    rl = []
    while n < len(l) / chunks:
        rl.append(l[i:i + chunks])
        i += chunks
        n += 1
    return iter(rl)


def chunksiter2(l, chunks):
    i, n = 0, 0
    while n < len(l) / chunks:
        yield l[i:i + chunks]
        i += chunks
        n += 1

Examples:

for l in chunksiter([1,2,3,4,5,6,7,8],3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]

for l in chunksiter2([1,2,3,4,5,6,7,8],3):
    print(l)

[1, 2, 3]
[4, 5, 6]
[7, 8]


for l in chunksiter2([1,2,3,4,5,6,7,8],5):
    print(l)

[1, 2, 3, 4, 5]
[6, 7, 8]
Carlos Quintanilla
  • 12,937
  • 3
  • 22
  • 25
0

This function takes iterables which do not need to be Sized, so it will accept iterators too. It supports infinite iterables and will error out if a chunk size smaller than 1 is selected (even though size == 1 is effectively useless).

The type annotations are of course optional and the / in the parameters (which makes iterable positional-only) can be removed if you wish.

from typing import Generator, Iterable, TypeVar

T = TypeVar("T")


def chunk(iterable: Iterable[T], /, size: int) -> Generator[list[T], None, None]:
    """Yield chunks of a given size from an iterable."""
    if size < 1:
        raise ValueError("Cannot make chunks smaller than 1 item.")

    def chunker():
        current_chunk = []
        for item in iterable:
            current_chunk.append(item)

            if len(current_chunk) == size:
                yield current_chunk

                current_chunk = []

        if current_chunk:
            yield current_chunk

    # Chunker generator is returned instead of yielding directly so that the size check
    #  can raise immediately instead of waiting for the first next() call.
    return chunker()
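
Sample usage:

>>> list(chunk([1, 2, 3, 4, 5, 6, 7], size=3))
[[1, 2, 3], [4, 5, 6], [7]]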
theberzi
  • 2,142
  • 3
  • 20
  • 34
0

Recursive solution:

from typing import Iterator, Sequence

def batched(i: Sequence, split: int) -> Iterator[Sequence]:
    if chunk := i[:split]:
        yield chunk
        yield from batched(i[split:], split)
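
Sample usage (since this slices and recurses, it needs a sequence rather than an arbitrary iterator, and very long inputs will hit Python's recursion limit):

>>> list(batched([1, 2, 3, 4, 5, 6, 7], 3))
[[1, 2, 3], [4, 5, 6], [7]]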
zhukovgreen
  • 1,551
  • 16
  • 26
0

Here is a simple one:

>>> n = 2
>>> l = list(range(15))
>>> [l[i:i+n] for i in range(len(l)) if i%n==0]
[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13], [14]]

for i in range(len(l)): This part specifies the iteration over the indices of l using the range() function and len(l) as the upper limit.

if i % n == 0: This condition filters the elements for the new list. i % n checks if the current index i is divisible by n without a remainder. If it is, the element at that index will be included in the new list; otherwise, it will be skipped.

l[i:i+n]: This part extracts a sublist from l. It uses slicing notation to specify a range of indices from i to i+n-1. So, for each index i that meets the condition i % n == 0, a sublist of length n is created, starting from that index.

An alternative (faster for larger lists), stepping the index by n directly instead of filtering:

[l[i:i+n] for i in range(0,len(l),n)]
Hans
  • 148
  • 2
  • 7