24

I'm looking for a function that takes an iterable i and a size n and yields tuples of length n that are sequential values from i:

x = [1,2,3,4,5,6,7,8,9,0]
[z for z in TheFunc(x,3)]

gives

[(1,2,3),(4,5,6),(7,8,9),(0,)]

Does such a function exist in the standard library?

If it exists as part of the standard library, I can't seem to find it and I've run out of terms to search for. I could write my own, but I'd rather not.

BCS
  • VTR since [the linked question](/q/434287) is about lists specifically, not iterables in general. – wjandrea Dec 17 '21 at 18:27
  • @wjandrea *even granting* that these questions are distinct from the canonical (which, I disagree, and intend to bring the issue up on Meta), this question is *clearly* a duplicate of the *other* one you VTRd on the same grounds. – Karl Knechtel Sep 23 '22 at 14:32
  • @Karl Sorry, what canonical? Regarding [the other one](/q/8991506/4518341), by all means. At that time, either I mistakenly thought iterables and iterators would be treated differently, or I was more focused on undoing the bad closure than finding a better duplicate. – wjandrea Sep 23 '22 at 16:18
  • The canonical I have in mind is https://stackoverflow.com/questions/434287/how-to-iterate-over-a-list-in-chunks. In general, people with Python questions need to be steered away from the idea that particular kinds of sequences, iterators, etc. need to be handled differently; the most natural ways to solve most problems work for any iterable, and the most natural ways to solve most of the rest work at least for any sequence. "Special cases aren't special enough to break the rules." – Karl Knechtel Sep 23 '22 at 16:20
  • @Karl I'm not following. That question's about lists, which are sequences, which support slicing, which the [top answer uses](/a/434328/4518341), but other iterables don't necessarily support slicing, so you can't use the same solution, so questions asking about iterables in general shouldn't be closed as duplicates of it, no? – wjandrea Sep 23 '22 at 16:29
  • \[This conversation continued [in chat](https://chat.stackoverflow.com/transcript/message/55273799#55273799)] – wjandrea Sep 23 '22 at 17:14

9 Answers

33

When you want to group an iterator in chunks of n without padding the final group with a fill value, use iter(lambda: list(IT.islice(iterable, n)), []):

import itertools as IT

def grouper(n, iterable):
    """
    >>> list(grouper(3, 'ABCDEFG'))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    iterable = iter(iterable)
    return iter(lambda: list(IT.islice(iterable, n)), [])

seq = [1,2,3,4,5,6,7]
print(list(grouper(3, seq)))

yields

[[1, 2, 3], [4, 5, 6], [7]]

There is an explanation of how it works in the second half of this answer.
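As a rough illustration of why the two-argument form of iter works here, the sketch below puts the one-liner next to a hand-unrolled loop that should behave the same way. The name grouper_unrolled is made up for this comparison and is not part of any library.

```python
import itertools

def grouper(n, iterable):
    iterator = iter(iterable)
    # iter(callable, sentinel) calls `callable` repeatedly and stops as soon
    # as it returns a value equal to `sentinel` (here, the empty list).
    return iter(lambda: list(itertools.islice(iterator, n)), [])

def grouper_unrolled(n, iterable):
    # Hypothetical helper: the same loop written out by hand.
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, n))
        if chunk == []:  # the sentinel: nothing left to take
            return
        yield chunk

print(list(grouper(3, "ABCDEFG")))           # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
print(list(grouper_unrolled(3, "ABCDEFG")))  # same result
```

Both versions stop on the first empty slice, so an empty input simply produces no chunks.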


When you want to group an iterator in chunks of n and pad the final group with a fill value, use the grouper recipe zip_longest(*[iterator]*n):

For example, in Python2:

>>> list(IT.izip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

In Python 3, izip_longest was renamed to zip_longest:

>>> list(IT.zip_longest(*[iter(seq)]*3, fillvalue='x'))
[(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]
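The trick in `*[iter(seq)]*3` is that the list holds three references to one and the same iterator. A small sketch (Python 3 names; the variable names are only for illustration) that contrasts this with three independent iterators:

```python
from itertools import zip_longest

seq = [1, 2, 3, 4, 5, 6, 7]
iterator = iter(seq)
columns = [iterator] * 3  # three references to ONE iterator

# All three entries are the same object, so zip_longest pulls
# consecutive items from it while filling each tuple.
assert columns[0] is columns[1] is columns[2]
padded = list(zip_longest(*columns, fillvalue="x"))
print(padded)  # [(1, 2, 3), (4, 5, 6), (7, 'x', 'x')]

# With three independent iterators the values are not grouped at all.
independent = list(zip_longest(iter(seq), iter(seq), iter(seq)))
print(independent[:2])  # [(1, 1, 1), (2, 2, 2)]
```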

When you want to group a sequence in chunks of n you can use the chunks recipe:

def chunks(seq, n):
    # https://stackoverflow.com/a/312464/190597 (Ned Batchelder)
    """ Yield successive n-sized chunks from seq."""
    for i in range(0, len(seq), n):  # use xrange on Python 2
        yield seq[i:i + n]

Note that, unlike iterators in general, sequences by definition have a length (i.e. __len__ is defined).
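To make the sequence requirement concrete, here is a short Python 3 sketch of the chunks recipe applied to a list, a string, and a generator. The generator case is expected to fail because generators define neither `__len__` nor slicing.

```python
def chunks(seq, n):
    """Yield successive n-sized chunks from a sequence (needs len() and slicing)."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

print(list(chunks([1, 2, 3, 4, 5, 6, 7], 3)))  # [[1, 2, 3], [4, 5, 6], [7]]
print(list(chunks("ABCDEFG", 3)))              # ['ABC', 'DEF', 'G']

numbers = (x * x for x in range(7))  # a generator: no __len__, no slicing
try:
    list(chunks(numbers, 3))
except TypeError as exc:
    print("not a sequence:", exc)
```

Note that each chunk has the same type as the input, which is why the string yields strings rather than lists.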

unutbu
20

See the grouper recipe in the docs for the itertools module:

from itertools import zip_longest  # izip_longest on Python 2

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

(However, this is a duplicate of quite a few questions.)
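On recent interpreters a recipe may not be needed at all: itertools.batched was added to the standard library in Python 3.12 and yields tuples, with a shorter final tuple and no padding. The sketch below prefers it when available. The fallback function is my own approximation for older versions (assuming n >= 1), not the library's implementation.

```python
import itertools

def batched(iterable, n):
    """Fallback sketch for Python < 3.12; assumes n >= 1."""
    iterator = iter(iterable)
    while chunk := tuple(itertools.islice(iterator, n)):
        yield chunk

# Prefer the standard-library version when it exists (Python 3.12+).
batched = getattr(itertools, "batched", batched)

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print(list(batched(x, 3)))  # [(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]
```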

Andrew Jaffe
4

How about this one? It doesn't have a fill value though.

>>> import itertools
>>> def partition(itr, n):
...     i = iter(itr)
...     while True:
...         res = list(itertools.islice(i, n))
...         if not res:
...             break
...         yield res
...
>>> list(partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3))
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>

It wraps the original iterable in a single iterator and consumes that iterator one slice at a time. The only other way my tired brain could come up with was generating slice end-points with range.

Maybe I should change list() to tuple() so it better corresponds to your output.
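A minimal sketch of that tuple variant, using the input from the question. The name partition_tuples is only for illustration.

```python
import itertools

def partition_tuples(itr, n):
    i = iter(itr)
    while True:
        res = tuple(itertools.islice(i, n))
        if not res:  # an empty tuple means the iterator is exhausted
            break
        yield res

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print(list(partition_tuples(x, 3)))  # [(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]
```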

Skurmedel
  • LOL. You've GOT to be kidding me. There is a bug here in the answer, and my edit for it got rejected? My respect for the SO community has just diminished greatly. – Friendly Genius May 17 '15 at 01:23
  • btw, itertools.islice(i, 0, 3) -> itertools.islice(i, 0, n). Still can't believe the SO community. – Friendly Genius May 17 '15 at 01:24
  • I didn't reject it, someone else did. But you are correct. The 3 is hardcoded negating the purpose of n as a parameter. If you want I can edit it but you won't get any rep then, up to you : ) – Skurmedel May 17 '15 at 16:55
  • Yeah...I've kinda gotten over it by now. Just go ahead and edit it yourself :) – Friendly Genius May 17 '15 at 17:06
3

This is a very common request in Python. Common enough that it made it into the boltons unified utility package. First off, there are extensive docs here. Furthermore, the module is designed and tested to only rely on the standard library (Python 2 and 3 compatible), meaning you can just download the file directly into your project.

# if you downloaded/embedded, try:
# from iterutils import chunked

# with `pip install boltons` use:

from boltons.iterutils import chunked, chunked_iter

print(chunked(range(10), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

There's an iterator/generator form for indefinite/long sequences as well:

print(list(chunked_iter(range(10), 3, fill=None)))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]

As you can see, you can also pad the final chunk with a value of your choosing. Finally, as the maintainer, I can assure you that, while the code has been downloaded/tested by thousands of developers, if you encounter any issues, you'll get the fastest support possible through the boltons GitHub Issues page. Hope this (and/or any of the other 150+ boltons recipes) helped!

Mahmoud Hashemi
3

I use the chunked function from the more_itertools package.

$ pip install more_itertools
$ python
>>> import more_itertools
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> [tuple(z) for z in more_itertools.chunked(x, 3)]
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (0,)]
Jason R. Coombs
1

This is a very old question, but I think it is useful to mention the following approach for the general case. Its main merit is that it only needs to iterate over the data once, so it will work with database cursors or other iterables that can only be consumed once. I also find it more readable.

def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out
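To illustrate the single-pass property, the sketch below feeds the function a generator that stands in for a database cursor. fake_cursor is a hypothetical stand-in invented for this example, not a real database API.

```python
def chunks(n, iterator):
    out = []
    for elem in iterator:
        out.append(elem)
        if len(out) == n:
            yield out
            out = []
    if out:
        yield out

def fake_cursor():
    """Hypothetical stand-in for a cursor: rows can be read only once."""
    for row_id in range(7):
        yield {"id": row_id}

rows = fake_cursor()
batches = [[row["id"] for row in batch] for batch in chunks(3, rows)]
print(batches)     # [[0, 1, 2], [3, 4, 5], [6]]
print(list(rows))  # []  (the source was consumed exactly once)
```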
MathKid
Gecko
  • This is the most elegant answer. The only problem is that it can return an empty list as the last chunk. Add `if len(out) > 0:` before the last line to fix that. – MathKid Jun 05 '19 at 13:22
0
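def grouper(iterable, n):
    iterable = iter(iterable)  # accept any iterable, not only iterators
    while True:
        try:
            yield itertools.chain((next(iterable),), itertools.islice(iterable, n-1))
        except StopIteration:  # a bare StopIteration would become RuntimeError (PEP 479)
            return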
OrangeDog
Svein Lindal
0

I know this has been answered several times, but I'm adding my solution, which should improve on the grouper recipe in general applicability (it works for both sequences and iterators), readability (no invisible loop exit condition via a StopIteration exception), and performance. It is most similar to the last answer by Svein.

import itertools

def chunkify(iterable, n):
    iterable = iter(iterable)
    n_rest = n - 1

    for item in iterable:
        rest = itertools.islice(iterable, n_rest)
        yield itertools.chain((item,), rest)
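One caveat worth sketching: the chunks produced this way are lazy views onto the shared iterator, so each chunk should be consumed before the next one is requested. The example below restates the function compactly and shows both orders of consumption.

```python
import itertools

def chunkify(iterable, n):
    iterable = iter(iterable)
    for item in iterable:
        yield itertools.chain((item,), itertools.islice(iterable, n - 1))

# Consuming each chunk before asking for the next one gives the expected groups.
in_order = [list(chunk) for chunk in chunkify(range(7), 3)]
print(in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# Collecting the lazy chunks first and reading them later does not:
# the outer loop has already pulled every item as a chunk "head".
deferred = [list(chunk) for chunk in list(chunkify(range(7), 3))]
print(deferred)  # [[0], [1], [2], [3], [4], [5], [6]]
```

If chunks need to be stored or read out of order, materializing each one (for example with list or tuple) inside the loop avoids the problem.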
fungs
0

Here is a different solution which makes no use of itertools. Even though it has a couple more lines, in the timings below it outperforms the given answers when chunks are a lot shorter than the iterable's length. For big chunks, however, the other answers are much faster.

def batchiter(iterable, batch_size):
    """
    >>> list(batchiter('ABCDEFG', 3))
    [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
    """
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch


In [19]: %timeit [b for b in batchiter(range(1000), 3)]
1000 loops, best of 3: 644 µs per loop

In [20]: %timeit [b for b in grouper(3, range(1000))]
1000 loops, best of 3: 897 µs per loop

In [21]: %timeit [b for b in partition(range(1000), 3)]
1000 loops, best of 3: 890 µs per loop

In [22]: %timeit [b for b in batchiter(range(1000), 333)]
1000 loops, best of 3: 540 µs per loop

In [23]: %timeit [b for b in grouper(333, range(1000))]
10000 loops, best of 3: 81.7 µs per loop

In [24]: %timeit [b for b in partition(range(1000), 333)]
10000 loops, best of 3: 80.1 µs per loop
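The numbers above will vary with Python version and hardware. A rough way to repeat the comparison outside IPython is the timeit module, sketched below. Here grouper is assumed to be the iter/islice version from the top answer, and the repetition count of 200 is an arbitrary choice.

```python
import itertools
import timeit

def batchiter(iterable, batch_size):
    next_batch = []
    for element in iterable:
        next_batch.append(element)
        if len(next_batch) == batch_size:
            batch, next_batch = next_batch, []
            yield batch
    if next_batch:
        yield next_batch

def grouper(n, iterable):
    # Assumed to match the iter/islice answer above.
    iterator = iter(iterable)
    return iter(lambda: list(itertools.islice(iterator, n)), [])

for size in (3, 333):
    # Sanity check: both approaches should produce identical chunks.
    assert list(batchiter(range(1000), size)) == list(grouper(size, range(1000)))
    t_batch = timeit.timeit(lambda: list(batchiter(range(1000), size)), number=200)
    t_group = timeit.timeit(lambda: list(grouper(size, range(1000))), number=200)
    print(f"size={size}: batchiter {t_batch:.4f}s, grouper {t_group:.4f}s")
```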
Carles Sala