
If you have a list in Python 3.7:

>>> li
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

You can turn that into a list of chunks each of length n with one of two common Python idioms:

>>> n=3
>>> list(zip(*[iter(li)]*n))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]

This drops the last incomplete tuple, since (9, 10) is not of length n.

You can also do:

>>> [li[i:i+n] for i in range(0,len(li),n)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

if you want the last sublist even if it has fewer than n elements.

Suppose now I have a generator, gen, of unknown length or termination (so calling list(gen) or sum(1 for _ in gen) would not be wise) where I want every chunk.

The best generator expression that I have been able to come up with is something along these lines:

from itertools import zip_longest
sentinel=object()             # for use in filtering out ending chunks
gen=(e for e in range(22))    # fill in for the actual gen

g3=(t if sentinel not in t else tuple(filter(lambda x: x != sentinel, t)) for t in zip_longest(*[iter(gen)]*n,fillvalue=sentinel))

That works for the intended purpose:

>>> next(g3)
(0, 1, 2)
>>> next(g3)
(3, 4, 5)
>>> list(g3)
[(6, 7, 8), (9, 10)]

It just seems -- clumsy. I tried:

  1. using islice but the lack of length seems hard to surmount;
  2. using a sentinel in iter but the sentinel version of iter requires a callable, not an iterable.

Is there a more idiomatic Python 3 technique for a generator of chunks of length n, including the last chunk that might be shorter than n?

I am open to a generator function as well. I am looking for something idiomatic and, above all, more readable.


Update:

DSM's method from his deleted answer is, I think, very good:

>>> g3=(iter(lambda it=iter(gen): tuple(islice(it, n)), ()))
>>> next(g3)
(0, 1, 2)
>>> list(g3)
[(3, 4, 5), (6, 7, 8), (9, 10)]

I am open to this question being a duplicate, but the linked question is almost 10 years old and focused on a list. Is there no newer Python 3 method for generators where you don't know the length and don't want more than one chunk at a time?

dawg
  • probably I misunderstand, but what is wrong with `islice` like `for item in gen: print(tuple(islice(gen,3)))` (replace `print` with `yield` for a generator function of course) – Chris_Rands Jul 20 '18 at 16:06
  • 1
    Possible duplicates https://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks , https://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks, https://stackoverflow.com/questions/8991506/iterate-an-iterator-by-chunks-of-n-in-python – Mazdak Jul 20 '18 at 16:19
  • @Kasramvd: ah, yep -- my answer is just [senderle's](https://stackoverflow.com/a/22045226/487339) with a default value to one-line it. – DSM Jul 20 '18 at 16:26
  • @Kasramvd: I don't think those are quite duplicates since 1) mostly have to do with lists already in memory or 2) not taking newer features of Python 3.6+ and 3) have some variant of the two idioms I listed. The linked question is 10 years old. Are we concluding there is no new Python 3 only way to do this? – dawg Jul 20 '18 at 17:12

8 Answers


I think this is always going to be messy as long as you're trying to fit it into a one-liner. I would just bite the bullet and go with a generator function here. That is especially useful if you don't know the actual size (say, if gen is an infinite generator, etc.).

from itertools import islice

def chunk(gen, k):
    """Efficiently split `gen` into chunks of size `k`.

    Args:
        gen: Iterator to chunk.
        k: Number of elements per chunk.

    Yields:
        Chunks as a list.
    """
    while True:
        chunk = [*islice(gen, 0, k)]
        if chunk:
            yield chunk
        else:
            break

>>> gen = iter(list(range(11)))
>>> list(chunk(gen, 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

Someone may have a better suggestion, but this is how I'd do it.

cs95

This feels like a pretty reasonable approach that builds just on itertools.

>>> from itertools import takewhile, islice, count
>>> g = (i for i in range(10))
>>> g3 = takewhile(lambda x: x, (list(islice(g, 3)) for _ in count(0)))
>>> list(g3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
g.d.d.c

I have put together some timings for the answers here.

The way I originally wrote it is actually the fastest on Python 3.7. For a one-liner, that is likely the best.

A modified version of cs95's answer is fast, Pythonic, and readable.

The other answers are all of similar speed.

The benchmark:

from __future__ import print_function

try:
    from itertools import zip_longest, takewhile, islice, count 
except ImportError:
    from itertools import takewhile, islice, count  
    from itertools import izip_longest as zip_longest
from collections import deque 

def f1(it,k):
    sentinel=object()
    for t in (t if sentinel not in t else tuple(filter(lambda x: x != sentinel, t)) for t in zip_longest(*[iter(it)]*k, fillvalue=sentinel)):
        yield t

def f2(it,k): 
    for t in (iter(lambda it=iter(it): tuple(islice(it, k)), ())):
        yield t

def f3(it,k):
    while True:
        chunk = (*islice(it, 0, k),)   # tuple(islice(it, 0, k)) if Python < 3.5
        if chunk:
            yield chunk
        else:
            break

def f4(it,k):
    for t in takewhile(lambda x: x, (tuple(islice(it,k)) for _ in count(0))):
        yield t

if __name__=='__main__':
    import timeit    
    def tf(f, k, x):
        data=(y for y in range(x))
        return deque(f(data, k), maxlen=3)

    k=3
    for f in (f1,f2,f3,f4):
        print(f.__name__, tf(f,k,100000))
    for case, x in (('small',10000),('med',100000),('large',1000000)):  
        print("Case {}, {:,} x {}".format(case,x,k))
        for f in (f1,f2,f3,f4):
            print("   {:^10s}{:.4f} secs".format(f.__name__, timeit.timeit("tf(f, k, x)", setup="from __main__ import f, tf, x, k", number=10)))    

And the results:

f1 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
f2 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
f3 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
f4 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
Case small, 10,000 x 3
       f1    0.0125 secs
       f2    0.0231 secs
       f3    0.0185 secs
       f4    0.0250 secs
Case med, 100,000 x 3
       f1    0.1239 secs
       f2    0.2270 secs
       f3    0.1845 secs
       f4    0.2477 secs
Case large, 1,000,000 x 3
       f1    1.2140 secs
       f2    2.2431 secs
       f3    1.7967 secs
       f4    2.4697 secs
dawg

This solution with a generator function is fairly explicit and short:

import itertools

def g3(seq):
    it = iter(seq)
    while True:
        head = list(itertools.islice(it, 3))
        if head:
            yield head
        else:
            break
Florian Weimer

The itertools recipes section of the docs offers various generator helpers.

Here you can combine take with the two-argument form of iter to create a chunk generator.

from itertools import islice

def chunks(n, it):
    it = iter(it)
    return iter(lambda: tuple(islice(it, n)), ())

Example

li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(*chunks(3, li))

Output

(0, 1, 2) (3, 4, 5) (6, 7, 8) (9, 10)
Olivier Melançon

more_itertools.chunked:

import more_itertools

list(more_itertools.chunked(range(11), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

See also the source:

iter(functools.partial(more_itertools.take, n, iter(iterable)), [])
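This is the same two-argument-iter pattern as above. For reference, newer Python versions (3.12+) also ship itertools.batched, which yields tuples the same way; a rough pure-Python sketch of that behavior (the helper name here is mine, not from the library):

```python
from itertools import islice

def batched_sketch(iterable, n):
    # Yield successive n-tuples; the last one may be shorter.
    # Mirrors more_itertools.chunked / itertools.batched (3.12+).
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

print(list(batched_sketch(range(11), 3)))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10)]
```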
pylang

My attempt uses groupby and cycle. With cycle you can choose a pattern for how to group your elements, so it's versatile:

from itertools import groupby, cycle

gen=(e for e in range(11))
d = [list(g) for d, g in groupby(gen, key=lambda v, c=cycle('000111'): next(c))]
print(d)

Outputs:

[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
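To illustrate the versatility, an uneven pattern such as '00111' (chosen here purely for illustration) produces alternating chunks of 2 and 3, since groupby starts a new group whenever the cycled key character changes:

```python
from itertools import groupby, cycle

gen = (e for e in range(10))
# The key cycles '0','0','1','1','1', so a new group starts
# after every 2 elements, then every 3, alternating.
d = [list(g) for _, g in groupby(gen, key=lambda v, c=cycle('00111'): next(c))]
print(d)
# [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
```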
Andrej Kesely

We can do this by using the grouper function given in the itertools documentation page.

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

def out_iterator(lst):
    for each in grouper(lst,n):
        if None in each:
            yield each[:each.index(None)]
        else:
            yield each
a=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n=3
print(list(out_iterator(a)))

Output:

[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10)]
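One caveat: the `None in each` test misfires if the data itself contains None. A variant using a unique sentinel object as the fillvalue (my adaptation, not part of the original answer) avoids that collision:

```python
from itertools import zip_longest

def out_iterator_safe(iterable, n):
    sentinel = object()  # unique fillvalue that cannot appear in real data
    for each in zip_longest(*[iter(iterable)] * n, fillvalue=sentinel):
        if sentinel in each:
            # Trim the padding from the final short chunk
            yield each[:each.index(sentinel)]
        else:
            yield each

print(list(out_iterator_safe([0, 1, None, 3, 4, 5, 6], 3)))
# [(0, 1, None), (3, 4, 5), (6,)]
```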
rawwar