
I came across two different ways to split an iterable into "chunks" (more than 1 item).

One method uses itertools:

from itertools import izip_longest
def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)

The other method is plain Python:

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))

Does the itertools implementation buy you anything "extra"?

Here "extra" would mean faster, more flexible, or safer.

I ask because the itertools implementation shown here is definitely NOT more readable/intuitive IMO.

Trevor Boyd Smith
  • I agree that the iterable approach looks a bit magic-y, but it can be implemented in a more readable way as well. Both approaches, iterators and slices, can be written in many ways, so readability isn't really a factor imo – Felk Sep 22 '17 at 11:28
  • You can also use: `def chunker(iterable, size): yield from iter(lambda it=iter(iterable): list(islice(it, size)), [])`... – Jon Clements Sep 22 '17 at 11:30
  • @unutbu yup and will work nicely prior to `yield from` as well... Only noticeable difference is that `inspect.isgenerator` fails for `iter`... but *shrugs*... 6 and half a dozen and all that... – Jon Clements Sep 22 '17 at 11:40
  • @Felk I think "magic-y" is a euphemism. To cite the docs on the abuse of list initialisation through a multiplier: "_they are referenced multiple times. This often haunts new Python programmers_". The bad thing here is that the nearly identical `[iter(iterable) for _ in range(n)]` has a different effect. – Vroomfondel Sep 22 '17 at 12:42
  • Related: [Why would I want to use itertools.islice instead of normal list slicing](https://stackoverflow.com/q/32172612/190597). – unutbu Sep 24 '17 at 17:00
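
The difference Vroomfondel points out can be seen in a small sketch. It assumes Python 3, where `izip_longest` is spelled `zip_longest`; the sample list `data` is made up for illustration:

```python
from itertools import zip_longest

data = [1, 2, 3, 4, 5, 6]

# One iterator referenced three times: zip_longest pulls from it in turn,
# so each output tuple consumes three consecutive items
shared = [iter(data)] * 3
shared_result = list(zip_longest(*shared))
print(shared_result)
# [(1, 2, 3), (4, 5, 6)]

# Three independent iterators: each one starts at the beginning,
# so every tuple repeats the same item
separate = [iter(data) for _ in range(3)]
separate_result = list(zip_longest(*separate))
print(separate_result)
# [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5), (6, 6, 6)]
```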

2 Answers


grouper can be used with any iterable, including generators and infinite iterators. chunker can only be used with sequences, that is, objects that support len() and slicing.

from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return izip_longest(*args, fillvalue=fillvalue)

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))

x = (i**2 for i in range(5))

print(list(grouper(x, 3)))
# [(0, 1, 4), (9, 16, None)]

print(list(chunker(x, 3)))
# TypeError: object of type 'generator' has no len()
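
The infinite-iterator case can be illustrated with a minimal sketch. It assumes Python 3 (`zip_longest` instead of `izip_longest`), and `itertools.count` is only a stand-in for any endless source:

```python
from itertools import count, islice, zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to the SAME iterator, so each tuple consumes n items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# count() never ends, yet groups can still be taken lazily
first_three = list(islice(grouper(count(), 3), 3))
print(first_three)
# [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```

chunker cannot do this, since it needs len(seq) before it yields anything.
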
unutbu

The two functions aren't equivalent. There are several differences:

  • Your grouper will work for any iterable, including iterators and generators, while chunker requires sequences that support len() and slicing (the [...]).

    >>> it = lambda : (i for i in range(6))  # creates a generator when called
    >>> list(grouper(it(), 3))
    [(0, 1, 2), (3, 4, 5)]
    >>> list(chunker(it(), 3))
    TypeError: object of type 'generator' has no len()
    

    Note that the other answer already mentioned this!

  • If the chunk size isn't a divisor of the length, chunker's last chunk will be smaller than the chunk size, whereas grouper pads the last group with the fillvalue. Also, chunker returns chunks of the same type as the original, while grouper returns tuples:

    >>> list(grouper([1,2,3,4,5], 3))
    [(1, 2, 3), (4, 5, None)]
    >>> list(chunker([1,2,3,4,5], 3))
    [[1, 2, 3], [4, 5]]
    
  • grouper is built on iter and zip_longest (izip_longest in Python 2), which are implemented in C in CPython and therefore very fast, at the expense of readability. This can make it much faster than chunker:

    l = list(range(10000))
    %timeit list(grouper(l, 10))
    # 320 µs ± 6.39 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    %timeit list(chunker(l, 10))
    # 1.22 ms ± 19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
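
The `%timeit` lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard `timeit` module. This assumes Python 3 (`zip_longest`, `range`); the run count of 100 is an arbitrary choice, and the numbers will differ from machine to machine:

```python
import timeit
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

l = list(range(10000))

# Total seconds for 100 runs of each function; only the ratio is meaningful
grouper_time = timeit.timeit(lambda: list(grouper(l, 10)), number=100)
chunker_time = timeit.timeit(lambda: list(chunker(l, 10)), number=100)
print(f"grouper: {grouper_time:.4f}s  chunker: {chunker_time:.4f}s")
```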
    

So grouper is the more general approach and, in the timing above, also the faster one. However, depending on the case, chunker can be more useful, for example if you don't want the "fill" part or want to preserve the type of the "chunks".
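
If neither trade-off fits, a third variant along the lines of the `islice` suggestion in the comments accepts any iterable and does not pad the last chunk. This is only a sketch: `lazy_chunker` is a hypothetical name, not a function from either answer, and it always yields lists rather than preserving the input type:

```python
from itertools import islice

def lazy_chunker(iterable, size):
    # Hypothetical helper: works on any iterable, no fill value
    it = iter(iterable)
    while True:
        # Take up to `size` items; the last chunk may be shorter
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

gen_chunks = list(lazy_chunker((i**2 for i in range(5)), 3))
print(gen_chunks)
# [[0, 1, 4], [9, 16]]
```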

MSeifert