12

In Python, it is easy to break an n-long list into k-size chunks if n is a multiple of k (IOW, n % k == 0). Here's my favorite approach (straight from the docs):

>>> k = 3
>>> n = 5 * k
>>> x = range(k * 5)
>>> zip(*[iter(x)] * k)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]

(The trick is that [iter(x)] * k produces a list of k references to the same iterator, as returned by iter(x). zip then builds each chunk by pulling one item through each of the k references, which advances the shared iterator k times per chunk. The * before [iter(x)] * k is necessary because zip expects to receive its arguments as separate iterables rather than as a list of them.)
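To make the shared-iterator behaviour concrete, here is a small sketch. It assumes Python 3, where zip returns a lazy iterator (hence the list() call); the names it, refs and chunks are illustrative only.

```python
k = 3
x = list(range(5 * k))

it = iter(x)
refs = [it] * k  # k references to ONE iterator, not k independent iterators
assert all(r is it for r in refs)

# zip takes one item from each argument in turn; because every argument is
# the same iterator, each output tuple consumes k consecutive items of x.
chunks = list(zip(*refs))
print(chunks)
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]
```

If the list held k independent iterators instead (for example [iter(x) for _ in range(k)]), every tuple would just repeat the same element k times.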

The main shortcoming I see with this idiom is that, when n is not a multiple of k (IOW, n % k > 0), the leftover entries are silently dropped; e.g.:

>>> zip(*[iter(x)] * (k + 1))
[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]

There's an alternative idiom that is slightly longer to type, produces the same result as the one above when n % k == 0, and has a more acceptable behavior when n % k > 0:

>>> map(None, *[iter(x)] * k)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]
>>> map(None, *[iter(x)] * (k + 1))
[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, None)]

At least, here the left over entries are retained, but the last chunk gets padded with None. If one just wants a different value for the padding, then itertools.izip_longest solves the problem.
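As a rough illustration of the padding variant, the sketch below assumes Python 3, where the function is spelled itertools.zip_longest (izip_longest is the Python 2 name). The fill value -1 is an arbitrary choice for the example.

```python
from itertools import zip_longest  # called izip_longest in Python 2

k = 3
x = list(range(5 * k))

# Same shared-iterator idiom, but the short final chunk is padded with
# the chosen fill value instead of None.
padded = list(zip_longest(*[iter(x)] * (k + 1), fillvalue=-1))
print(padded)
# [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, -1)]
```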

But suppose the desired solution is one in which the last chunk is left unpadded, i.e.

[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14)]

Is there a simple way to modify the map(None, *[iter(x)]*k) idiom to produce this result?

(Granted, it is not difficult to solve this problem by writing a function (see, for example, the many fine replies to How do you split a list into evenly sized chunks? or What is the most "pythonic" way to iterate over a list in chunks?). Therefore, a more accurate title for this question would be "How to salvage the map(None, *[iter(x)]*k) idiom?", but I think it would baffle a lot of readers.)

I was struck by how easy it is to break a list into even-sized chunks, and how difficult (in comparison!) it is to get rid of the unwanted padding, even though the two problems seem of comparable complexity.

kjo
  • Are you asking this for a practical reason, or just to see whether it can be done? – Winston Ewert Aug 10 '11 at 02:49
  • Isn't this a duplicate of http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python ? – Ned Batchelder Aug 10 '11 at 03:40
  • @Ned Batchelder: I tried to make clear that this post was a follow-up/extension thereof (in fact, I cite the same stackoverflow post at the end). Also, as I tried to explain at the end of this post, it is less about solving the chunking problem (good solutions to it are given in the posts I cited) than about finding out whether there is a simple way to extend the usefulness of a particular Python idiom. Maybe the post needs a different title, but all the ones I could think of looked confusing... – kjo Aug 10 '11 at 04:06
  • But since we can write a function to do this and the idiom is decidedly non-obvious, why do you want this? – Winston Ewert Aug 10 '11 at 12:44

4 Answers

15
[x[i:i+k] for i in range(0,n,k)]
John La Rooy
3
from itertools import izip_longest

sentinel = object()  # unique padding marker that cannot collide with real data
split = (
    (v for v in r if v is not sentinel) for r in
    izip_longest(*[iter(x)] * k, fillvalue=sentinel))

Of course, the better idiom is to call a function, as that will be more readable than any one-liner that does the same thing.
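One way such a function might look is sketched below. It assumes Python 3 (zip_longest rather than izip_longest), and the name chunked is hypothetical, not taken from any library. It wraps the same sentinel technique and returns tuples so the output matches the shape asked for in the question.

```python
from itertools import zip_longest  # called izip_longest in Python 2

_SENTINEL = object()  # private padding marker, never equal to real data


def chunked(iterable, size):
    """Yield tuples of up to `size` items; only the last may be shorter."""
    for group in zip_longest(*[iter(iterable)] * size, fillvalue=_SENTINEL):
        # Strip the padding that zip_longest added to the final group.
        yield tuple(v for v in group if v is not _SENTINEL)


result = list(chunked(range(15), 4))
print(result)
# [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14)]
```

Because it only iterates, this sketch also accepts generators and other iterables that do not support slicing.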

Winston Ewert
3

From IPython's source:

def chop(seq,size):
    """Chop a sequence into chunks of the given size."""
    chunk = lambda i: seq[i:i+size]
    return map(chunk,xrange(0,len(seq),size))

The last chunk returned will have fewer than size elements if the length of the sequence isn't evenly divisible by size; it simply gets the short end of the stick, without complaining.

>>> chop(range(12),3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
>>> chop(range(12),4)
[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
>>> chop(range(12),5)
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
>>> chop(range(12),6)
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
sente
1

What about this? It's a different idiom, but produces your desired result:

[x[i:i+k] for i in range(0,len(x),k)] #=> [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14]]
[x[i:i+k+1] for i in range(0,len(x),k+1)] #=> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14]]

Or if you really need tuples, use tuple(x[i:i+k]) instead of just x[i:i+k].
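For completeness, the tuple variant could be written as in the sketch below. It assumes Python 3 and a chunk size of 4 (the k + 1 case from the question); x is materialised as a list so that slicing yields plain lists before conversion.

```python
x = list(range(15))
size = 4  # chunk size; corresponds to k + 1 in the question

# Slice in steps of `size` and convert each slice to a tuple. Slicing past
# the end is safe in Python, so the last chunk is simply shorter.
chunks = [tuple(x[i:i + size]) for i in range(0, len(x), size)]
print(chunks)
# [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14)]
```

Unlike the iterator-based idiom, this requires x to support len() and slicing, so it works for lists and tuples but not for arbitrary generators.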

jtbandes