I was studying the response from Nosklo in What is the most "pythonic" way to iterate over a list in chunks?, where he defined the function:

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))

Can someone explain to me how the return works when it is followed by a for loop? I tried the following:

def chunker2(seq, size):
    for pos in xrange(0, len(seq), size):
        return seq[pos:pos + size]

but I don't get the same result. Note that in Nosklo's example, chunker() is called iteratively, as in the example below:

animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']
for group in chunker(animals, 3):
    print group

By adding prints, I noticed that the latter for loop is executed 3 times (it goes through the animals list), but the for loop in the chunker function is executed only once. So how come there is only one return, yet I see 3 prints?

user3764177
  • https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions – Bakuriu Jun 22 '14 at 06:22
  • That's not a for loop; it's a generator expression. It just looks a lot like a for loop, which makes newbies constantly think it's one. See http://stackoverflow.com/questions/1756096/understanding-generators-in-python and http://legacy.python.org/dev/peps/pep-0289/ – user2357112 Jun 22 '14 at 06:24
  • Thank you very much! I'll check those links. – user3764177 Jun 22 '14 at 06:25
  • +1 for asking and reasonably presenting what the user understands so far. I wish I could give another +1 for trying to learn the most "pythonic" way to do things. Good luck learning Python and with more questions on Stack Overflow. – necromancer Jun 22 '14 at 06:28
  • I am not sure I fully understood. So, chunker is the generator, right? In my example, it is called 3 times, and every time it returns a different value. The 1st time, group is [cat, dog, rabbit]. My problem is with the following iterations: the next time chunker is called, how does "pos" become the 4th index of the list? My only explanation is that the complete segmentation of the list is done once, and that every time I call chunker again I get a new segment. Is my understanding (more or less) correct? Thanks! – user3764177 Jun 22 '14 at 06:55
  • @user3764177 `chunker` is not a generator itself, its return value is. And every time you iterate on this return value, it will give you another chunk of the list. – famousgarkin Jun 22 '14 at 07:04
  • I see. But when I add a print in the chunker definition, I notice that it is actually called only once, not 3 times... as the print appears only 1 time. Why? Shouldn't it be printing 3 times, every time chunker is called from the bottom for loop? – user3764177 Jun 22 '14 at 07:08
  • @user3764177 That's the very point, `chunker` function is called only once, when the `for` loop is evaluated, and creates and returns the generator object. The `for` loop then iterates using this returned generator object, not the `chunker` function. I'll try to clear this up in the answer. – famousgarkin Jun 22 '14 at 07:12

1 Answer

The return value of nosklo's chunker function is a generator, an object that produces its values one at a time as it is iterated over. In this case the generator is created using a generator expression, which is the whole parenthesized expression taken as a single unit: (seq[pos:pos + size] for pos in xrange(0, len(seq), size)).

>>> def chunker(seq, size):
...     return (seq[pos:pos + size] for pos in xrange(0, len(seq), size))
... 
>>> result = chunker([1, 2, 3], 2)
>>> print(result)
<generator object <genexpr> at 0x10581e1e0>
>>> next(result)
[1, 2]
>>> next(result)
[3]
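To see how this differs from building a list, here is a small sketch that compares the generator expression with the equivalent list comprehension. It is written for Python 3, so it assumes `range` in place of `xrange`; the names `chunker_list`, `numbers`, `lazy` and `eager` are made up for illustration.

```python
import types

def chunker(seq, size):
    # Generator expression: no slice is computed until a value is requested.
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def chunker_list(seq, size):
    # Hypothetical eager variant: the list comprehension builds every chunk up front.
    return [seq[pos:pos + size] for pos in range(0, len(seq), size)]

numbers = [1, 2, 3]
lazy = chunker(numbers, 2)        # a generator object, nothing sliced yet
eager = chunker_list(numbers, 2)  # a list that already holds all chunks

print(isinstance(lazy, types.GeneratorType))  # True
print(eager)                                  # [[1, 2], [3]]
print(list(lazy) == eager)                    # True: same chunks, produced on demand
```

Both forms give the same chunks; the difference is only in when the slices are computed.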

Regarding what gets called, we can rewrite the loop code like this to see it better:

>>> generator = chunker(animals, 3)
>>> for chunk in generator:
...     print chunk
... 
['cat', 'dog', 'rabbit']
['duck', 'bird', 'cow']
['gnu', 'fish']

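The chunker function gets called only once and returns the generator object, which we store in the generator variable. The for loop then works only with this generator object: on each iteration it asks the generator for its next value, and it stops when the generator is exhausted. With 8 animals and a chunk size of 3, that gives 3 chunks.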
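What the for loop does with the generator can be spelled out by hand. The following is only a rough sketch of the iteration protocol, not how the interpreter is implemented. It assumes Python 3 (`range` instead of `xrange`), and the `groups` list is a hypothetical addition used just to collect the chunks.

```python
def chunker(seq, size):
    # Python 3 spelling of the original function (range instead of xrange).
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']

generator = chunker(animals, 3)  # chunker is called exactly once, here
iterator = iter(generator)       # a generator is its own iterator
groups = []                      # hypothetical list, only to collect the chunks

while True:
    try:
        group = next(iterator)   # what the for loop does on each pass
    except StopIteration:        # raised once the generator is exhausted
        break
    groups.append(group)
    print(group)
```

The `while` loop asks for a value four times: three calls to `next` return a chunk, and the fourth raises `StopIteration`, which ends the loop.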

To print something each time this generator produces a value, you would have to put the print statement inside the generator expression (seq[pos:pos + size] for pos in xrange(0, len(seq), size)), which is not allowed because an expression cannot contain a statement. But we can rewrite the generator expression as a generator function using the yield statement. This form is more verbose but also more versatile: it lets us include the print statement, and it behaves the way you initially expected:

>>> def chunker2(seq, size):
...     for pos in xrange(0, len(seq), size):
...         print('chunker2 generator called')
...         yield seq[pos:pos + size]
... 
>>> for group in chunker2(animals, 3):
...     print group
... 
chunker2 generator called
['cat', 'dog', 'rabbit']
chunker2 generator called
['duck', 'bird', 'cow']
chunker2 generator called
['gnu', 'fish']

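Here chunker2 is a generator function. It is still called only once, and that call returns a generator object. The body of the function runs up to each yield and is resumed every time the for loop asks for the next value, which is why the print appears 3 times.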
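The order in which things happen can be checked with a small experiment. This is a sketch under a few assumptions: it targets Python 3 (`range` instead of `xrange`), and it records messages in a hypothetical `events` list instead of printing them, so the sequence is easy to inspect afterwards.

```python
events = []  # hypothetical log, used instead of print to inspect the order

def chunker2(seq, size):
    events.append('body started')
    for pos in range(0, len(seq), size):
        events.append('about to yield')
        yield seq[pos:pos + size]

animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']

gen = chunker2(animals, 3)            # the single call: the body has not started yet
events.append('chunker2 returned')

first = next(gen)                     # runs the body up to the first yield
events.append('got first chunk')

rest = list(gen)                      # resumes the body until it finishes

for event in events:
    print(event)
```

The log starts with 'chunker2 returned', which shows that calling a generator function runs none of its body; the body only starts at the first `next`.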

famousgarkin
  • Thanks. So this means that in Nosklo's example, each "next" corresponds to one iteration of the loop for group in chunker(animals, 3): – user3764177 Jun 22 '14 at 07:01
  • @user3764177 Correct, the `next` function gets the next item from the iterable object just the same as one iteration of the `for` loop. – famousgarkin Jun 22 '14 at 07:08