I am trying to understand the behaviour of the yield statement by building a generator which behaves similarly to the 'enumerate' built-in function but I am witnessing inconsistencies depending on how I iterate through it.

def enumerate(sequence, start=0):
    n = start
    for elem in sequence:
        print("Before the 'yield' statement in the generator, n = {}".format(n))
        yield n, elem
        n += 1
        print("After the 'yield' statement in the generator, n = {}".format(n))

My understanding of generators is that the execution of the code will stop once a yield statement has been reached, upon which it returns a value. This matches what I get with the script below.

a = 'foo'
b = enumerate(a)
n1,v1 = next(b)
print('n1 = {}, v1 = {}\n'.format(n1,v1))
n2,v2 = next(b)
print('n2 = {}, v2 = {}'.format(n2,v2))

In this case, the generator seems to stop exactly at the yield statement and resumes at the n += 1 line on the second next() call:

Before the 'yield' statement in the generator, n = 0
n1 = 0, v1 = f

After the 'yield' statement in the generator, n = 1
Before the 'yield' statement in the generator, n = 1
n2 = 1, v2 = o

However, if I use the for loop below, the generator does not seem to stop at the yield statement.

for n,v in enumerate(a[0:1]):
    print('n = {}, v = {}'.format(n,v))

This is what I get:

Before the 'yield' statement in the generator, n = 0
n = 0, v = f
After the 'yield' statement in the generator, n = 1

Edit taking comments into account

I realise I'm iterating over just one element, but I was not expecting to see the very last "After the 'yield' statement in the generator" sentence (which appears even if I iterate over ALL the elements).

print('\n\n')
for n,v in enumerate(a):
    print('n = {}, v = {}'.format(n,v))

Before the 'yield' statement in the generator, n = 0
n = 0, v = f
After the 'yield' statement in the generator, n = 1
Before the 'yield' statement in the generator, n = 1
n = 1, v = o
After the 'yield' statement in the generator, n = 2
Before the 'yield' statement in the generator, n = 2
n = 2, v = o
After the 'yield' statement in the generator, n = 3

Why does this happen?

blackcorsair
  • ... Because you're iterating over one element. – Ignacio Vazquez-Abrams Jun 18 '18 at 14:16
  • The output is totally consistent with expectation. What's the problem? – Mad Physicist Jun 18 '18 at 14:18
  • `print(a[0:1])` should clarify – Mad Physicist Jun 18 '18 at 14:20
  • If a `for` loop stopped after the first value was yielded, it wouldn't be much of a loop, would it? The loop keeps going until nothing's left. – Aran-Fey Jun 18 '18 at 14:20
  • Hi, @MadPhysicist, maybe the problem is that I don't know exactly what the expectation is. I was hoping that the very last "After the 'yield' statement" sentence would not be printed, and I don't understand why it is indeed printed. Would you mind pointing me in the right direction? – blackcorsair Jun 18 '18 at 14:23
  • Hi, @Ignacio Vazquez-Abrams, thanks a lot for your answer, but I am still confused. If I iterate over two elements (or all of them, for that matter) I get the same behaviour; the generator does not stop at the yield and prints the last "After the 'yield'..." sentence – blackcorsair Jun 18 '18 at 14:26
  • How do you expect the `for` loop to know when to stop? It's literally impossible for it to stop when you expect it to, because it has no way to know what the last element is without running your code until it no longer yields anything. – Aran-Fey Jun 18 '18 at 14:29
  • Hi, @Aran-Fey, sorry for being a bit thick, but if the generator must be invoked a fourth time to reach the 'yield' statement and realise it can no longer yield anything, would it not reach the first print statement (the one which prints "Before the yield statement") and print that one too? – blackcorsair Jun 18 '18 at 14:40
  • I think the idea of "realize" is what's tripping you up. The interpreter does not actually interpret in the colloquial sense. It does not "realize" you are done until you explicitly tell it so. That's sort of the whole thing with programming in the first place. – Mad Physicist Jun 18 '18 at 16:14

2 Answers

The answer lies in understanding what a for loop in Python does: it gets the iterator of an object (i.e. calls iter() on it) and keeps calling next() on it until a StopIteration exception is raised. StopIteration is raised when the generator's code is done, meaning it has reached a return statement that exits the function (the return can also be implicit, at the end of the function body). This is why the loop doesn't stop at the yield: it keeps asking for the next value until the generator is done.
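
As a rough sketch of that description, a for loop over a generator behaves approximately like the hand-written while loop below. The names numbered, manual_for and log are made up for illustration, and the generator appends to a list instead of printing so that the order of events can be inspected afterwards.

```python
def numbered(sequence, start=0):
    """Hypothetical stand-in for the question's generator; logs instead of printing."""
    n = start
    for elem in sequence:
        log.append("before yield, n = {}".format(n))
        yield n, elem
        n += 1
        log.append("after yield, n = {}".format(n))


def manual_for(iterable):
    """Approximately what `for item in iterable: ...` does."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)   # resumes the generator right after its last yield
        except StopIteration:       # raised once the generator function returns
            break
        log.append("loop body got {}".format(item))


log = []
manual_for(numbered("f"))
print("\n".join(log))
```

The second call to next() is the one that runs the code after the yield; only when the generator then falls off the end of its function does the loop see StopIteration and stop.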

Omri Levi

The fundamental issue here is that you are confusing the fact that you know when the generator will be exhausted just by looking at it, with the fact that Python can only know by running the code. When Python reaches the yield that you consider to be the last one, it does not actually know that it is the last one. What if your generator looked like this:

def enumeratex(x, start=0):
    for elem in x:
        yield start, elem
        start += 1
    yield start, None

Here, for reasons no one will ever know, a final None element is yielded after the main generator loop. Python would have no way of knowing that the generator is done until you either

  1. Return from the generator.
  2. Raise an error, in which case everything will grind to a halt.

In versions before Python 3.7, generators could also raise StopIteration themselves to indicate termination; from 3.7 onwards, a StopIteration raised inside a generator body is converted into a RuntimeError. From the caller's side, a return statement still surfaces as either StopIteration (if returning None) or StopIteration(return_value).
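
A small sketch of both behaviours, assuming Python 3.7 or later. The generator names returns_value and raises_stop are invented for illustration.

```python
def returns_value():
    yield 1
    return "done"          # the caller sees this as StopIteration("done")


def raises_stop():
    yield 1
    raise StopIteration    # Python 3.7+: converted into RuntimeError


gen = returns_value()
next(gen)
try:
    next(gen)
except StopIteration as exc:
    returned = exc.value   # the value passed to `return`

gen = raises_stop()
next(gen)
try:
    next(gen)
    outcome = "no error"
except RuntimeError:
    outcome = "RuntimeError"
except StopIteration:
    outcome = "StopIteration"   # default behaviour before Python 3.7

print(returned, outcome)
```

On Python 3.7+ this should report the returned value "done" for the first generator and a RuntimeError for the second.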

So while the exact manner in which you tell Python to end the generator is up to you, you do have to be explicit about it. A yield does not by itself end the generator.

TL;DR

When a for loop drives a generator, all of the code in the generator's loop runs, including the code after the last yield, because Python can only find out that a value was the last one by resuming the generator and executing it until it returns.
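
This can be checked by driving a generator by hand, which also shows why the extra call prints nothing from before the yield. The sketch below uses a hypothetical generator, traced, that records each step in a list named events (both names are illustrative, not from the question).

```python
events = []


def traced(sequence):
    """Hypothetical generator that records each step instead of printing."""
    for elem in sequence:
        events.append("before yield {}".format(elem))
        yield elem
        events.append("after yield {}".format(elem))
    events.append("inner loop finished")   # reached only once the sequence is exhausted


gen = traced("ab")
next(gen)            # runs up to the first yield
next(gen)            # runs "after yield a", then up to the second yield
paused = list(events)

try:
    next(gen)        # runs "after yield b", finds nothing left to loop over, returns
except StopIteration:
    events.append("StopIteration")

print(paused)
print(events)
```

The third next() call resumes after the last yield, so the "after" step runs; the generator's own for loop then ends without entering its body again, which is why no further "before" step is recorded.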

Mad Physicist
  • The first point in your numbered list is no longer true (though it used to be). In Python 3.7 (or 3.5+ with `from __future__ import generator_stop`), a `StopIteration` that is uncaught in a generator will be converted into a `RuntimeError`. In new code you should always `return` from a generator when you're done (though it's still fine to return `None` implicitly by running off the end of the function). – Blckknght Jun 18 '18 at 16:38
  • @Blckknght. Thanks for the catch. Updated. – Mad Physicist Jun 18 '18 at 17:17