4

Consider these two functions:

def foo():
    x = 0
    while True:
        yield x
        x += 1

def wrap_foo(limit=10, gen=True):
    fg = foo()
    count = 0
    if gen:
        while count < limit:
            yield next(fg)
            count += 1
    else:
        return [next(fg) for _ in range(limit)]

foo() is a generator function, and wrap_foo() just puts a limit on how much data gets generated. I was experimenting with having the wrapper behave either as a generator (gen=True) or as a regular function that loads all the generated data into memory at once (gen=False).

The regular generator behavior works as I'd expect:

In [1352]: [_ for _ in wrap_foo(gen=True)]
Out[1352]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

However, with gen=False, nothing gets generated.

In [1351]: [num for num in wrap_foo(gen=False)]
Out[1351]: []

It seems like Python pre-classifies the function as a generator based on the presence of the yield statement (latter example works perfectly if yield is commented out).

Why is this? I would like to understand the mechanisms at play here. I'm running Python 3.6.
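For what it's worth, the classification happens at definition time and can be observed with `inspect.isgeneratorfunction` (a minimal sketch; `plain` and `gen` are throwaway names for illustration):

```python
import inspect

def plain():
    return [0, 1, 2]

def gen():
    yield 0

# The presence of `yield` anywhere in the body marks the whole
# function as a generator function when `def` is executed,
# regardless of which branch would run at call time.
print(inspect.isgeneratorfunction(plain))  # False
print(inspect.isgeneratorfunction(gen))    # True
```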

wim
zachd1_618
  • If there is a `yield` in your `def` body, the function will *always be a generator*. A `return` will then act as an implicit `StopIteration`, not as a typical `return` value. Just use `list(wrap_foo(10))` if you want to load the whole thing into memory. Why would you want to do it any other way? – juanpa.arrivillaga Mar 01 '17 at 19:48
  • That's what I figured. I just got lazy at one point in my interactive shell and tried to add a kwarg so I could get the generated data directly instead of always calling `[_ for _ in ...]`. Then I got curious as to why I couldn't do that. – zachd1_618 Mar 01 '17 at 19:54
  • But you don't *need* to call `[_ for _ in ...]`, you've abstracted that logic into a generator, so to materialize it just use `list` – juanpa.arrivillaga Mar 01 '17 at 19:55
  • 1
    Very true. I was just being dramatic ;-) – zachd1_618 Mar 01 '17 at 19:57
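A quick sketch of the behavior described in the first comment: in Python 3.3+, a `return value` inside a generator does not return that value to the caller; it becomes the `value` attribute of the `StopIteration` raised when the generator is exhausted:

```python
def g():
    yield 1
    return [2, 3]  # not a normal return value: ends iteration

it = g()
print(next(it))  # 1
try:
    next(it)
except StopIteration as exc:
    # The "returned" list is only reachable here.
    print(exc.value)  # [2, 3]

# Ordinary consumers like list() simply discard it:
print(list(g()))  # [1]
```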

2 Answers

7

It seems like Python pre-classifies the function as a generator based on the presence of the yield statement

Yes, that's exactly what happens. wrap_foo is determined to be a generator function at definition time, so every call returns a generator no matter which branch runs. You could use generator expressions instead, so that the `def` body itself contains no `yield`:

def wrap_foo(limit=10, gen=True):
    fg = foo()
    if gen:
        return (next(fg) for _ in range(limit))
    else:
        return [next(fg) for _ in range(limit)]
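With that version, both modes behave as intended (a quick check, reusing the `foo` from the question):

```python
def foo():
    x = 0
    while True:
        yield x
        x += 1

def wrap_foo(limit=10, gen=True):
    fg = foo()
    if gen:
        return (next(fg) for _ in range(limit))
    else:
        return [next(fg) for _ in range(limit)]

lazy = wrap_foo(gen=True)
print(list(lazy))             # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
eager = wrap_foo(gen=False)
print(type(eager))            # <class 'list'> -- already materialized
print(eager)                  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```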
wim
3

It seems like Python pre-classifies the function as a generator based on the presence of the yield statement (latter example works perfectly if yield is commented out).

Why is this?

Because Python can't wait until the function actually executes a yield to decide whether it's a generator. First, generators are defined to not execute any of their code until the first next. Second, a generator might never actually reach any of its yield statements, if it happens to not generate any elements.
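Both points can be seen directly (a minimal sketch):

```python
def g(n):
    print("body started")
    for i in range(n):
        yield i

it = g(0)          # nothing printed: the body hasn't started executing
print("created")
print(list(it))    # prints "body started", then [] -- no yield is ever reached
```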

user2357112