
I was playing around with list comprehensions to get a better understanding of them and I ran into some unexpected output that I am not able to explain. I haven't found this question asked before, but if it /is/ a repeat question, I apologize.

I was essentially trying to write a generator that generated generators. A simple generator expression looks like this:

(x for x in range(10) if x%2==0) # generates all even integers in range(10)

What I was trying to do was write a generator that generated two generators - the first of which generated the even numbers in range(10) and the second of which generated the odd numbers in range(10). For this, I did:

>>> (x for x in range(10) if x%2==i for i in range(2))
<generator object <genexpr> at 0x7f6b90948f00>

>>> for i in g.next(): print i
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> g = (x for x in range(10) if x%2==i for i in range(2))
>>> g
<generator object <genexpr> at 0x7f6b90969730>
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment

I don't understand why 'i' is being referenced before assignment.

I thought it might have had something to do with i in range(2), so I did:

>>> g = (x for x in range(10) if x%2==i for i in [0.1])
>>> g
<generator object <genexpr> at 0x7f6b90948f00>
>>> g.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <genexpr>
UnboundLocalError: local variable 'i' referenced before assignment

This didn't make sense to me, so I thought it best to try something simpler first. So I went back to lists and tried:

>>> [x for x in range(10) if x%2==i for i in range(2)]
[1, 1, 3, 3, 5, 5, 7, 7, 9, 9]

which I expected to be the same as:

>>> l = []
>>> for i in range(2):
...     for x in range(10):
...             if x%2==i:
...                     l.append(x)
... 
>>> l
[0, 2, 4, 6, 8, 1, 3, 5, 7, 9] # so where is my list comprehension malformed?

But when I tried it on a hunch, this worked:

>>> [[x for x in range(10) if x%2==i] for i in range(2)]
[[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]] # so nested lists in nested list comprehension somehow affect the scope of if statements? :S

So I thought it might be a problem with what level of scope the if statement operates in. So I tried this:

>>> [x for x in range(10) for i in range(2) if x%2==i]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

And now I'm thoroughly confused. Can someone please explain this behavior? I don't understand why my list comprehensions seem to be malformed, nor do I understand how the scoping of the if clauses works.

PS: While proof-reading the question, I realized that this does look a bit like a homework question - it is not.

martineau
inspectorG4dget

4 Answers


You need to use some parentheses:

((x for x in range(10) if x%2==i) for i in range(2))

This didn't make sense to me, so I thought it best to try something simpler first. So I went back to lists and tried:

>>> [x for x in range(10) if x%2==i for i in range(2)]
[1, 1, 3, 3, 5, 5, 7, 7, 9, 9]

That worked because an earlier list comprehension leaked its i variable into the enclosing scope, and that leftover i was then picked up by the current one. Start a fresh Python interpreter and the same expression fails with a NameError. This leaking of the loop variable was removed in Python 3.
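To see the fresh-interpreter behaviour without restarting anything, a small sketch can evaluate the unparenthesised comprehension inside a function, where no stray `i` exists. The function name `unparenthesised` is made up for illustration. Catching `NameError` also covers `UnboundLocalError`, which is a subclass of it.

```python
def unparenthesised():
    # The `if` clause reads i before the later `for i` clause has bound it.
    return [x for x in range(10) if x % 2 == i for i in range(2)]

try:
    unparenthesised()
    outcome = "no error"
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    outcome = type(exc).__name__

print(outcome)
```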

EDIT:

The equivalent for loop for:

(x for x in range(10) if x%2==i for i in range(2))

would be:

l = []
for x in range(10):
    if x%2 == i:
        for i in range(2):
            l.append(x)

which also raises a NameError.

EDIT2:

The parenthesized version:

((x for x in range(10) if x%2==i) for i in range(2))

is equivalent to:

li = []
for i in range(2):
    lx = []
    for x in range(10):
        if x%2==i:
            lx.append(x)
    li.append(lx)
Lie Ryan
  • Thank you, I understand the leak error, but why are the parentheses required? What does the code without parentheses translate into (in terms of a for loop)? I understand that parentheses fix the problem, I just don't get why – inspectorG4dget Sep 22 '10 at 06:40
  • Unfortunately depending on just how you consume the result the first generator can give even numbers, odd numbers, or some even followed by some odd. – Duncan Sep 22 '10 at 08:06
  • @asmoore82: it didn't change, it had always returned a generator. It's just in the "equivalent" version, I preemptively unrolled the generator to make things easier to understand. – Lie Ryan Jun 29 '18 at 05:43
  • oh wow, thanks for the reply, I removed my comment because I misread that the OP specifically asked for a generator of generators – asmoore82 Jun 29 '18 at 06:21

Lie Ryan's for-loop equivalent leads me to the following, which does seem to work just fine:

[x for i in range(2) for x in range(10) if i == x%2]

outputs

[0, 2, 4, 6, 8, 1, 3, 5, 7, 9]
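The rule at work here, as far as I understand it, is that the `for` and `if` clauses of a comprehension nest left to right, exactly like the equivalent statements. A minimal check, with variable names chosen only for illustration:

```python
# Comprehension with the i loop first, then the x loop, then the filter.
comp = [x for i in range(2) for x in range(10) if i == x % 2]

# The same clauses written as nested statements, in the same order.
loops = []
for i in range(2):
    for x in range(10):
        if i == x % 2:
            loops.append(x)

# Swapping the loops makes x the outer loop, so each x is emitted once,
# at the moment i equals its parity, and the result is simply 0..9.
swapped = [x for x in range(10) for i in range(2) if x % 2 == i]

print(comp == loops, swapped)
```

This also accounts for the `[0, 1, 2, ..., 9]` result in the question: with `x` as the outer loop, the filter passes exactly once per `x`.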
Nathan Kronenfeld

Expanding on Lie Ryan's answer a bit:

something = (x for x in range(10) if x%2==i for i in range(2))

is equivalent to:

def _gen1():
    for x in range(10):
        if x%2 == i:
            for i in range(2):
                yield x
something = _gen1()

whereas the parenthesised version is equivalent to:

def _gen1():
    def _gen2():
        for x in range(10):
            if x%2 == i:
                yield x

    for i in range(2):
        yield _gen2()
something = _gen1()

This does actually yield the two generators:

[<generator object <genexpr> at 0x02A0A968>, <generator object <genexpr> at 0x02A0A990>]

Unfortunately the generators it yields are somewhat unstable as the output will depend on how you consume them:

>>> gens = ((x for x in range(10) if x%2==i) for i in range(2))
>>> for g in gens:
        print(list(g))

[0, 2, 4, 6, 8]
[1, 3, 5, 7, 9]
>>> gens = ((x for x in range(10) if x%2==i) for i in range(2))
>>> for g in list(gens):
        print(list(g))

[1, 3, 5, 7, 9]
[1, 3, 5, 7, 9]

My advice is to write the generator functions out in full: I think trying to get the correct scoping on i without doing that may be all but impossible.
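One way to follow that advice is sketched below. The names `by_remainder` and `evens_then_odds` are made up for this example. The remainder is passed as an argument, so each inner generator keeps its own value no matter how the outer one is consumed.

```python
def by_remainder(remainder):
    # remainder is a parameter, so it is fixed when the generator is created
    for x in range(10):
        if x % 2 == remainder:
            yield x

def evens_then_odds():
    for i in range(2):
        yield by_remainder(i)

# Consuming lazily, or after materialising the outer generator with list(),
# gives the same result.
lazy = [list(g) for g in evens_then_odds()]
eager = [list(g) for g in list(evens_then_odds())]
print(lazy)
print(eager)
```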

Duncan
  • Why is it that wrapping the generator in `list()` causes it to start at 1 rather than 0? (I tested it with more generators; it looks like the generators are all doing the same thing.) – Steven Lu Jun 05 '13 at 15:31
  • When consumed, the generators always do the same thing: they give you numbers where `x%2==i`. The first way sets `i` to `0`, then yields a generator which is consumed to give the even numbers; then `i` becomes `1` and another generator is returned, which gives you the odd numbers. N.B. it was the same `i` each time, just with different values. Using `list(gens)` means that both generators are created first, so `i` has already been set to `1` before you consume either generator. – Duncan Jun 06 '13 at 08:33
  • Why is it that the `i` is not shared in the first case? – Steven Lu Jun 06 '13 at 16:20
  • Oh. I think it's that `i` is always shared, but in the first case the gens are run in tandem with the outer loop while in the second case the gens are run after the outer loop finishes so they both get the final value of i. Why is `i` not a copy, i.e. the generators acting like closures? – Steven Lu Jun 06 '13 at 16:25
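On the closure question: Python closures capture the variable itself, not its value at creation time, so each inner generator looks `i` up only when it actually runs. A hedged sketch of one workaround that stays within generator expressions is to route `i` through the inner expression's first `for` clause, whose iterable is evaluated as soon as the generator is created. The name `j` is introduced here purely for illustration.

```python
# Late binding: both inner generators read i after the outer loop has finished.
late = [list(g) for g in list((x for x in range(10) if x % 2 == i) for i in range(2))]

# [i] is the outermost iterable of the inner expression, so it is evaluated
# when that generator is created, freezing the current value as j.
bound = [list(g) for g in list((x for j in [i] for x in range(10) if x % 2 == j)
                               for i in range(2))]

print(late)
print(bound)
```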

Lie has the answer to the syntactical question. A suggestion: don't stuff so much into the body of a generator. A function is much more readable.

def make_generator(modulus):
    return (x for x in range(10) if x % 2 == modulus)
g = (make_generator(i) for i in range(2))
Glenn Maynard
  • Thank you. I understand this and I do prefer this. But I was trying to get some practice with list comprehensions and see how far I could push them. – inspectorG4dget Sep 22 '10 at 06:41