78

If I do something with list comprehensions, it writes to a local variable:

i = 0
test = any([i == 2 for i in xrange(10)])
print i

This prints "9". However, if I use a generator expression, it doesn't write to a local variable:

i = 0
test = any(i == 2 for i in xrange(10))
print i

This prints "0".

Is there any good reason for this difference? Is this a design decision, or just a random byproduct of the way that generators and list comprehensions are implemented? Personally, it would seem better to me if list comprehensions didn't write to local variables.

hunse
  • [Bikeshedding? I'm not going to vote close this as a duplicate, but see my answer here](http://stackoverflow.com/a/12381841/674039) – wim Nov 07 '13 at 22:40
  • @wim: Trying to understand an unintuitive part of a language isn’t bikeshedding. – Ry- Nov 07 '13 at 22:49
  • @wim: If by "bikeshedding" you mean procrastination, then you are correct! I was just curious, and it has caused bugs for me in the past. – hunse Nov 07 '13 at 22:58
  • Sorry, it's just something that's been asked and answered several times already. +1 for you anyway, for writing the question well. – wim Nov 07 '13 at 23:03
  • How dare you compare an important programming language semantics concern to the choice of paint color for a bike shed?! – Kaz Nov 08 '13 at 01:11
  • Interesting question! – laike9m Nov 13 '13 at 08:06

6 Answers

76

Python’s creator, Guido van Rossum, mentioned this when he wrote about how generator expressions were built uniformly into Python 3:

We also made another change in Python 3, to improve equivalence between list comprehensions and generator expressions. In Python 2, the list comprehension "leaks" the loop control variable into the surrounding scope:

x = 'before'
a = [x for x in 1, 2, 3]
print x # this prints '3', not 'before'

This was an artifact of the original implementation of list comprehensions; it was one of Python's "dirty little secrets" for years. It started out as an intentional compromise to make list comprehensions blindingly fast, and while it was not a common pitfall for beginners, it definitely stung people occasionally. For generator expressions we could not do this. Generator expressions are implemented using generators, whose execution requires a separate execution frame. Thus, generator expressions (especially if they iterate over a short sequence) were less efficient than list comprehensions.

However, in Python 3, we decided to fix the "dirty little secret" of list comprehensions by using the same implementation strategy as for generator expressions. Thus, in Python 3, the above example (after modification to use print(x) :-) will print 'before', proving that the 'x' in the list comprehension temporarily shadows but does not override the 'x' in the surrounding scope.

So in Python 3 you won’t see this happen anymore.

Interestingly, dict comprehensions in Python 2 don’t do this either; this is mostly because dict comprehensions were backported from Python 3 and as such already had that fix in them.
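A minimal sketch of the Python 3 behaviour described above, assuming a Python 3 interpreter; the variable names are made up for illustration. Each construct reuses the name `x` as its loop variable, yet the outer `x` is still bound to the original string afterwards:

```python
# Assumes Python 3; all names here are illustrative only.
x = 'before'

squares = [x * x for x in (1, 2, 3)]        # list comprehension
by_value = {x: str(x) for x in (1, 2, 3)}   # dict comprehension
total = sum(x for x in (1, 2, 3))           # generator expression

print(x)  # prints: before
```

The inner `x` only shadows the outer one while each comprehension runs, which matches the "temporarily shadows but does not override" wording in the quote.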

There are some other questions that cover this topic too, but I’m sure you have already seen those when you searched for the topic, right? ;)

poke
  • :) I did do a brief search for this, but I didn't find those posts, partly because I didn't know what to call the local variable that a list comprehension creates. Is "loop variable" the preferred term? That's what PEP 289 uses, anyway. Would this term also apply to generators, even though they don't really have a formal loop per se? – hunse Nov 07 '13 at 22:53
  • The [grammar](http://docs.python.org/3/reference/expressions.html#grammar-token-comp_for) calls it “target”, but I guess “loop variable” still makes the most sense. And I would say it applies to generators as well, as—just like full generator functions—they still have a loop inside but just pause until the next iteration is requested. So “iterator variable” works fine too, I’d say :) – poke Nov 07 '13 at 23:02
  • @poke Note that a `target` need not be a variable. For example it can be a tuple (for tuple-unpacking): `[x for x,y in something]`, however you can also do more odd things like: `a = [1,2,3]; [1 for a[0] in range(3)]`, or even: `[1 for something.attribute in iterable]`. – Bakuriu Nov 08 '13 at 08:38
  • Does this mean that list comprehensions are slower in python 3 compared to python2? – Jens Timmerman Nov 14 '13 at 16:36
  • @JensTimmerman The directly following paragraph covers this: *“And before you start worrying about list comprehensions becoming slow in Python 3: thanks to the enormous implementation effort that went into Python 3 to speed things up in general, both list comprehensions and generator expressions in Python 3 are actually faster than they were in Python 2!”* – poke Nov 14 '13 at 16:55
16

As PEP 289 (Generator Expressions) explains:

The loop variable (if it is a simple variable or a tuple of simple variables) is not exposed to the surrounding function. This facilitates the implementation and makes typical use cases more reliable.

It appears to have been done for implementation reasons.

Personally, it would seem better to me if list comprehensions didn't write to local variables.

PEP 289 clarifies this as well:

List comprehensions also "leak" their loop variable into the surrounding scope. This will also change in Python 3.0, so that the semantic definition of a list comprehension in Python 3.0 will be equivalent to list(<generator expression>).

In other words, the behaviour you describe indeed differs in Python 2 but it has been fixed in Python 3.
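As a rough illustration of that equivalence (a sketch assuming Python 3, with made-up names), a list comprehension and `list()` wrapped around the matching generator expression build the same list, and neither rebinds the outer name:

```python
# Assumes Python 3; `n` and `data` are illustrative names.
n = 'untouched'
data = range(5)

from_comprehension = [n * 2 for n in data]
from_generator = list(n * 2 for n in data)

print(from_comprehension == from_generator)  # True
print(n)  # untouched
```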

Simeon Visser
  • which doesn't explain why listcomps _do_ expose the variable (and PEP 202 isn't very helpful). i assume it was originally to match the semantics of `for`, and later this was realized to be a bad idea. – Eevee Nov 07 '13 at 22:38
  • "the semantic definition of a list comprehension in Python 3.0 will be equivalent to list(<generator expression>)" -- PEP 289. This seemed to me to be the logical way to do a list comprehension, hence my original question. I didn't realize that generators came later. – hunse Nov 07 '13 at 22:45
11

Personally, it would seem better to me if list comprehensions didn't write to local variables.

You are correct. This is fixed in Python 3.x. The behavior is unchanged in 2.x so that it doesn't impact existing code that (ab)uses this hole.

Ignacio Vazquez-Abrams
4

Because... because.

No, really, that's it. Quirk of the implementation. And arguably a bug, since it's fixed in Python 3.

Eevee
1

As a by-product of wondering how list comprehensions are actually implemented, I found a good answer to your question.

In Python 2, take a look at the byte-code generated for a simple list comprehension:

>>> from dis import dis
>>> s = compile('[i for i in [1, 2, 3]]', '', 'exec')
>>> dis(s)
  1           0 BUILD_LIST               0
              3 LOAD_CONST               0 (1)
              6 LOAD_CONST               1 (2)
              9 LOAD_CONST               2 (3)
             12 BUILD_LIST               3
             15 GET_ITER            
        >>   16 FOR_ITER                12 (to 31)
             19 STORE_NAME               0 (i)
             22 LOAD_NAME                0 (i)
             25 LIST_APPEND              2
             28 JUMP_ABSOLUTE           16
        >>   31 POP_TOP             
             32 LOAD_CONST               3 (None)
             35 RETURN_VALUE  

It essentially translates to a simple for-loop; the comprehension is just syntactic sugar for it. As a result, the same semantics as for for-loops apply:

a = []
for i in [1, 2, 3]:
    a.append(i)
print(i)  # 3: the loop variable leaks

In the list-comprehension case, (C)Python uses a "hidden list name" and a special instruction LIST_APPEND to handle creation but really does nothing more than that.

So your question should generalize to why Python writes to the for loop variable in for-loops; that is nicely answered by a blog post from Eli Bendersky.

Python 3, as mentioned by others, has changed the list-comprehension semantics to better match those of generators (by creating a separate code-object for the comprehension), so a list comprehension is essentially syntactic sugar for the following:

a = [i for i in [1, 2, 3]]

# equivalent to
def __f(it):
    _ = []
    for i in it:
        _.append(i)
    return _
a = __f([1, 2, 3])

This won't leak because the loop doesn't run in the enclosing scope, as the Python 2 equivalent does. The i is bound only inside __f and is destroyed along with that function's other local variables when it returns.

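If you want, take a look at the byte-code generated for Python 3 by running dis('a = [i for i in [1, 2, 3]]') after from dis import dis. On CPython versions before 3.12 (the release that started inlining comprehensions), you'll see a "hidden" code-object being loaded and a function call being made at the end.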
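One way to check this without reading the full disassembly is to look for nested code objects among the compiled module's constants. The sketch below assumes CPython 3; the `'<example>'` filename and the variable names are arbitrary. On releases that compile the comprehension into its own code object, the printed list contains `'<listcomp>'`; newer releases that inline comprehensions may print an empty list, but the loop variable stays out of the namespace either way:

```python
import types

source = 'a = [i for i in [1, 2, 3]]'
code = compile(source, '<example>', 'exec')

# Nested code objects are stored among the outer code object's constants.
nested_names = [const.co_name for const in code.co_consts
                if isinstance(const, types.CodeType)]
print(nested_names)

# Run the compiled code in a fresh namespace and inspect what it bound.
namespace = {}
exec(code, namespace)
print(namespace['a'])    # [1, 2, 3]
print('i' in namespace)  # False
```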

Dimitris Fasarakis Hilliard
0

One of the subtle consequences of the dirty secret described by poke above is that list(...) and [...] do not have the same side effects in Python 2:

In [1]: a = 'Before'
In [2]: list(a for a in range(5))
In [3]: a
Out[3]: 'Before'

So no side-effect for generator expression inside list-constructor, but the side-effect is there in a direct list-comprehension:

In [4]: [a for a in range(5)]
In [5]: a
Out[5]: 4
Janus