Comprehensions show unusual interactions with scoping. Is this the expected behavior?

x = "original value"
squares = [x**2 for x in range(5)]
print(x)  # Prints 4 in Python 2!

At the risk of whining, this is a brutal source of errors. As I write new code, I still occasionally hit very weird errors due to rebinding -- even now that I know it's a problem. I need to make a rule like "always prefix temp vars in list comprehensions with an underscore", but even that's not foolproof. The fact that there's this random time-bomb waiting kind of negates all the nice "ease of use" of list comprehensions.

Mateen Ulhaq
Jabavu Adams
  • 7
    -1: "brutal source of errors"? Hardly. Why choose such an argumentative term? Generally the most expensive errors are requirements misunderstandings and simple logic errors. This kind of error has been a standard problem in a lot of programming languages. Why call it 'brutal'? – S.Lott Nov 16 '10 at 22:54
  • 51
    It violates the principle of least surprise. It's also not mentioned in the python documentation on list comprehensions which does however mention several times how easy and convenient they are. Essentially it's a land-mine that existed outside my language model, and hence was impossible for me to foresee. – Jabavu Adams Nov 18 '10 at 05:35
  • 40
    +1 for "brutal source of errors". The word 'brutal' is *entirely* justified. – N. Virgo Feb 20 '13 at 09:07
  • 8
    Note: the documentation **does** state that list comprehensions are equivalent to the explicit `for`-loop construct, and *`for`-loops leak variables*. So it wasn't explicit, but it was implicitly stated. – Bakuriu Jul 04 '15 at 10:59
  • 4
    @Bakuriu Explicit is better than implicit. – 0xc0de May 24 '17 at 08:45

6 Answers

189

List comprehensions leak the loop control variable in Python 2 but not in Python 3. Here's Guido van Rossum (creator of Python) explaining the history behind this:

We also made another change in Python 3, to improve equivalence between list comprehensions and generator expressions. In Python 2, the list comprehension "leaks" the loop control variable into the surrounding scope:

x = 'before'
a = [x for x in 1, 2, 3]
print x # this prints '3', not 'before'

This was an artifact of the original implementation of list comprehensions; it was one of Python's "dirty little secrets" for years. It started out as an intentional compromise to make list comprehensions blindingly fast, and while it was not a common pitfall for beginners, it definitely stung people occasionally. For generator expressions we could not do this. Generator expressions are implemented using generators, whose execution requires a separate execution frame. Thus, generator expressions (especially if they iterate over a short sequence) were less efficient than list comprehensions.

However, in Python 3, we decided to fix the "dirty little secret" of list comprehensions by using the same implementation strategy as for generator expressions. Thus, in Python 3, the above example (after modification to use print(x) :-) will print 'before', proving that the 'x' in the list comprehension temporarily shadows but does not override the 'x' in the surrounding scope.
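To make the Python 3 behavior concrete, here is a minimal sketch that compares a list comprehension with a hand-written function doing the same work. The helper name `_listcomp_sketch` is hypothetical and exists only for illustration; CPython's real mechanics differ in detail and between versions, but the visible effect (the loop variable does not escape) is the same.

```python
x = "before"


def _listcomp_sketch(iterable):
    # Hypothetical stand-in for the separate scope a comprehension gets:
    # this `x` is local to the function and cannot touch the outer `x`.
    result = []
    for x in iterable:
        result.append(x)
    return result


a = _listcomp_sketch((1, 2, 3))
b = [x for x in (1, 2, 3)]  # Python 3 spelling of the quoted example

print(a == b)  # both build the same list
print(x)       # the outer x is untouched in Python 3
```

Note the parentheses around `(1, 2, 3)`: the bare `1, 2, 3` form from the quote is only valid in Python 2.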

Steven Rumbalski
  • 18
    I'll add that although Guido calls it a "dirty little secret", many considered it a feature, not a bug. – Steven Rumbalski Nov 10 '11 at 20:32
  • 40
    Also note that now in 2.7, set and dictionary comprehensions (and generators) have private scopes, but list comprehensions still don't. While this makes some sense in that the former were all back-ported from Python 3, it really makes the contrast with list comprehensions jarring. – mbauman Nov 27 '11 at 20:47
  • 11
    I know this is an insanely old question, but _why_ did some consider it a feature of the language? Is there anything in favour of this kind of variable leaking? – Mathias Müller Aug 28 '15 at 11:32
  • 1
    ugh just got bit by this...leaking this was definitely a surprise IMO. I guess not relevant anymore, but felt the need to mention :P – JPC Dec 16 '15 at 20:16
  • 4
    **for: loops** leaking has good reasons, esp. to access the last value after an early `break`, but that is irrelevant to comprehensions. I recall some comp.lang.python discussions where people wanted to assign variables in the middle of an expression. The *less insane* way found was single-value for clauses, e.g. `sum100 = [s for s in [0] for i in range(1, 101) for s in [s + i]][-1]`, but that just needs a comprehension-local var and works just as well in Python 3. I think "leaking" was the only way to set a variable visible outside an expression. Everybody agreed these techniques are horrible :-) – Beni Cherniavsky-Paskin Aug 11 '16 at 08:57
  • 1
    @Mathias, The leak gave list comprehensions the same scope rules as nested/lambda functions, which can also access variables in their definition scope, and so made list expressions equally powerful. – Rian Rizvi Feb 15 '17 at 16:23
  • 3
    The problem here is not having access to the surrounding scope of the list comprehensions, but binding in the list comprehensions scope affecting the surrounding scope. – Felipe Gonçalves Marques Sep 28 '18 at 10:25
  • 1
    Just a note: `a = [x for x in 1, 2, 3]` is valid Python 2, but should be replaced with `a = [x for x in (1, 2, 3)]` in the Python 3 version. – joseville Oct 13 '21 at 17:32
  • 1
    @joseville since it's a quote, I will leave it as is. – Steven Rumbalski Oct 13 '21 at 20:30
52

Yes, list comprehensions "leak" their variable in Python 2.x, just like for loops.

In retrospect, this was recognized to be a mistake, and it was avoided with generator expressions. EDIT: As Matt B. notes it was also avoided when set and dictionary comprehension syntaxes were backported from Python 3.

List comprehensions' behavior had to be left as it is in Python 2, but it's fully fixed in Python 3.

This means that in all of:

list(x for x in a if x>32)
set(x//4 for x in a if x>32)         # just another generator exp.
dict((x, x//16) for x in a if x>32)  # yet another generator exp.
{x//4 for x in a if x>32}            # 2.7+ syntax
{x: x//16 for x in a if x>32}        # 2.7+ syntax

the x is always local to the expression while these:

[x for x in a if x>32]
set([x//4 for x in a if x>32])         # just another list comp.
dict([(x, x//16) for x in a if x>32])  # yet another list comp.

in Python 2.x all leak the x variable to the surrounding scope.
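For comparison, a small Python 3 check of the same forms. This is a sketch with made-up sample data (`a` and the `"sentinel"` value are assumptions for illustration); it sets an outer `x` first and confirms that none of the expressions rebind it.

```python
a = [10, 40, 70, 100]  # sample data, chosen only for illustration
x = "sentinel"         # outer name that Python 2 list comprehensions would clobber

results = {
    "generator expression": list(x for x in a if x > 32),
    "set comprehension": {x // 4 for x in a if x > 32},
    "dict comprehension": {x: x // 16 for x in a if x > 32},
    "list comprehension": [x for x in a if x > 32],
}

for name, value in results.items():
    print(name, value)

# In Python 3 the outer x survives all four forms.
print(x)
```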


UPDATE for Python 3.8: PEP 572 introduced the := assignment expression operator, which deliberately leaks out of comprehensions and generator expressions! This leaking was motivated by essentially 2 use cases: capturing a "witness" from early-terminating functions like any() and all():

if any((comment := line).startswith('#') for line in lines):
    print("First comment:", comment)
else:
    print("There are no comments")

and updating mutable state:

total = 0
partial_sums = [total := total + v for v in values]

See Appendix B of PEP 572 for the exact scoping rules. The variable is bound in the closest enclosing def or lambda (or at module level if there is none), unless that function declares it nonlocal or global.
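Both use cases can be tried with a self-contained sketch like the one below (Python 3.8+). The contents of `lines` and `values` are invented sample data, not part of the original answer.

```python
# Sample inputs, assumed purely for illustration.
lines = ["import os", "# first comment", "x = 1", "# second comment"]
values = [1, 2, 3, 4]

# Use case 1: capture the "witness" that made any() stop early.
if any((comment := line).startswith("#") for line in lines):
    print("First comment:", comment)
else:
    print("There are no comments")

# Use case 2: update state in the enclosing scope from inside a comprehension.
total = 0
partial_sums = [total := total + v for v in values]
print(partial_sums)
print(total)  # total was rebound by := in the enclosing scope
```

Here `comment` and `total` are bound in the enclosing scope by `:=`, while the ordinary loop variables `line` and `v` stay local to the generator expression and the comprehension.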

Beni Cherniavsky-Paskin
  • 1
    `:=` can also be used in a list comprehension to [apply the map before the filter without repeating the calculation](https://stackoverflow.com/questions/44988861). – Karl Knechtel Aug 19 '22 at 12:30
  • Right, the := operator has many other uses; reworded the "2 use cases" sentence to clarify it refers to motivations _why they made it leak_. – Beni Cherniavsky-Paskin Sep 02 '22 at 14:25
8

Yes, in Python 2 the assignment occurs there, just like it would in a for loop. No new scope is created for a list comprehension.

This is definitely the expected behavior: on each cycle, the value is bound to the name you specify. For instance,

>>> x=0
>>> a=[1,54,4,2,32,234,5234,]
>>> [x for x in a if x>32]
[54, 234, 5234]
>>> x
5234

Once that's recognized, it seems easy enough to avoid: don't use existing names for the variables within comprehensions.

JAL
2

Interestingly, in Python 2.7 this doesn't affect dictionary or set comprehensions.

>>> [x for x in range(1, 10)]
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> x
9
>>> {x for x in range(1, 5)}
set([1, 2, 3, 4])
>>> x
9
>>> {x:x for x in range(1, 100)}
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20, 21: 21, 22: 22, 23: 23, 24: 24, 25: 25, 26: 26, 27: 27, 28: 28, 29: 29, 30: 30, 31: 31, 32: 32, 33: 33, 34: 34, 35: 35, 36: 36, 37: 37, 38: 38, 39: 39, 40: 40, 41: 41, 42: 42, 43: 43, 44: 44, 45: 45, 46: 46, 47: 47, 48: 48, 49: 49, 50: 50, 51: 51, 52: 52, 53: 53, 54: 54, 55: 55, 56: 56, 57: 57, 58: 58, 59: 59, 60: 60, 61: 61, 62: 62, 63: 63, 64: 64, 65: 65, 66: 66, 67: 67, 68: 68, 69: 69, 70: 70, 71: 71, 72: 72, 73: 73, 74: 74, 75: 75, 76: 76, 77: 77, 78: 78, 79: 79, 80: 80, 81: 81, 82: 82, 83: 83, 84: 84, 85: 85, 86: 86, 87: 87, 88: 88, 89: 89, 90: 90, 91: 91, 92: 92, 93: 93, 94: 94, 95: 95, 96: 96, 97: 97, 98: 98, 99: 99}
>>> x
9

However, this has been fixed in Python 3, as noted above.

Chris Travers
  • That syntax doesn't work at all in Python 2.6. Are you talking about Python 2.7? – Paul Hollingsworth Jan 20 '17 at 14:31
  • Python 2.6 has list comprehensions only as does Python 3.0. 3.1 added set and dictionary comprehensions and these were ported to 2.7. Sorry if that was not clear. It was meant to note a limitation to another answer, and which versions it applies to is not entirely straightforward. – Chris Travers Jan 20 '17 at 14:58
  • While I can imagine making an argument that there are cases where using python 2.7 for new code makes sense, I can't say the same for python 2.6... Even if 2.6 is what came with your OS, you're not stuck with it. Consider installing virtualenv and using 3.6 for new code! – Alex L Feb 01 '17 at 15:47
  • The point about Python 2.6 could come up though in maintaining existing legacy systems. So as an historical note it is not totally irrelevant. Same with 3.0 (ick) – Chris Travers Feb 02 '17 at 17:45
  • Sorry if I sound rude, but this doesn't answer the question in any way. It's better suited as a comment. – 0xc0de May 24 '17 at 08:50
  • How would that format as a comment? Would you prefer I put together a blog post and link to it in a comment? Sorry. I don't want to sound rude, but I am trying to understand the practical tradeoff here and what exactly you would prefer this to look like. – Chris Travers May 24 '17 at 12:21
1

A workaround for Python 2.6, when this behaviour is not desirable: pass a generator expression to list() instead of writing a list comprehension.

# python
Python 2.6.6 (r266:84292, Aug  9 2016, 06:11:56)
Type "help", "copyright", "credits" or "license" for more information.
>>> x=0
>>> a=list(x for x in xrange(9))
>>> x
0
>>> a=[x for x in xrange(9)]
>>> x
8
-2

In Python 3, a list comprehension does not change a variable of the same name once the comprehension is finished, but a plain for loop does reassign the variable in the surrounding scope.

i = 1
print(i)
print([i for i in range(5)])
print(i)

The value of i remains 1.

With a plain for loop instead, the value of i is reassigned.
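The contrast between the two constructs in Python 3 can be sketched side by side. The variable names `after_comprehension` and `after_loop` are introduced here only to record the value of `i` at each step.

```python
i = 1

# A comprehension has its own loop variable; the outer i is left alone.
squares = [i * i for i in range(5)]
after_comprehension = i

# A plain for loop binds i in the current scope, so it keeps the last value.
for i in range(5):
    pass
after_loop = i

print(after_comprehension)  # outer i unchanged by the comprehension
print(after_loop)           # last value produced by range(5)
```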