Last Friday I went to a job interview and had to answer the following question: why does this code raise an exception (UnboundLocalError: local variable 'var' referenced before assignment on the line containing var += 1)?

def outer():
    var = 1

    def inner():
        var += 1
        return var

    return inner
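
For reference, a minimal sketch that reproduces the error on any Python 3 interpreter. The name counter is introduced here only for illustration, and the exact wording of the message differs between versions, so only the exception type is captured:

```python
def outer():
    var = 1

    def inner():
        var += 1  # var is assigned in inner, so the compiler treats it as local to inner
        return var

    return inner


counter = outer()  # defining and returning inner succeeds

try:
    counter()  # the failure only shows up when inner actually runs
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)
```

Note that the exception is raised when inner is called, not when outer is defined or called.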

I couldn't give a proper answer; this fact really upset me, and when I came home I tried really hard to find a proper answer. Well, I have found the answer, but now there's something else that confuses me.

I have to say in advance that my question is more about the decisions made when designing the language, not about how it works.

So, consider this code. The inner function is a Python closure, and var is not an ordinary local of outer: it is stored in a cell (and then retrieved from that cell):

def outer():
    var = 1

    def inner():
        return var

    return inner

The disassembly looks like this:

0  LOAD_CONST               1 (1)
3  STORE_DEREF              0 (var)  # not STORE_FAST

6  LOAD_CLOSURE             0 (var)
9  BUILD_TUPLE              1
12 LOAD_CONST               2 (<code object inner at 0x10796c810)
15 LOAD_CONST               3 ('outer.<locals>.inner')
18 MAKE_CLOSURE             0
21 STORE_FAST               0 (inner)

24 LOAD_FAST                0 (inner)
27 RETURN_VALUE

recursing into <code object inner at 0x10796c810:

0  LOAD_DEREF               0 (var)  # same thing
3  RETURN_VALUE
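
The same information can be checked without reading opcodes, which is useful because opcode names and offsets differ between CPython versions (the listing above comes from an older release). A small sketch, assuming CPython and using only the standard code-object attributes co_cellvars and co_freevars:

```python
import dis


def outer():
    var = 1

    def inner():
        return var

    return inner


inner = outer()

cellvars = outer.__code__.co_cellvars  # names outer stores in cells
freevars = inner.__code__.co_freevars  # names inner reads from enclosing cells
cell_value = inner.__closure__[0].cell_contents  # current content of the shared cell

print(cellvars, freevars, cell_value)
dis.dis(outer)  # prints the bytecode for whichever CPython version runs this
```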

This changes when we try to bind something else to var inside the inner function:

def outer():
    var = 1

    def inner():
        var = 2
        return var

    return inner

Yet again the disassembly:

0  LOAD_CONST               1 (1)
3  STORE_FAST               0 (var)  # this one changed
6  LOAD_CONST               2 (<code object inner at 0x1084a1810)
9  LOAD_CONST               3 ('outer.<locals>.inner')
12 MAKE_FUNCTION            0  # AND not MAKE_CLOSURE
15 STORE_FAST               1 (inner)

18 LOAD_FAST                1 (inner)
21 RETURN_VALUE

recursing into <code object inner at 0x1084a1810:

0  LOAD_CONST               1 (2)
3  STORE_FAST               0 (var)  # 'var' is supposed to be local

6  LOAD_FAST                0 (var)  
9  RETURN_VALUE

We store var locally, which complies with what is said in the documentation: assignments to names always go into the innermost scope.
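
A quick way to confirm this without a disassembler, sketched under the assumption that CPython's code-object attributes are available: once inner assigns to var, outer no longer needs a cell for it, and inner has no closure at all.

```python
def outer():
    var = 1

    def inner():
        var = 2  # this assignment makes var a plain local of inner
        return var

    return inner


inner = outer()

outer_cellvars = outer.__code__.co_cellvars  # empty: nothing is shared through a cell
inner_locals = inner.__code__.co_varnames    # var is listed as inner's own local
inner_closure = inner.__closure__            # None: inner captures nothing

print(outer_cellvars, inner_locals, inner_closure)
```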

Now, when we try to make an increment var += 1, a nasty LOAD_FAST shows up, which tries to get var from inner's local scope:

14 LOAD_FAST                0 (var)
17 LOAD_CONST               2 (2)
20 INPLACE_ADD
21 STORE_FAST               0 (var)

And of course we get an error. Now, here is what I don't get: why can't we retrieve var with a LOAD_DEREF, and THEN store it inside inner's scope with a STORE_FAST? I mean, this seems to be consistent with the "innermost scope" rule for assignments, and at the same time it's somewhat more intuitively desirable. At least the += code would do what we want it to do, and I can't come up with a situation in which the described approach could mess something up.

Can you? I feel that I'm missing something here.

Solomon Ucko
oopcode
  • Do you really think it's intuitively desirable that `var += 1` should copy the closure variable instead of mutating it? Because for me, if this did compile, it's the latter I'd be expecting (and desiring). – abarnert May 10 '15 at 23:47
  • I see your point. My initial motivation was that if you, say, tried to `print(var)` before incrementing, you would still get an `UnboundLocalError` and nothing would be printed. So you don't get a chance to do something to this variable even *before* you make up your mind to `+=`. Although I do understand that this is more about the "user's" perspective. – oopcode May 10 '15 at 23:59
  • I see what you're thinking. But basically, Python thinks of the whole scope as something you "make up your mind" to write at once, not as something you write "stream-of-consciousness" style. So, you've already decided that you want a local variable when you do the `print(var)`, even if that local variable isn't bound to anything yet. Maybe the clearest alternative would just be a different error saying that there _is_ no local variable `var` yet, rather than that there is one but it's "unbound", which only really makes sense if you think about what happens under the covers… – abarnert May 11 '15 at 00:04
  • If you *only* print a non-local variable, it will work fine. If you first print and then modify it, you would either be silently modifying a global, or suddenly switching the meaning of `var` from non-local to local. Neither of these seems like a good design choice. – alexis May 11 '15 at 00:17
  • I guess I'm convinced now that the current implementation is the correct one. :) – oopcode May 11 '15 at 00:19
  • What does "why can't we" **mean**? Is this question about the rules Python follows in order to compile the code? Or about the design decision behind those rules? Or **just what**? – Karl Knechtel Sep 12 '22 at 11:44
  • Alternatively: the reason why we "can't retrieve var with a LOAD_DEREF, and THEN store it inside inner's scope with a STORE_FAST" is that `LOAD_DEREF` means to load from a different place from where `STORE_FAST` stores. – Karl Knechtel Sep 13 '22 at 15:53

3 Answers

Python has a very simple rule that assigns each name in a scope to exactly one category: local, enclosing, or global/builtin.

(CPython, of course, implements that rule by using FAST locals, DEREF closure cells, and NAME or GLOBAL lookups.)
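
One way to see that classification directly is the standard-library symtable module, which exposes the compiler's decision for each name without running any code. This is only a sketch: SOURCE, reader and writer are names made up for the illustration.

```python
import symtable

# Hypothetical source text: reader only reads var, writer also assigns to it.
SOURCE = '''
def outer():
    var = 1

    def reader():
        return var

    def writer():
        var += 1
        return var
'''

top = symtable.symtable(SOURCE, "<example>", "exec")
outer_table = top.get_children()[0]
tables = {child.get_name(): child for child in outer_table.get_children()}

reader_var = tables["reader"].lookup("var")
writer_var = tables["writer"].lookup("var")

# (is_local, is_free) as decided at compile time for each nested function
classification = {
    "reader": (reader_var.is_local(), reader_var.is_free()),
    "writer": (writer_var.is_local(), writer_var.is_free()),
}
print(classification)
```

In reader the name is free (it comes from the enclosing scope); in writer the single augmented assignment is enough to make it local for the whole body.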


Your changed rule does make sense for your dead-simple case, but it's easy to come up with cases where it would be ambiguous (at least for a human reader, if not for the compiler). For example:

def outer():
    var = 1

    def inner():
        if spam:
            var = 1
        var += 1
        return var

    return inner

Does that var += 1 do a LOAD_DEREF or a LOAD_FAST? Under your rule we can't know until we know the value of spam at runtime, which means the compiler can't pick an opcode when it compiles the function body.
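
For comparison, here is how the rule Python actually uses handles that example. In this sketch spam is passed to outer as a parameter purely to keep the code self-contained (the original leaves it unspecified), and try_call is a hypothetical helper. var is local to inner in both cases; the flag only decides whether it is already bound when var += 1 runs.

```python
def outer(spam):  # assumption: spam is a parameter here only so the sketch runs on its own
    var = 1

    def inner():
        if spam:
            var = 1
        var += 1  # always refers to inner's local var, never to outer's
        return var

    return inner


def try_call(func):
    """Call func and report an UnboundLocalError instead of propagating it."""
    try:
        return func()
    except UnboundLocalError:
        return "UnboundLocalError"


results = (try_call(outer(True)), try_call(outer(False)))
print(results)
```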


Even if you could come up with a more complicated rule that makes sense, there's virtue inherent in the rule being simple. Besides being easy to implement (and therefore easy to debug, optimize, etc.), it's easy for someone to understand. When you get an UnboundLocalError, any intermediate-level Python programmer knows how to work through the rule in his head and figure out what went wrong.


Meanwhile, notice that when this comes up in real-life code, there are very easy ways to work around it explicitly. For example:

def inner():
    lvar = var + 1
    return lvar

You wanted to load the closure variable, and assign to a local variable. There's no reason they need to have the same name. In fact, using the same name is misleading, even with your new rule—it implies to the reader that you're modifying the closure variable, when you really aren't. So just give them different names, and the problem goes away.

And that still works with the nonlocal assignment:

def inner():
    nonlocal var
    if spam:
        var = 1
    lvar = var + 1
    return lvar

Or, of course, there are tricks like using a parameter default value to create a local that starts off with a copy of the closure variable:

def inner(var=var):
    var += 1
    return var
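
A short sketch of what the default-value trick does and does not do. The helper read is hypothetical and exists only to observe outer's variable. The default is evaluated once, when inner is defined, so every call starts from the same copy and outer's var is never touched.

```python
def outer():
    var = 1

    def inner(var=var):  # default captured once, at definition time
        var += 1         # rebinds inner's own parameter only
        return var

    def read():
        return var       # reads outer's var through the closure

    return inner, read


inner, read = outer()
first, second, outer_value = inner(), inner(), read()
print(first, second, outer_value)
```

Both calls return the same value because nothing persists between them; use nonlocal instead if the closure variable itself should change.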
abarnert
  • Yes, sure, I knew about the `nonlocal` trick. And it seems like your answer is just what I was looking for. – oopcode May 10 '15 at 23:41
  • @oopcode: Something like your design _could_ make sense for a different language. The obvious one is to make variables nonlocal by default and require you to explicitly declare them local. Then you get the question of whether to include global in the lookup, like JavaScript, or make globals special, like Scheme. Either one works, but both feel very different from Python. – abarnert May 10 '15 at 23:46

Are you making it too hard? var cannot be local because it's being dereferenced before assignment, and it cannot be non-local (unless declared global or nonlocal) because it's being assigned to.

The language is designed this way so that (a) you don't accidentally stomp on global variables: Assigning to a variable makes it local unless you explicitly declare it global or nonlocal. And (b) you can easily use the values of variables in outer scopes. If you dereference a name you haven't defined locally, it looks for it in enclosing scopes.

Your code must dereference the variable before it can increment it, so the rules of the language make the variable both local and non-local--a contradiction. The result: Your code will only run if you declare var to be nonlocal.
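
A minimal sketch of the question's code with that declaration added (Python 3 only, since nonlocal does not exist in Python 2; the names counter and values are illustrative):

```python
def outer():
    var = 1

    def inner():
        nonlocal var  # var now names outer's variable instead of a new local
        var += 1
        return var

    return inner


counter = outer()
values = [counter(), counter(), counter()]  # each call mutates the shared cell
fresh = outer()()                           # a new call to outer gets its own var
print(values, fresh)
```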

alexis
  • No, I didn't intend to return `inner()`. The examples are based upon this article: http://tech.blog.aknin.name/2010/06/05/pythons-innards-naming/ – oopcode May 10 '15 at 23:30
  • This seems like it's on track to be the right answer. Globals aren't an issue here; it's local variables vs. closure variables. But it's the same thing: `var` can't be a nonlocal if it's a local. – abarnert May 10 '15 at 23:31
  • Yet again, I seem to understand *why* this doesn't work (the disassembly shows it quite clearly); I'm not sure why it was designed to work this way. – oopcode May 10 '15 at 23:33
  • I expanded on the motivation a little, and made it specific to your closure context. (The reasons are exactly the same as with `global`, but you need `nonlocal` here of course). – alexis May 10 '15 at 23:37

You're digging too deep. This is an issue of the language semantics, not opcodes and cells. inner contains an assignment to the name var:

def inner():
    var += 1    # here
    return var

so by the Python execution model, inner has a local variable named var, and all attempts to read and write the name var inside inner use the local variable. While Python could have been designed so that if the local var isn't bound, it tries the closure's var, Python wasn't designed that way.
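
To illustrate that the decision covers the whole function body and not just the lines after the assignment, here is a small sketch (the variable seen is made up for the example). The read of var happens before any assignment has run, yet it already refers to inner's local:

```python
def outer():
    var = 1

    def inner():
        try:
            seen = var       # read comes first, but var is already inner's local
        except UnboundLocalError:
            seen = "unbound"
        var = 2              # this later assignment is what makes var local everywhere in inner
        return seen, var

    return inner


result = outer()()
print(result)
```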

user2357112
  • I think he's asking why the Python execution model was designed that way, instead of a different way. – abarnert May 10 '15 at 23:37