a = 10
def f():
  print(1)
  print(a) # UnboundLocalError raised here
  a = 20
f()

This code of course raises UnboundLocalError: local variable 'a' referenced before assignment. But why is this exception raised at the print(a) line?

If the interpreter executed code line by line (like I thought it did), it wouldn't know anything was wrong when print(a) was reached; it would just think that a referred to the global variable.

So it appears the interpreter reads the entire function in advance to figure out whether a is used for assignment. Is this documented anywhere? Is there any other occasion where the interpreter looks ahead (apart from checking for syntax errors)?

To clarify, the exception itself is perfectly clear: global variables can be read without global declaration, but not written (this design prevents bugs due to unintentionally modifying global variables; those bugs are especially hard to debug because they lead to errors that occur far from the location of the erroneous code). I'm just curious why the exception is raised early.

user
max
  • I think you'll find that python cannot access global variables from inside functions by default. You will have to explicitly declare that you mean to use a global. (btw Don't Use Global Variables). – quamrana Nov 26 '16 at 19:13
  • @quamrana untrue. If you remove the assignment to the local `a` the code prints `10`. – 2rs2ts Nov 26 '16 at 19:14
  • For what it's worth, this isn't specific to python 3, just tested in python 2 and the same thing happens. – 2rs2ts Nov 26 '16 at 19:16
  • I think you'll find that python cannot (**edit**) access *write to* global variables from inside functions by default. You will have to explicitly declare that you mean to use a global. (btw Don't Use Global Variables). – quamrana Nov 26 '16 at 19:29
  • There are many similar/duplicates like [this](http://stackoverflow.com/a/17097379/4230591) but since this Q doesn't have redundant code, i think it shouldn't be closed. – user Nov 26 '16 at 19:30
  • The short answer to this question is that local variables for each function are determined _statically at compile time_. – Sven Marnach Nov 27 '16 at 20:53

2 Answers

According to Python's documentation, the interpreter will first notice an assignment for a variable named a in the scope of f() (no matter the position of the assignment in the function) and then as a consequence only recognize the variable a as a local variable in this scope. This behavior effectively shadows the global variable a.

The exception is then raised "early" because the interpreter, which executes the code "line by line", encounters the print call referencing a local variable that is not yet bound at this point (remember, Python is looking for a local variable here).
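A minimal sketch of this, assuming a hypothetical `trace` list (not part of the question) that records how far the function gets. It shows that `a` is already classified as local before `f` is ever called, and that the statements before the failing read still run:

```python
a = 10
trace = []  # hypothetical helper list, only used to record progress

def f():
    trace.append("first statement ran")
    trace.append(a)  # raises UnboundLocalError: 'a' is local but not yet bound
    a = 20

# The classification of 'a' as a local is visible before any call to f.
print(f.__code__.co_varnames)  # ('a',)

try:
    f()
except UnboundLocalError as exc:
    print(type(exc).__name__, "-", exc)

# Only the first statement of f ran; the failing read recorded nothing.
print(trace)
```

The exact wording of the exception message differs between Python versions, so the sketch prints it rather than assuming it.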

As you mentioned in your question, one has to use the global keyword to explicitly tell the compiler that the assignment in this scope is done to the global variable. The correct code would be:

a = 10
def f():
  global a
  print(1)
  print(a) # Prints 10 as expected
  a = 20
f()
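As a hedged illustration of what the global declaration changes (the `before` variable and the return value are additions for this sketch, not part of the answer's code): with the declaration in place, `a` no longer appears among the function's local names, and the assignment rebinds the module-level variable.

```python
a = 10

def f():
    global a
    before = a  # reads the module-level a
    a = 20      # rebinds the module-level a, no local is created
    return before

# 'a' is not listed as a local of f; only 'before' is.
print(f.__code__.co_varnames)  # ('before',)

result = f()
print(result, a)  # 10 20
```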

As @2rs2ts said in a now-deleted answer, this is easily explained by the fact that "Python is not merely interpreted, it is compiled into a bytecode and not just interpreted line by line".

Antoine
  • Does the fact that the script is compiled into bytecode first rather than interpreted line by line have any other (slightly) unexpected consequences? – max Nov 26 '16 at 19:34
  • Not that I'm aware of off the top of my head, but I'm probably wrong. Something I know is that python is not optimized much by the compiler because the language is very dynamic (no static types, methods can be replaced on the fly, etc) so it makes it very difficult. Generally speaking, the code is compiled to bytecode only to remove the parsing time at each function call (the runtime just looks up the bytecodes in a big lookup table, instead of parsing everything again). cc @max – Antoine Nov 26 '16 at 19:45
  • @max: Compilation and interpretation are completely and utterly irrelevant to your question. You are asking about *Semantics*, i.e. the Python Language Specification. The Python Language Specification says that all variables assigned in a function are local unless declared `global`. Period. Compilation and interpretation are questions of *Pragmatics*, i.e. any particular Python implementation. But any particular Python implementation, regardless of whether it is compiled or interpreted, must abide by the Python Language Specification. Otherwise it wouldn't *be* a "Python implementation", it … – Jörg W Mittag Nov 27 '16 at 06:27
  • … would be an implementation of a completely different language that kinda-sorta resembles Python. Even if there weren't any Python implementations *at all*, if Python only existed on paper (or even only in Guido van Rossum's head), the behavior would still be the same one you are seeing, because the Python Language Specification says that the behavior you *are* seeing is the one you *should* be seeing. Don't confuse the programming language "Python" with the implementation "CPython". E.g.: the behavior you are seeing is part of "Python" and thus identical in *all* implementations (Pyston, … – Jörg W Mittag Nov 27 '16 at 06:28
  • … PyPy, IronPython, Jython, CPython, Pynie, …), whereas reference counting and deterministic finalization, the GIL, the C extension API are part of "CPython" and don't necessarily exist in other implementations. – Jörg W Mittag Nov 27 '16 at 06:31
  • @JörgWMittag ah agreed. I just was curious how this rule could be implemented. And now of course everything makes sense. – max Nov 27 '16 at 08:44

In the Resolution of Names section of the Python Reference Manual this is stated:

[..] If the current scope is a function scope, and the name refers to a local variable that has not yet been bound to a value at the point where the name is used, an UnboundLocalError exception is raised [..]

That's the official word on when UnboundLocalError occurs. If you take a look at the bytecode CPython generates for your function f with dis, you can see it trying to load the name from the local scope when its value hasn't even been set yet:

>>> import dis
>>> dis.dis(f)
  3           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 (1)
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 POP_TOP

  4          10 LOAD_GLOBAL              0 (print)
             13 LOAD_FAST                0 (a)      # <-- this command
             16 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             19 POP_TOP

  5          20 LOAD_CONST               2 (20)
             23 STORE_FAST               0 (a)
             26 LOAD_CONST               0 (None)
             29 RETURN_VALUE

As you can see, the name 'a' is loaded onto the stack by means of the LOAD_FAST command:

             13 LOAD_FAST                0 (a)

This is the command that is used to grab local variables in a function (named FAST because it is considerably faster than loading from the global scope with LOAD_GLOBAL).

This really has nothing to do with the global name a that has been defined previously. It has to do with the fact that CPython will assume you're playing nice and generate a LOAD_FAST for references to 'a', since 'a' is being assigned to (i.e., made a local name) inside the function body.

For a function with a single name access and no corresponding assignment, CPython does not generate a LOAD_FAST and instead goes and looks at the global scope with LOAD_GLOBAL:

>>> def g():
...    print(b)
...
>>> dis.dis(g)
  2           0 LOAD_GLOBAL              0 (print)
              3 LOAD_GLOBAL              1 (b)
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 POP_TOP
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
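The same comparison can be sketched programmatically. The helper `load_ops` below is a hypothetical name introduced for illustration. Because exact opcode names vary between CPython versions (newer releases may emit a variant such as LOAD_FAST_CHECK for a possibly unbound local), the sketch only prints the names it finds rather than assuming a specific listing:

```python
import dis

def f():
    print(a)  # 'a' is assigned below, so it is compiled as a local
    a = 20

def g():
    print(b)  # 'b' is never assigned in g, so it is compiled as a global lookup

def load_ops(func, name):
    """Return the opnames of the LOAD* instructions that refer to `name`."""
    return [ins.opname
            for ins in dis.get_instructions(func)
            if ins.opname.startswith("LOAD") and ins.argval == name]

print(load_ops(f, "a"))  # a LOAD_FAST-family instruction
print(load_ops(g, "b"))  # a LOAD_GLOBAL instruction
```

Neither function is called here; the difference is already present in the compiled code objects.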

So it appears the interpreter reads the entire function in advance to figure out whether a is used for assignment. Is this documented anywhere? Is there any other occasion where the interpreter looks ahead (apart from checking for syntax errors)?

In the Compound Statements section of the reference manual the following is stated for function definitions:

A function definition is an executable statement. Its execution binds the function name in the current local namespace to a function object (a wrapper around the executable code for the function).

Specifically, it binds the name f to a function object that holds the compiled code, f.__code__, that dis prettifies for us.
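To make the "decided before execution" point concrete, here is a small sketch that only compiles the question's code and never runs it. The filename `"<sketch>"` and the variable names are assumptions made for this example. The code object for f already lists `a` as a local variable:

```python
import types

source = """
a = 10
def f():
    print(a)
    a = 20
"""

# Compile only; nothing in `source` is executed.
module_code = compile(source, "<sketch>", "exec")

# The compiled body of f is stored as a constant of the module's code object.
f_code = next(c for c in module_code.co_consts
              if isinstance(c, types.CodeType))

print(f_code.co_name)      # f
print(f_code.co_varnames)  # ('a',)
print(f_code.co_names)     # ('print',)
```

`co_varnames` holds the function's local names and `co_names` the names looked up globally, which matches the LOAD_FAST versus LOAD_GLOBAL split in the disassembly.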

Dimitris Fasarakis Hilliard