
I hope this isn't a duplicate (it's hard to tell, given how many questions there are about this kind of error, most of them caused by basic mistakes), but I don't understand what happens here.

def f():
    c = ord('a')

f()

runs with no error (ord is a built-in that converts a character to its integer code point, 97 for 'a'). Now:

if False:
    ord = None
def f():
    c = ord('a')

f()

This also runs with no error (ord is never rebound, because the condition is always false). Now:

def f():
    if False:
        ord = None
    c = ord('a')

f()

I get this error (on the line c = ord('a')):

UnboundLocalError: local variable 'ord' referenced before assignment

It seems that merely assigning to a name (having it as the left-hand operand of =) makes it a local variable, even if that code never runs.

Obviously I can work around this, but I was very surprised, given that Python's dynamic nature lets you define a variable as an integer and, on the next line, redefine it as a string.

It seems related to "What's the scope of a variable initialized in an if statement?"

Apparently the interpreter still takes note of unreached branches when compiling to bytecode, but what exactly happens?

(tested on Python 2.7 and Python 3.4)

Jean-François Fabre
  • This has been asked many times. Local variables are determined statically by the compiler in Python. Every name that is assigned to is marked as a local variable at compilation time. – Sven Marnach Mar 28 '18 at 15:45
  • Putting `global ord` at the front of the function avoids the error. – Peter Wood Mar 28 '18 at 15:46
  • @PeterWood I'm not asking how to fix it, I was asking for an explanation. – Jean-François Fabre Mar 28 '18 at 15:47
  • @Jean-FrançoisFabre I was adding missing information which goes towards part of the explanation. It's a gift horse. – Peter Wood Mar 28 '18 at 15:49
  • thanks anyway. I know you're trying to help. – Jean-François Fabre Mar 28 '18 at 15:51
  • By the way, is your confusion coming from knowing Lisp-style scoping, knowing C-style scoping, or just bare intuition? I tried to explain things in a way that would make sense for all three, but it would be shorter to explain for just one of the three. – abarnert Mar 28 '18 at 16:02
  • errr, it was coming from "thinking I knew python scoping" :) thanks for your excellent answer. I guessed more or less right and knew how to fix the issue. I thought scoping was even more dynamic, where it wasn't. – Jean-François Fabre Mar 28 '18 at 16:03
  • I also thought that 1) I needed a more rational explanation and 2) other people would like to know that as well. – Jean-François Fabre Mar 28 '18 at 16:05
  • Sure. I'm curious whether [the explanation in the reference docs](https://docs.python.org/3/reference/executionmodel.html#naming-and-binding) would be enough to get the concept. It seems like it to me, but then I know my judgment is colored by already knowing it (and think how much worse that must be for, say, Guido…). – abarnert Mar 28 '18 at 16:08
  • something like "If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound. This rule is subtle. Python lacks declarations and allows name binding operations to occur anywhere within a code block. The local variables of a code block can be determined by scanning the entire text of the block for name binding operations.". But not very clear to me. – Jean-François Fabre Mar 28 '18 at 16:10
  • It's not actually a "code block" (unless by that you actually mean a single `code` object). Only functions and classes (and modules) have scopes. That includes lambda functions (which aren't blocks), and comprehensions (which are actually a function definition-plus-call in disguise, and aren't blocks), but doesn't include things like `for` or even `with` statements. Which is another thing that's easy to apply once you learn it, but most people don't learn it until they're badly surprised by it the first time. – abarnert Mar 28 '18 at 16:13
  • "It seems that just referencing a left side operand makes it a local variable, even if the code is not run." Yes, that's precisely it. – Karl Knechtel Sep 11 '22 at 04:53

2 Answers


It's not about the compiler doing a static analysis based on unrelated branches when compiling to bytecode; it's much simpler.

Python has a rule for distinguishing global, closure, and local variables. All variables that are assigned to in the function (including parameters, which are assigned to implicitly) are local variables, unless they appear in a global or nonlocal statement. This is explained in the "Naming and binding" section, and the sections after it, of the reference documentation.
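A minimal sketch of the rule, assuming CPython, where a function's `__code__.co_varnames` tuple lists its local variable names. The three function names are made up for illustration. A parameter and a name assigned in a dead branch both end up local, while a `global` statement (the fix suggested in the comments) keeps `ord` out of the local set, so the call falls through to the builtin.

```python
def uses_param(ord):
    # A parameter is assigned implicitly by the call, so it is local.
    return ord

def assigns_in_dead_branch():
    if False:
        ord = None  # never executed, but still makes 'ord' local
    return ord('a')

def declares_global():
    global ord  # 'ord' is looked up in module globals, then builtins
    if False:
        ord = None
    return ord('a')

for func in (uses_param, assigns_in_dead_branch, declares_global):
    print(func.__name__, func.__code__.co_varnames)

print(declares_global())  # 97: the builtin ord, since no module-level ord exists
```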

This isn't about keeping the interpreter simple, it's about keeping the rule simple enough that it's usually intuitive to human readers, and can easily be worked out by humans when it isn't intuitive. (That's especially important for cases like this—the behavior can't be intuitive everywhere, so Python keeps the rule simple enough that, once you learn it, cases like this are still obvious. But you definitely do have to learn the rule before that's true. And, of course, most people learn the rule by being surprised by it the first time…)

Even with an optimizer smart enough to completely remove any bytecode related to if False: ord=None, ord must still be a local variable by the rules of the language semantics.
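To see that reachability plays no part, here is a sketch (assuming CPython; both function names are hypothetical) that compares a function whose assignment can never run with one whose assignment always runs. Whether CPython actually drops the dead branch's bytecode depends on the version, so only the local-name table is compared.

```python
def never_assigns():
    if False:
        ord = None  # unreachable
    return ord

def always_assigns():
    if True:
        ord = None  # always runs
    return ord

# Both functions get the same local-name table at compile time.
print(never_assigns.__code__.co_varnames)
print(always_assigns.__code__.co_varnames)
print(always_assigns())  # None
```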

So: there's an ord = in your function, therefore all references to ord in that function are references to a local variable, not to any enclosing, global, or builtin name that happens to be the same, and therefore your code raises an UnboundLocalError.
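A quick way to confirm the failure mode, as a sketch: `raises_unbound_local` is a hypothetical helper. `UnboundLocalError` is a subclass of `NameError`, and its message wording has changed between Python versions, so only the exception type is checked.

```python
def f():
    if False:
        ord = None  # makes 'ord' local for the whole function
    c = ord('a')    # reads the local 'ord', which has no value yet

def raises_unbound_local(func):
    """Return True if calling func raises UnboundLocalError (illustrative helper)."""
    try:
        func()
    except UnboundLocalError as exc:
        # UnboundLocalError subclasses NameError; the message text varies
        # across versions, so only the type is checked.
        return isinstance(exc, NameError)
    return False

print(raises_unbound_local(f))
print(raises_unbound_local(lambda: ord('a')))  # the builtin is used here
```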


Many people get by without knowing the actual rule, and instead use an even simpler rule: a variable is

  • Local if it possibly can be, otherwise
  • Enclosing if it possibly can be, otherwise
  • Global if it's in globals, otherwise
  • Builtin if it's in builtins, otherwise
  • an error

While this works for most cases, it can be a bit misleading in some cases—like this one. A language with LEGB scoping done Lisp-style would see that ord isn't in the local namespace, and therefore return the global, but Python doesn't do that. You could say that ord is in the local namespace, but bound to a special "undefined" value, and that's actually close to what happens under the covers, but that's not what the rules of Python say, and, while it may be more intuitive for simple cases, it's harder to reason through.
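The difference can be sketched in code. `lisp_style_scope` is a hypothetical model of the simpler rule applied dynamically at lookup time, and `python_scope` approximates the compile-time decision by reading CPython's code-object attributes (`co_varnames`, `co_cellvars`, `co_freevars`). Neither is how the interpreter is actually implemented; they only contrast the two rules.

```python
import builtins

def lisp_style_scope(name, local_ns, global_ns):
    """Hypothetical dynamic rule: use whichever namespace has the name right now."""
    if name in local_ns:
        return "local"
    if name in global_ns:
        return "global"
    if hasattr(builtins, name):
        return "builtin"
    return "error"

def python_scope(name, func):
    """Approximate CPython's compile-time decision from the code object."""
    code = func.__code__
    if name in code.co_varnames or name in code.co_cellvars:
        return "local"
    if name in code.co_freevars:
        return "enclosing"
    # Everything else is resolved at run time: module globals, then builtins.
    return "global or builtin"

def f():
    if False:
        ord = None
    c = ord('a')

# When ord('a') runs, f has not bound any local yet, so its namespace is empty.
print(lisp_style_scope('ord', {}, f.__globals__))  # builtin
print(python_scope('ord', f))                      # local
```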


If you're curious how this works under the covers:

In CPython, the compiler scans your function to find all assignments with an identifier as a target (plus the parameters), and stores them in an array. It removes global and nonlocal variables. This array ends up as your code object's co_varnames, so let's say your ord is co_varnames[1]. Every use of that variable then gets compiled to a LOAD_FAST 1 or STORE_FAST 1, instead of a LOAD_NAME or STORE_GLOBAL or other operation. When interpreted, that LOAD_FAST 1 just loads slot 1 of the frame's array of fast locals onto the stack (called f_localsplus or localsplus in CPython's C frame structures, depending on the version; this is not the Python-level f_locals mapping). That array starts off filled with NULL pointers instead of pointers to Python objects, and if a LOAD_FAST loads a NULL pointer, it raises UnboundLocalError.
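CPython exposes the result of this compile-time scan through the standard-library `symtable` module, so the classification can be inspected without reading bytecode. A small sketch, assuming CPython; the source string and the `<example>` filename are arbitrary.

```python
import symtable

SOURCE = """
def f():
    if False:
        ord = None
    c = ord('a')
"""

# Build the symbol tables the compiler would build for this module.
module_table = symtable.symtable(SOURCE, "<example>", "exec")
f_table = module_table.get_children()[0]  # the table for function f

print(f_table.get_name())            # f
print(sorted(f_table.get_locals()))  # ['c', 'ord']
ord_symbol = f_table.lookup("ord")
print(ord_symbol.is_local(), ord_symbol.is_global())  # True False
```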

abarnert
  • The part that makes the rule somewhat unintuitive is that even `ord += 1` renders `ord` a local variable, though this never makes sense unless there is yet another assignment to `ord` in the function. – Sven Marnach Mar 28 '18 at 15:47
  • @SvenMarnach That's exactly why the "simple enough to be easily worked out in cases where it isn't intuitive" part is so important. You do have to learn the rule, but once you do, it's very hard to get it wrong. – abarnert Mar 28 '18 at 15:51
  • For what it's worth, I think this rule is necessary since local variables are accessed by index rather than by a dictionary lookup (which is used for global variables). Otherwise Python could simply do a dynamic lookup – first look in the local scope, then look in the enclosing scopes if the name isn't found. Since local variables are looked up by index rather than by name, the compiler _needs_ to determine statically what variables are local, since it needs to create different byte code for lookup by index than for lookup by name. – Sven Marnach Mar 28 '18 at 15:54
  • @SvenMarnach Python isn't actually required to use "fast locals". In fact, in 2.x, the way the `locals` function is defined, it really _can't_ use "fast locals", and the hacks to make it work anyway (like FastToLocals/LocalsToFast around any `exec`) don't quite work reliably, so CPython 2.x doesn't really implement the Python 2.x reference. That's why 3.x `locals` is explicitly limited in its usefulness. – abarnert Mar 28 '18 at 15:58
  • @SvenMarnach That being said, understanding what CPython does under the covers (and has done since somewhere around 1.something when there was no real reference documentation yet) is definitely helpful, especially to someone coming at this from a Lisp intuition, so I added a section about that. – abarnert Mar 28 '18 at 16:04
  • I'm somewhat confused now by what you say. I know that using `exec` in Python 2 made it fall back to "slow" locals, which isn't the case anymore in Python 3. But how is the `locals()` function different in Python 2 and 3? I looked up the documentation, and it has stayed word for word the same since Python 2.7, and I can't detect any difference in behaviour. – Sven Marnach Mar 28 '18 at 16:38
  • @SvenMarnach The documentation for the function is unchanged since they reworded the warning (I think for 2.6 and 3.0), but I think there’s different documentation somewhere in the reference docs, which I’ll search for once I get to a computer. Meanwhile, the easiest (but completely artificial) way to see the difference interactively is to `ctypes.pythonapi` the frame functions for converting to and from locals dict. It also helps to look at what locals does to closure variables. It’s hard to explain in a comment written on the bus, but hopefully that points you vaguely toward it for now. – abarnert Mar 28 '18 at 17:06
  • Thanks for answering my questions. :) I'll take a look. – Sven Marnach Mar 29 '18 at 18:45
  • I still can't find any difference. `PyFrame_LocalsToFast()` doesn't seem to have changed at all since 2.7: [old](https://github.com/python/cpython/blob/2.7/Objects/frameobject.c#L954-L993) [new](https://github.com/python/cpython/blob/521995205a2cb6b504fe0e39af22a81f785350a3/Objects/frameobject.c#L924-L963). And for `PyFrame_FastToLocals()` (directly above) only the error handling changed. But never mind – I don't want to waste your time any longer, and I already learned a lot from looking at this part of the Python source code again after quite a while. – Sven Marnach Mar 29 '18 at 18:52

Just to demonstrate what's going on with the compiler:

def f():
    if False:
        ord = None
    c = ord('a')

  4           0 LOAD_FAST                0 (ord)
              3 LOAD_CONST               1 ('a')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 STORE_FAST               1 (c)
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE

Access to ord uses LOAD_FAST, which is used for local variables.

If the never-executed assignment ord = None is placed outside your function instead, LOAD_GLOBAL is used:

if False:
    ord = None
def f():
    c = ord('a')

  4           0 LOAD_GLOBAL              0 (ord)
              3 LOAD_CONST               1 ('a')
              6 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
              9 STORE_FAST               0 (c)
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE
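The same comparison can be made programmatically with `dis.get_instructions`, as a sketch assuming CPython. The offsets and exact opcodes shown above come from an older CPython; newer releases may emit variants such as LOAD_FAST_CHECK, so the hypothetical helper `load_ops` matches the local case by prefix.

```python
import dis

def local_version():
    if False:
        ord = None
    c = ord('a')

def global_version():
    c = ord('a')

def load_ops(func, name):
    """Opcode names of the load instructions that read `name` (illustrative helper)."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.argval == name and ins.opname.startswith("LOAD")]

print(load_ops(local_version, "ord"))   # a LOAD_FAST-family opcode
print(load_ops(global_version, "ord"))  # ['LOAD_GLOBAL']
```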
user3483203
  • I think it would be worth adding a short explanation of what `LOAD_FAST` vs. `LOAD_GLOBAL` actually does. I tried to cram one into one sentence in my answer, but in your answer there should be room to do it more clearly. (Also, you can probably write more concisely than me...) – abarnert Mar 28 '18 at 16:25