Why does python use intermediate cell for closures?

Question

Guys, I don't quite understand why Python uses an intermediate cell for closures. For example:

def outer():
   x = "world"
   def inner():
      print(f"Hello {x}")
   return inner

Now both x.outer and x.inner (disregard the dot notation, it is only there to distinguish the two variables) point to the same intermediate cell, which in turn points to the memory cell that contains our string object.

What are we getting from this intermediate cell? Why can't those two variables point directly to the memory cell that contains our string object?

From a reference counting perspective, even after the outer() function has finished running we still have a reference count of 1 (since we still have the x.inner variable), so the Python memory manager won't be able to free this intermediate cell. But we would have the same reference count if those two variables pointed directly to the memory cell containing the string object. So I guess it has nothing to do with reference counting.

So what is the idea behind this intermediate cell, and why do we need it? Thank you.

Your question is really not so clear. Please improve. I mean "inner" just evaluates to a new string, right? — Ralf Ulrich, Jan 23 '23 at 09:12
I haven't seen the term "intermediate cell" in Python. What does that mean? — DarrylG, Jan 23 '23 at 13:41
@DarrylG If we run the above code we get the following: fn = outer(); print(fn.__closure__) --> (<cell at 0x10d1edfd0: str object at 0x...>,). As you can see, our free variable points to the intermediate cell stored at 0x10d1edfd0, which in turn points to the memory cell that contains our string object. My question is, why doesn't Python point directly to the memory cell instead of using this intermediate cell? — andrew mamchyn, Jan 23 '23 at 15:53
As [Python closures](https://zetcode.com/python/python-closures/) shows, providing it the way Python currently does it allows the functionality of small classes. The "intermediate cell" is akin to a variable in an object instance. — DarrylG, Jan 23 '23 at 17:02
I suggest you add the info from your comment to the question. Then the comment of DarrylG would also qualify as a proper answer, right? — Ralf Ulrich, Jan 24 '23 at 08:52

jsbueno · Answer 1 · 2023-06-11T03:29:00.823

The idea seems clear to me - let's see if I am able to write it out: in Python, variables are not fixed memory positions that hold whatever value is placed there, as happens in static languages. Neither do they directly "contain a reference (pointer) to an object in memory". What happens is that they contain this reference indirectly, and how this indirection occurs depends on the kind of variable. (Read to the end.)

First, let's understand that without the "intermediate" cell object, an assignment to the x variable in the inner scope would simply change the object that the inner x is pointing to - and this change would never be known to the x in the outer scope: it would simply keep referencing the previous object.
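A minimal sketch of that sharing, assuming CPython (make_counter, increment and get are hypothetical names, not from the question): two inner functions that use the same outer variable end up holding the very same cell object, so a write through one is visible through the other.

```python
def make_counter():
    count = 0  # used by both inner functions, so it is stored in a cell

    def increment():
        nonlocal count
        count += 1  # writes through the shared cell

    def get():
        return count  # reads through the same cell

    return increment, get


increment, get = make_counter()
increment()
increment()

# Each closure's __closure__ tuple holds the cell for `count`.
shared_cell = increment.__closure__[0]
same_cell = shared_cell is get.__closure__[0]
current = get()

print(same_cell, current, shared_cell.cell_contents)  # True 2 2
```

If each function held its own direct pointer to the integer instead, increment() would rebind only its own pointer and get() would keep returning 0.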

What takes place for nonlocal variables is that the compiler-generated bytecode reads and writes values through the cell object, so that the value is always in sync across all scopes that share that variable. For the code using those values, in the Python source, this is completely transparent: the bytecode will always be built to load the value from and store it to the cell - whereas for a plain variable that is not used in an inner scope, different bytecode is emitted, which saves the value of that variable in a different area (fast locals, or globals).

So, in still other words: there are 3 fundamentally different kinds of variable in Python, and the compiler knows(*), at compile time, which kind a given variable is - and it will emit different bytecode for each kind. Nonlocal variables get bytecode which uses the cell object (as if it were a memory position in a static language) to store their value. ((*) - If there is ambiguity at compile time about the kind of a variable, the compiler will simply error out. Although there are legacy bytecode instructions which will search for a variable by name in all valid scopes, I don't think code in Python 3.11 generates those anymore.)
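One way to see this compile-time classification without reading bytecode is to look at the attributes of the code object. This is only a sketch, assuming CPython; the names plain and shared are made up for the illustration.

```python
def outer():
    plain = 1   # used only in outer: an ordinary fast local
    shared = 2  # also read by inner: the compiler puts it in a cell

    def inner():
        # `shared` is a free variable here; `len` is resolved by name
        # through the module globals, then the builtins.
        return shared + len("ab")

    return plain, inner


outer_code = outer.__code__
inner_code = outer()[1].__code__

print(outer_code.co_varnames)  # local names of outer, includes 'plain'
print(outer_code.co_cellvars)  # ('shared',)
print(inner_code.co_freevars)  # ('shared',)
print(inner_code.co_names)     # global/builtin names used, includes 'len'
```

All four tuples are filled in when the function is compiled, before any of the code has run.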

Illustrating the example from the second paragraph above - check this code:


def a():
   x = 1
   def c():
      nonlocal x
      x = 2
   c()
   print(x)

If the xs simply "pointed to a value", then when x is changed to 2 in the inner function c, the outer function a would have no way to know the value had changed.
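A rough way to picture the "no way to know" situation is to drop the nonlocal declaration: the inner x then becomes a separate local of c, and the outer function never sees the assignment. The function names below are hypothetical, chosen only for this comparison.

```python
def with_cell():
    x = 1

    def c():
        nonlocal x  # x here is the cell shared with with_cell
        x = 2

    c()
    return x


def without_sharing():
    x = 1

    def c():
        x = 2  # no nonlocal: a brand-new local variable of c

    c()
    return x


print(with_cell(), without_sharing())  # 2 1
```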

Look at the final part of the disassembled code for a:

  7          38 LOAD_GLOBAL              1 (NULL + print)
             50 LOAD_DEREF               1 (x)
             52 PRECALL                  1
             56 CALL                     1

And compare it with the same code when the inner function does not use a variable from a:

def a():
   x = 1
   def c():
      y = 2
   c()
   print(x)
...
  7          32 LOAD_GLOBAL              1 (NULL + print)
             44 LOAD_FAST                0 (x)
             46 PRECALL                  1
             50 CALL                     1

As I stated above, Python uses a different opcode strategy in the two cases, LOAD_DEREF vs LOAD_FAST: the "DEREF" opcodes are the ones that store a value to and retrieve it from a Cell object. The fact that we can inspect and have access to Cell objects from Python code is actually almost a "luxury" - and the language would work just the same if they were completely opaque.
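The same comparison can be done programmatically with the dis module instead of reading the listings by eye. This is a sketch assuming CPython; exact opcode names for plain locals vary between versions, so it only checks for the DEREF opcodes. The function names are hypothetical.

```python
import dis


def shares_x():
    x = 1

    def c():
        nonlocal x
        x = 2

    c()
    return x


def keeps_x_private():
    x = 1

    def c():
        y = 2

    c()
    return x


def opnames(func):
    """Return the set of opcode names in func's own bytecode."""
    return {ins.opname for ins in dis.get_instructions(func)}


shared_ops = opnames(shares_x)
private_ops = opnames(keeps_x_private)

# Only the function whose x is shared with an inner scope goes through a cell.
print("LOAD_DEREF" in shared_ops, "LOAD_DEREF" in private_ops)  # True False
```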

As for your concerns about reference counting, we get to the final part of this: the "x" name, as you can see, exists only in the source code. In the first case above, there is only one reference to the value that will be retrieved whenever we make use of the x variable: the one held by the cell. This can make one's head spin, actually: Python variables do not "exist" while the program is executing - they "exist" only at the points where a value is stored to or retrieved from them - and at these points, the compiler emits the appropriate bytecode to store or retrieve that value from the appropriate container. For global variables, that container is the module's globals() dictionary (with a fallback to the builtins module). For local variables it will usually be a slot in the current Frame, used by LOAD_FAST and STORE_FAST (the contents of these slots are mirrored to a locals() dictionary whenever needed), and for nonlocal variables, or variables that are shared with inner scopes, the container is a Cell object.
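The three containers can be poked at from Python code. The sketch below assumes CPython; marker, local_only and shared are hypothetical names used only to label where each value ends up.

```python
def demo():
    global marker            # hypothetical global name
    marker = "in globals()"  # stored in the module's globals dictionary
    local_only = "frame slot"
    shared = "cell"

    def inner():
        return shared        # makes `shared` a cell variable of demo

    snapshot = dict(locals())  # frame slots mirrored into a dictionary
    return snapshot, inner


snapshot, inner = demo()

global_value = demo.__globals__["marker"]
local_value = snapshot["local_only"]
cell_value = inner.__closure__[0].cell_contents

print(global_value)  # in globals()
print(local_value)   # frame slot
print(cell_value)    # cell
```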

The final result of all this is that a variable "x" used in a closure like the one above will behave as one and the same variable, and count as just one reference to its value, even if there are several inner functions using the same cell. Whereas if at any point we create a second variable and do y = x, then whatever the kind of y (local to the outer or any of the inner functions, nonlocal or global), a second reference to the value of x is created and placed in the appropriate container for y. It works as if by magic, but it is just some logic design.
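A small sketch of that last point, again assuming CPython and using made-up names: y = x stores a second reference in y's own container, so rebinding x afterwards updates the cell but leaves y alone.

```python
def outer():
    x = ["payload"]

    def inner():
        return x  # reads x through the cell

    y = x  # a second reference, stored in y's own slot
    cell = inner.__closure__[0]
    same_before = cell.cell_contents is y  # both refer to the same list

    x = "rebound"  # rebinding x only changes what the cell refers to
    return same_before, inner(), y


print(outer())  # (True, 'rebound', ['payload'])
```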
