The idea seems clear to me - let's see if I am able to write it out:
In Python, variables do not sit at a fixed memory position and refer to whatever is placed there, as happens in static languages. Neither do they directly "contain a reference (pointer) to an object in memory". What happens is that they contain this reference indirectly, and how this indirection occurs depends on the variable kind. (read to the end)
First, let's understand that without the "intermediate" cell object, an assignment to the x variable in the inner scope would simply change the object the inner x is pointing to - and this change would never be known to the x in the outer scope: it would simply keep referencing the previous object.
What takes place for nonlocal variables is that the compiler-generated bytecode reads and writes values through the cell object, so that the value is always in sync across all the scopes that share that variable. For the code using those values, in the Python source, this is completely transparent: the bytecode is always built to load the value from, and store it to, the cell - whereas for a plain variable that is not used in an inner scope, different bytecode is emitted, which saves the value in a different area (fast-locals, or globals).
So, in still other words: there are 3 fundamentally different kinds of variable in Python, and the compiler knows(*), at compile time, which kind a given variable is - and it emits different bytecode for each kind. Nonlocal variables get bytecode which uses the cell object (as if it were a memory position in a static language) to store their value. ((*) - If there is ambiguity at compile time about the kind of a variable, the compiler will simply error out. Although there are legacy bytecode instructions which will search for a variable by name in all valid scopes, I don't think code in Python 3.11 generates those anymore.)
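One way to see that the kind of each variable is settled at compile time is to look at the code objects themselves. This is only a sketch: the names outer, inner, shared, private and counter are made up for the illustration, and I am assuming a regular CPython build (opcode names can differ a bit between versions).

```python
import dis

counter = 0  # hypothetical module-level (global) variable


def outer():
    shared = 1   # also used by inner(), so the compiler places it in a cell
    private = 2  # used only here, so it is a plain "fast" local

    def inner():
        nonlocal shared
        global counter
        shared += 1
        counter += 1

    inner()
    return shared, private, inner


shared_value, private_value, inner = outer()

# The classification is recorded on the code objects at compile time.
print(outer.__code__.co_cellvars)               # ('shared',)
print("private" in outer.__code__.co_varnames)  # True
print("shared" in outer.__code__.co_varnames)   # False
print(inner.__code__.co_freevars)               # ('shared',)
print(inner.__code__.co_names)                  # ('counter',)

# Each kind is written with its own family of opcodes.
inner_ops = {ins.opname for ins in dis.get_instructions(inner)}
print("STORE_DEREF" in inner_ops, "STORE_GLOBAL" in inner_ops)  # True True
```

The same source-level assignment syntax ends up as STORE_DEREF for the cell variable and STORE_GLOBAL for the global one, which is the "different bytecode for each kind" mentioned above.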
Illustrating the example from the second paragraph above - check this code:

    def a():
        x = 1
        def c():
            nonlocal x
            x = 2
        c()
        print(x)
If the x names would simply "point to a value", then when it is changed to 2 in the inner function c, the x in a would have no way to know the value had changed.
Look at the final part of the disassembled code for a:

      7          38 LOAD_GLOBAL              1 (NULL + print)
                 50 LOAD_DEREF               1 (x)
                 52 PRECALL                  1
                 56 CALL                     1
And compare it with the same code when the inner function does not use a variable from a:

    In [4]:
       ...: def a():
       ...:     x = 1
       ...:     def c():
       ...:         y = 2
       ...:     c()
       ...:     print(x)
       ...
      7          32 LOAD_GLOBAL              1 (NULL + print)
                 44 LOAD_FAST                0 (x)
                 46 PRECALL                  1
                 50 CALL
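If you want to reproduce this comparison without reading full listings, something like the sketch below should do. The helper ops_touching and the names a_shared and a_plain are mine, and I return x instead of printing it; the exact opcodes you get depend on the CPython version, and the list for the cell case may also include the instructions that create the cell and build the closure.

```python
import dis


def a_shared():
    x = 1

    def c():
        nonlocal x
        x = 2

    c()
    return x


def a_plain():
    x = 1

    def c():
        y = 2

    c()
    return x


def ops_touching(func, name):
    """Return the opcode names of the instructions in func whose argument is name."""
    return [ins.opname for ins in dis.get_instructions(func) if ins.argval == name]


print(ops_touching(a_shared, "x"))  # cell opcodes such as STORE_DEREF and LOAD_DEREF
print(ops_touching(a_plain, "x"))   # fast-local opcodes such as STORE_FAST
print(a_shared(), a_plain())        # 2 1
```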
As I stated above, Python uses a different opcode in each case, LOAD_DEREF vs LOAD_FAST: the "DEREF" opcodes are the ones that store a value to, and retrieve it from, a Cell object. The fact that we can inspect and have access to Cell objects from Python code is actually almost a "luxury" - and the language would work just the same if they were completely opaque.
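Since the cells are exposed, here is a small sketch of that "luxury", using the function's __closure__ attribute and the cell's cell_contents. The names make_counter, increment and current are just placeholders for the example.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count
        count += 1
        return count

    def current():
        return count

    return increment, current


increment, current = make_counter()
increment()
increment()

# Both inner functions captured the very same cell object.
cell_a = increment.__closure__[0]
cell_b = current.__closure__[0]
print(cell_a is cell_b)       # True
print(cell_a.cell_contents)   # 2
print(current())              # 2
```

Both functions see the update because they read and write through one shared cell, not because each one holds its own pointer to the integer.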
As for your concerns about reference counting, we get to the final part of this: the "x" name, as you can see, exists only in the source code. In the first case above, there is only one reference to the value that is retrieved whenever we make use of the x variable: the one held by the cell. This can make one's head spin, actually: Python variables do not "exist" while the program is running - they "exist" only at the points where a value is stored to or retrieved from them - and at these points, the compiler emits the appropriate bytecode to store or retrieve that value from the appropriate container. For global variables, that container is the module's globals() dictionary (with a fallback to the builtins module). For local variables it will usually be a slot in the current Frame, used by the LOAD_FAST and STORE_FAST opcodes (the contents of these slots are mirrored to a locals() dictionary whenever needed), and for non-local variables, or variables that are shared with inner scopes, the container is a Cell object.
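A quick way to convince yourself that the locals() dictionary is only a mirror of the frame slots, while globals() is the real container for globals, is a sketch like this one (greeting, show_containers and the other names are invented for the example):

```python
greeting = "hello"  # hypothetical global: stored in the module's globals() dict


def show_containers():
    local_value = 1             # fast local: stored in a slot of the frame
    mirror = locals()           # dict built on demand from the frame slots
    mirror["local_value"] = 99  # changes the dict only, not the slot
    return local_value, globals()["greeting"]


print(show_containers())  # (1, 'hello')
```

Writing to the dictionary returned by locals() inside a function does not change the variable, because the variable lives in the frame slot and not in the dictionary.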
The final result of all this is that a variable "x" used in a closure like the one above will behave as if it is the same variable everywhere, and count as just one reference to its value, even if there are several inner functions using the same cell. Whereas if at any point we create a second variable and do y = x, then whatever the kind of y (local to the outer or any of the inner functions, nonlocal or global), a second reference to the value of x is created and placed in the appropriate container for y. It works as if by magic, but it is just some logical design.
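The reference counting part can be checked with sys.getrefcount. This is a rough sketch that assumes a standard CPython build; the absolute numbers it reports are an implementation detail, so I only compare them with each other. Payload and the two function names are made up for the example.

```python
import sys


class Payload:
    """Throwaway object whose reference count we can observe."""


def refs_with_closures(n):
    x = Payload()
    readers = []
    for _ in range(n):
        def reader():
            return x  # every reader shares the single cell that holds x
        readers.append(reader)
    return sys.getrefcount(x)


def refs_after_alias():
    x = Payload()

    def reader():
        return x

    before = sys.getrefcount(x)
    y = x  # a second variable: one more reference, stored in y's own slot
    after = sys.getrefcount(x)
    return after - before


print(refs_with_closures(1) == refs_with_closures(5))  # True
print(refs_after_alias())                              # 1
```

Adding more inner functions does not add references to the value, since they all point at the cell, while y = x adds exactly one.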