From an intuitive point of view, the answer is pretty simple:
Free variables in a function definition capture variables in the enclosing scope. But class attributes aren't variables, they're class attributes; you have to access them as `Alice.spam` or `self.spam`, not as `spam`. Therefore, `spam` doesn't capture the outer `spam` because there is no outer `spam`.
But under the covers, this isn't really true.
For a new-style class, while the class definition's body is being executed, `spam` actually is a local variable in the scope of that body; it's only when the metaclass (`type`, in this case) is executed that the class attributes are created from those locals.[1]
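A minimal sketch of that sequence, written for Python 3 (where every class is new-style). The `body_locals` name and the `Bob` class are hypothetical, added only so the body's namespace can be inspected afterwards:

```python
class Alice:
    spam = 1
    # While the body runs, spam is just a name in the body's local namespace.
    body_locals = dict(locals())

# Once the body finishes, the metaclass turns that namespace into attributes.
print('spam' in Alice.body_locals)  # True
print(Alice.spam)                   # 1

# Roughly what the class statement does, spelled out by hand:
Bob = type('Bob', (), {'spam': 1})
print(Bob.spam)                     # 1
```

The three-argument `type(name, bases, namespace)` call is the documented way to build a class dynamically; the class statement does more bookkeeping than this, so treat the last part as an approximation.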
For an old-style class, it's not completely defined what happens, so you pretty much have to turn to the implementation. In particular, there's no step where the metaclass is executed with the class definition's locals to generate the class object. But for the most part, it works pretty much as if that were the case.
So, why doesn't `spam` bind to that local?
A free variable can only bind to a closure cell from an outer scope, and a closure cell is a special kind of local variable. The compiler only creates a closure cell for a variable in a function definition, and only when a local function accesses that variable. It doesn't create closure cells for variables in class definitions.
So if `spam` doesn't bind to `Alice.spam`, what does it bind to? Well, by the usual LEGB rules, if there's no local assignment, and no enclosing cell variable, it's a global.
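Here's a small sketch of that lookup, meant to be run as a module or pasted at the top level of an interpreter. The method names `method` and `via_self` are made up for illustration:

```python
spam = 'global spam'

class Alice:
    spam = 'class attribute spam'

    def method(self):
        # No local assignment and no enclosing cell, so this is a global lookup.
        return spam

    def via_self(self):
        # The class attribute is only reachable through the instance or class.
        return self.spam

print(Alice().method())    # global spam
print(Alice().via_self())  # class attribute spam
```

If the module-level `spam` didn't exist, calling `method` would raise a `NameError` instead of falling back to the class attribute.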
Some of the above may be hard to understand without an example, so:
    >>> def f():
    ...     a=1
    ...     b=2
    ...     def g():
    ...         b
    ...     return g
    >>> f.__code__.co_cellvars # cell locals, captured by closures
    ('b',)
    >>> f.__code__.co_varnames # normal locals
    ('a', 'g')
    >>> g = f()
    >>> g.__code__.co_freevars # free variables that captured cells
    ('b',)
    >>> class Alice:
    ...     a=1
    ...     b=2
    ...     def f():
    ...         b
    >>> Alice.f.__func__.__code__.co_freevars
    ()
    >>> Alice.f.__func__.__code__.co_varnames
    ()
    >>> Alice.f.__func__.__code__.co_names # loosely, globals
    ('b',)
If you're wondering where `co_cellvars` and the like are specified… well, they're not, but the `inspect` module docs give a brief summary of what they mean.
If you understand CPython bytecode, it's also worth calling `dis` on all of these chunks of code to see the instructions used for loading and saving all these variables.
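As a rough sketch of what to look for, the snippet below collects opcode names rather than printing a full disassembly. It assumes Python 3.4 or later (for `dis.get_instructions`), and `opnames` is a hypothetical helper. The exact opcodes are a CPython detail and vary between versions, so only the two relevant ones are called out:

```python
import dis

def f():
    b = 2
    def g():
        return b
    return g

class Alice:
    b = 2
    def f(self):
        return b

def opnames(func):
    # Collect the opcode names the compiler emitted for func.
    return [ins.opname for ins in dis.get_instructions(func)]

print(opnames(f()))      # includes LOAD_DEREF: b is read from a closure cell
print(opnames(Alice.f))  # includes LOAD_GLOBAL: b is looked up as a global
```

Calling `dis.dis(f)` and `dis.dis(Alice.f)` directly gives the full listing if you want to see the stores as well as the loads.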
So, the big question is, why doesn't Python generate cells for class definitions?
Unless Guido remembers, and finds it interesting enough to write a Python history blog post about this, I'm not sure we'll ever know the answer. (You could, of course, try asking him—a comment on his blog or an email to whichever mailing list seems most relevant is probably the best way.)
But here's my guess:
Cells are implemented as indices into an array stored in a code object. When the function is called, its frame gets a matching array of objects. When a local function definition is executed inside that function call, the free variables are bound to references to the cell slots in the frame.
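You can poke at the cells themselves from Python. This is a sketch for Python 3 (it uses `nonlocal`); `make_pair`, `get`, and `bump` are made-up names, and `__closure__` and `cell_contents` are the attributes CPython exposes for this:

```python
def f():
    b = 2
    def g():
        return b
    return g

g = f()
# co_freevars names the free variables; __closure__ holds the matching cells.
print(g.__code__.co_freevars)          # ('b',)
print(g.__closure__[0].cell_contents)  # 2

def make_pair():
    n = 0
    def get():
        return n
    def bump():
        nonlocal n
        n += 1
    return get, bump

get, bump = make_pair()
bump()
# Both inner functions were created in the same call, so they share one cell.
print(get())                                      # 1
print(get.__closure__[0] is bump.__closure__[0])  # True
```

The shared cell is what keeps `n` alive after `make_pair` has returned.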
Classes don't have `__code__` members (or, pre-2.6, `func_code`). Why? Because a class definition is executed as soon as it's defined, and never executed again, so why bother? This means there's nowhere to stash a cell, and nothing for it to reference. On top of that, the execution frame always goes away as soon as the execution finishes, because there can't be any external references to it.
Of course you could change that: add `__code__` members to classes, create cells in them, and then, if someone closed over those cells, that would keep the frame alive after execution just as it does with functions. Would that be a good idea? I don't know. My guess is that nobody asked the question when Python classes were first being defined. While it's obvious now how much class definitions and function definitions have in common, I think that's an instance of Guido's time machine: he made a design decision without realizing that it would turn out to solve problems nobody would raise until a decade later.
[1] Some of these details may be CPython-specific. For example, I believe it's technically legal for an implementation to make a closure cell for every local in a function, or to use some other mechanism that's equivalent to that. For example, if you do `exec('spam=3')` in an inner function, all the language reference says is that it's not guaranteed that it will affect the outer function's `spam`, not that it's guaranteed not to do so.