11

I have a question about how closures are implemented.

Say this is in a file named test.lua:

local a = 'asdf'

local function b()
    return a
end

a = 10

return b

And another file does

a = require 'test'
a()

it will print

10

If a is a pointer on the stack to 'asdf' (on the heap I assume, but it doesn't matter), and the closure b is created so presumably the address that was in a is saved for b to use, how does a = 10 change the pointer inside the closure as well?

Wikipedia says quite well what is perplexing me:

A language implementation cannot easily support full closures if its run-time memory model allocates all local variables on a linear stack1. In such languages, a function's local variables are deallocated when the function returns.

I was thinking that perhaps b really didn't save a pointer to 'asdf' but a stack offset to a, so that you can change a and the stack offset will get you to a which points to the last thing you set a to, but then how does this work when a (the pointer) is popped off the stack and the stack offset becomes invalid?

1 I know Lua doesn't allocate the values on the stack, but it allocates local pointers on the stack to values in the heap, doesn't it?

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
Seth Carnegie
  • 73,875
  • 22
  • 181
  • 249
  • I changed the title and tags to make this question LUA-specififc. How closures are implemented (and if closures are even supported) is very language-specific. –  Oct 16 '11 at 00:37
  • @pst well lua is implemented in C so I thought it was relevant. – Seth Carnegie Oct 16 '11 at 01:35
  • Thing is, C [only] coders won't know or care -- "it's a LUA thing" ;-) –  Oct 16 '11 at 01:52

1 Answers1

26

I really wish you had named these variables a bit more reasonably. So I will:

local inner = 'asdf'

local function b()
    return inner
end

inner = 10

return b

and

func = require 'test'
func()

OK, now that we know what we're talking about, I can proceed.

The Lua chunk test has a local variable called inner. Within that chunk you create a new function b. Since this is a new function, it has a scope within the scope of the chunk test.

Since it is within a function, it has the right to access local variables declared outside of that function. But because it is inside of a function, it does not access those variables like it would one of its own locals. The compiler detects that inner is a local variable declared outside of the function's scope, so it converts it into what Lua calls an "upvalue".

Functions in Lua can have an arbitrary number of values (up to 255) associated with them, called "upvalues". Functions created in C/C++ can store some number of upvalues by using lua_pushcclosure. Functions created by the Lua compiler use upvalues to provide lexical scoping.

A scope is everything that happens within a fixed block of Lua code. So:

if(...) then
  --yes
else
  --no
end

The yes block has a scope, and the no block has a different scope. Any local variables declared in the yes block cannot be accessed from the no block, because they are outside of the scope of the no block.

The Lua constructs that define a scope are if/then/else/end, while/do/end, repeat/until, do/end, for/end, and function/end. Also, each script, called a Lua "chunk", has a scope.

Scopes are nested. From within one scope, you can access local variables declared in a higher scope.

A "stack" represents all variables declared as local within a particular scope. So if you have no local variables in a certain scope, the stack for that scope is empty.

In C and C++, the "stack" that you are familiar with is just a pointer. When you call a function, the compiler has predetermined how many bytes of space that the function's stack needs. It advances the pointer by that amount. All stack variables used in the function are just byte offsets from the stack pointer. When the function exits, the stack pointer is decreased by the stack amount.

In Lua, things are different. The stack for a particular scope is an object, not merely a pointer. For any particular scope, there are some number of local variables defined for it. When the Lua interpreter enters a scope, it "allocates" a stack of the size necessary to access those local variables. All references to local variables are just offsets into that stack. Access to local variables from higher scopes (previously defined) simply access a different stack object.

So in Lua, you conceptually have a stack of stacks (which I will refer to as the "s-stack" for clarity). Each scope creates a new stack and pushes it, and when you leave a scope, it pops the stack off of the s-stack.

When the Lua compiler encounters a reference to a local variable, it converts that reference into an index into the s-stack, and an offset into that particular stack. So if it accesses a variable in the current local stack, the index into the s-stack refers to the top of the s-stack, and the offset is the offset into that stack where the variable is.

That's fine for most Lua constructs that access scopes. But function/end don't just create a new scope; they create a new function. And this function is allowed to access stacks that aren't just the local stack of that function.

Stacks are objects. And in Lua, objects are subject to garbage collection. When the interpreter enters a scope, it allocates a stack object and pushes it. So long as the stack object is pushed onto the s-stack, it cannot be destroyed. The stack of stacks refers to the object. However, once the interpreter exits the scope, it pops the stack off of the s-stack. So since it is no longer referenced, it is subject to being collected.

However, a function that accesses variables outside of its own local scope can still be referencing that stack. When the Lua compiler sees a reference to a local variable that is not within the function's local scope, it alters the function. It figures out which stack the local it is referencing belongs to, and then stores that stack as an upvalue in the function. It converts the reference to that variable to an offset into that particular upvalue, rather than an offset into a stack that is currently on the s-stack.

So as long as the function object continues to exist, so too will the stack(s) that it references.

Remember that stacks are dynamically created and destroyed as the Lua interpreter enters and exits the scope of functions. So if you were to run test twice, by calling loadfile and executing the returned function twice, you would get two separate functions that refer to two separate stacks. Neither function will see the value from the other.

Note that this may not be exactly how it's implemented, but that's the general idea behind it.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • Aha, I see. So then if each function stores its own scope and the compiler converts names of local variables to offsets in that scope, there are never ever any local variables on the stack, because they simply aren't stored there, correct? (Not counting pushing them on for function calls, etc) – Seth Carnegie Oct 16 '11 at 00:15
  • And also, many scopes can be associated with one function, correct? It's own, and the scopes to which the upvalues it references belongs. – Seth Carnegie Oct 16 '11 at 00:17
  • @SethCarnegie: `local` variables are variables that are local to a certain scope. That scope **is** the stack; the stack is an object. When a scope is exited, the reference to that stack is lost. If nothing else owns that stack's reference (if no function that still exists references the stack), then the stack can be collected. Closures store a piece of the stack. One would expect that a smart compiler can tell when a function will store a piece of the stack and when it won't, so it can optimize away the simple cases. – Nicol Bolas Oct 16 '11 at 00:20
  • @SethCarnegie: A function's internal stack (the one that it gets its own `local` variables from) works like normal. It comes into existence when the function is called, and it goes away when the function exits. The external stacks it references are a part of the function object itself, but the function's execution stack is rebuilt at each function call. – Nicol Bolas Oct 16 '11 at 00:22
  • I meant the global stack. I didn't know each function had its own stack. Or rather, I didn't know the scope was called a stack, since it seems like it would be a fixed size since you know exactly how many local variables you'll have at "compile" time. So, there's a global stack for the whole program, and a mini-stack (scope thing) for each function when it's called, correct? Thanks for bearing with me. – Seth Carnegie Oct 16 '11 at 00:23
  • @SethCarnegie: Each scope (every block) has a stack. This represents all of the local variables used within that block. The size is fixed at compile-time, but it only gets its memory when that scope is entered. In C or C++, the stack is just a pointer value that is incremented. In Lua, the stack for a particular scope can actually be a full-fledged object, which lives beyond the lifetime of that scope. – Nicol Bolas Oct 16 '11 at 00:27
  • Sorry to bring this up again, but do you know of other languages (or even lua) do it this way, and if there are any (other) popular ways of doing it? – Seth Carnegie Oct 19 '11 at 21:12
  • @NicolBolas: is what you term a stack called a "[stack] frame" in other contexts, or you you really mean each block gets its own stack? – outis Sep 22 '13 at 02:03