Since your question wasn't very specific, it can only be answered with a general algorithm. I also make no claims that this is the "most sensible" way, simply one that works.
First, we need to define the problem space.
Your scripting language has local variables (using the Lua term): variables which are not global. These are variables which can theoretically be captured by a lambda.
Now, let's assume your scripting language does not have a way to dynamically select local variables. This means that, simply by inspecting the syntax tree, one can see the following:
- Which local variables are captured by a function.
- Which local variables are not captured by a function.
- Which functions capture local variables outside of their scope.
- Which functions do not capture local variables outside of their scope.
Given this information, local variables now are split into two groups: pure local variables, and captured local variables. I will call these "pure locals" and "captured locals".
Pure locals are, for want of a better term, registers. When you compile to your byte code, pure locals are the simplest to handle. They're specific stack indices, or they're specific register names. However you're doing your stack management, pure locals are assigned fixed locations within a particular scope. If you're wielding the power of JIT, then these are going to become registers, or the closest thing to that possible.
The very first thing you need to understand about captured locals is this: they must be managed by your memory manager. They exist independently of the current call stack and scope, and therefore, they need to be free-standing objects which are referenced by functions that capture them. This allows multiple functions to capture the same local and therefore reference each others private data.
Therefore, when you enter a scope that contains captured lambdas, you will allocate a piece of memory that contains all of the captured locals that are part of that particular scope. For example:
comp(threshold)
{
local data;
return lambda(x)
{
return x < (threshold + data);
};
}
The root scope of the comp
function has two local variables. Both of them are captured. Therefore, the number of captured locals is 2 and the number of pure locals is zero.
Your compiler (to byte code) will therefore allocate 0 registers/stack variables for pure locals, and it will allocate a free-standing object which contains two variables. Assuming you're using garbage collection, you will need something to reference it in order for it to continue to live. That's easy: you reference it in a register/stack location, one that is not directly accessible by the script. So really, you do allocate a register/stack variable, but the script can't directly touch it.
Now, let's look at what lambda
does. It creates a function. Again, we know that this function captures some variables outside of its scope. And we know which variables it captures. We see that it captures two variables, but we also see that these two variables come from the same free-standing block of memory.
So what lambda
does is create a function object that has a reference to some bytecode and a reference to the variables it is associated with. The bytecode will use that reference to get its captured variables. Your compiler knows which variables are pure local to the function (like the argument x
) and which are externally captured locals (like threshold). So it can figure out how to access each variable.
Now, when lambda
completes, it returns a function object. At this point, the captured variables are referenced by two things: the lambda function and the stack: the current scope of the function. When return
finishes however, the current scope is destroyed and all the things that were previously referenced are no longer referenced. So when it returns the function object, only the lambda function has the reference to the captured variables.
This is all rather complicated though. A much simpler implementation would be to just make all local variables effectively captured; all local variables are captured locals. If you do this, then your compiler can be a lot simpler (and probably quicker). When a new scope is entered, all of the locals for that scope are allocated in a block of memory. When a function is created, it references all of the external scopes it uses (if any). And when a scope is exited, it removes a reference to the locals it allocated. If nobody else is referencing it, then the memory can be freed.
It's very simple and straightforward.