Implementing lambdas in a scripting language

Question

I have this modified code example I took from the wikipedia article on anonymous functions.

comp(threshold)
{
    return lambda(x)
    {
        return x < threshold;
    };
}

main()
{
    a = comp(10);

    lib::print( a(5) );
}

Anonymous functions shouldn't be too hard to add to my scripting language. It should just be a case of adding the function code in the normal fashion, with the exception that the only way to access that function is through the variable to which it is assigned.

The problem with the closure example above is that the function body for the anonymous function references (in a general sense) a memory location that is invalid (or would be) at the time the closure is called.

I have two potential solutions to the problem in my mind already; I just want to get some recommendations first, before I attempt to add this feature to my language.

This isn't really gamedev related and should be moved to stackoverflow. — bummzack, Dec 20 '11 at 17:36
You have to capture the context somehow. C++ has two options - you can capture by reference (in which case the lambda secretly keep a pointer/reference in the instance, and can't usefully outlive the scope) or by value (in which case the lambda object needs to secretly keep a copy in the instance). Your language might allow for a third option though -- whatever you use as stack, if the lambda object can retain a reference to the stack frame, and if stack frames themselves are garbage-collected (or similar), then the existence of the lambda could prolong the existence of that local variable. — Steve Jessop, Dec 20 '11 at 19:54
Another option is to perform closure elimination through CPS transform. This is what the Scheme implementations do (and it is not hard to understand, read the Lambda the Ultimate XXX papers from Guy Lewis Steele, the creator of Scheme). You will need garbage collection for this, since it amounts to (a sane version of) @SteveJessop 's third option. — Alexandre C., Dec 20 '11 at 20:21
There are a number of ways to do it. Can you describe your scripting language in more detail? For example, does it support dynamic dispatch? Would you want closed-over local variables to be lexically scoped (like in C#) or dynamically scoped (like in PostScript)? Are functions already "first class"? How much static type analysis does your language perform? Can you represent things like classes/records/structs/associative arrays/etc, and if so, how? — Eric Lippert, Dec 20 '11 at 21:28
See [Implementation of closures in Lua](http://stackoverflow.com/questions/7781432/implementation-of-closures-in-lua), it had a really good answer by Nicol Bolas — Seth Carnegie, Dec 20 '11 at 23:15

lhf · Answer 1 · 2011-12-21T00:18:07.323

2

I don't know about the most sensible way but you can read about how Lua implements closures in The implementation of Lua 5.0. See also the slides.

The main point in the Lua implementation of closures is an efficient handling of upvalues or external local variables. This has allowed Lua to support full lexical scoping. For an account of how this support evolved in the design of Lua, see the HOPL III paper, The Evolution of Lua.

edited Dec 21 '11 at 00:18

answered Dec 20 '11 at 21:35

lhf

70,581
9
108
149

Just read that and learned about the "upvalue"...Intriguing. If you want an accept, you should really mention this in your answer. – Truncheon Dec 20 '11 at 23:20
@Truncheon, I've just added a little more detail. Thanks for the suggestion. – lhf Dec 21 '11 at 00:18

score 1 · Answer 2 · answered Dec 20 '11 at 23:24

If I were to translate this into C++ (without lambdas) it would look like this:

struct comp_lamda_1 {
    int threshold;
    comp_lamda_1(int t) :threshold(t) {}
    bool operator()(int x) {
        return x < threshold;
    };
};

comp_lambda_1 comp(int threshold)
{
    return comp_lamda_1(threshold);
}

int main()
{
    auto a = comp(10);
    std::cout << a(5);
}

This shows that the interpreter shouldn't treat an anonymous function as a freestanding function, but instead as a function-object, that has members to capture the needed variables.

(To be clear, the point is that comp_lamda_1 is a function-object, and I understand you weren't asking for a C++ translation of the above code)

Nicol Bolas · Accepted Answer · 2011-12-20T23:40:04.573

Since your question wasn't very specific, it can only be answered with a general algorithm. I also make no claims that this is the "most sensible" way, simply one that works.

First, we need to define the problem space.

Your scripting language has local variables (using the Lua term): variables which are not global. These are variables which can theoretically be captured by a lambda.

Now, let's assume your scripting language does not have a way to dynamically select local variables. This means that, simply by inspecting the syntax tree, one can see the following:

Which local variables are captured by a function.
Which local variables are not captured by a function.
Which functions capture local variables outside of their scope.
Which functions do not capture local variables outside of their scope.

Given this information, local variables now are split into two groups: pure local variables, and captured local variables. I will call these "pure locals" and "captured locals".

Pure locals are, for want of a better term, registers. When you compile to your byte code, pure locals are the simplest to handle. They're specific stack indices, or they're specific register names. However you're doing your stack management, pure locals are assigned fixed locations within a particular scope. If you're wielding the power of JIT, then these are going to become registers, or the closest thing to that possible.

The very first thing you need to understand about captured locals is this: they must be managed by your memory manager. They exist independently of the current call stack and scope, and therefore, they need to be free-standing objects which are referenced by functions that capture them. This allows multiple functions to capture the same local and therefore reference each others private data.

Therefore, when you enter a scope that contains captured lambdas, you will allocate a piece of memory that contains all of the captured locals that are part of that particular scope. For example:

comp(threshold)
{
    local data;
    return lambda(x)
    {
        return x < (threshold + data);
    };
}

The root scope of the comp function has two local variables. Both of them are captured. Therefore, the number of captured locals is 2 and the number of pure locals is zero.

Your compiler (to byte code) will therefore allocate 0 registers/stack variables for pure locals, and it will allocate a free-standing object which contains two variables. Assuming you're using garbage collection, you will need something to reference it in order for it to continue to live. That's easy: you reference it in a register/stack location, one that is not directly accessible by the script. So really, you do allocate a register/stack variable, but the script can't directly touch it.

Now, let's look at what lambda does. It creates a function. Again, we know that this function captures some variables outside of its scope. And we know which variables it captures. We see that it captures two variables, but we also see that these two variables come from the same free-standing block of memory.

So what lambda does is create a function object that has a reference to some bytecode and a reference to the variables it is associated with. The bytecode will use that reference to get its captured variables. Your compiler knows which variables are pure local to the function (like the argument x) and which are externally captured locals (like threshold). So it can figure out how to access each variable.

Now, when lambda completes, it returns a function object. At this point, the captured variables are referenced by two things: the lambda function and the stack: the current scope of the function. When return finishes however, the current scope is destroyed and all the things that were previously referenced are no longer referenced. So when it returns the function object, only the lambda function has the reference to the captured variables.

This is all rather complicated though. A much simpler implementation would be to just make all local variables effectively captured; all local variables are captured locals. If you do this, then your compiler can be a lot simpler (and probably quicker). When a new scope is entered, all of the locals for that scope are allocated in a block of memory. When a function is created, it references all of the external scopes it uses (if any). And when a scope is exited, it removes a reference to the locals it allocated. If nobody else is referencing it, then the memory can be freed.

It's very simple and straightforward.

score 0 · Answer 4 · answered Dec 22 '11 at 01:08

I've been reading about the upvalues used in Lua. I'm going to try to implement a similar system for dealing with closures and full lexical scoping. The tricky part will be getting the compiler to place the close commands in the correct places, as needed.

function()
{
    a = 6, b;

    {
        local c = 5;

        b = lambda() { return a*c; };

        // close c in closure;
    }

    b();

    // close a in closure
}

Implementing lambdas in a scripting language

4 Answers4