
I have the following C++ code:

class Lambda
{
public:
    int compute(int &value){
        auto get = [&value]() -> int {
            return 11 * value;
        };
        return get();
    }
};

int main(){
    Lambda lambda;
    int value = 77;
    return lambda.compute(value);
}

Compiled with clang at -O1, it generates the following assembly:

main: # @main
  push rax
  mov dword ptr [rsp + 4], 77
  mov rdi, rsp
  lea rsi, [rsp + 4]
  call Lambda::compute(int&)
  pop rcx
  ret
Lambda::compute(int&): # @Lambda::compute(int&)
  push rax
  mov qword ptr [rsp], rsi
  mov rdi, rsp
  call Lambda::compute(int&)::{lambda()#1}::operator()() const
  pop rcx
  ret
Lambda::compute(int&)::{lambda()#1}::operator()() const: # @Lambda::compute(int&)::{lambda()#1}::operator()() const
  mov rax, qword ptr [rdi]
  mov eax, dword ptr [rax]
  lea ecx, [rax + 4*rax]
  lea eax, [rax + 2*rcx]
  ret

Questions:

  1. What is the {lambda()#1} that appears in the ASM? As far as I know, it might be a closure that encapsulates the function object (i.e. the lambda body). Please confirm whether that is the case.
  2. Is a new closure generated every time compute() is called, or is the same instance reused?
Peter Cordes
Ionut B
  • The lambda doesn't escape the function, so it's not a callback or anything. It's just a regular function that gets called with 2 levels of indirection. (Pointer to a pointer to `value`, hence the 2 loads.) – Peter Cordes Mar 15 '18 at 07:54
  • Note that `Lambda::compute(int&)::{lambda()#1}::operator()()` is a demangled name; you're probably using http://gcc.godbolt.org/ rather than looking at unfiltered `clang -O1 -S` output. There's no magic here, it's just a regular `call` instruction. – Peter Cordes Mar 15 '18 at 07:55
  • I don't understand 2. Every time you call `compute()` you create a new lambda and call it, so why should the ASM do something different? – 463035818_is_not_an_ai Mar 15 '18 at 07:55
  • If you passed the lambda closure object itself to a non-inline function, you might end up with code something like what gcc does for GNU C nested functions (which access variables in the containing function). https://stackoverflow.com/questions/8179521/implementation-of-nested-functions. But hopefully you get something more efficient than those nasty executable-stack shenanigans, with code that writes instruction bytes for a trampoline onto the stack. – Peter Cordes Mar 15 '18 at 08:31
  • @PeterCordes I don't think you'd ever get that craziness. Lambdas are anonymous objects and their `operator()` inherently has a hidden `this` pointer passed as context. The crazy executable stack is a hack to avoid passing a context pointer, required because the nested function, when used through a pointer, can't ever be passed that context – Passer By Mar 15 '18 at 08:43
  • @PasserBy: ah right, because when you pass a pointer to it, it has to work like a regular function pointer. I'd forgotten exactly why the trampoline was needed. But everything that uses a C++ lambda knows it's a lambda, and knows to pass the context to the function pointer. So yes, interesting difference between nested functions and lambdas. – Peter Cordes Mar 15 '18 at 08:46
  • @PeterCordes - exactly. Lambdas either close over nothing, in which case they _can_ decay to a function pointer, or they close over something and hence are stateful. In the stateful case the closure state has to be carried around by an object of unspecified type (or converted to a `std::function`, which itself stores the state), so you never have the case of a "stateful" function pointer. Nested functions, on the other hand, allow exactly that, so in that sense their function pointer embeds both the state and the implementation, and so you need the "runtime code generation" aspect. – BeeOnRope Mar 15 '18 at 17:47
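
The distinction drawn in the last comment can be sketched in a few lines. This is only an illustration, not code from the thread: the helper names `call_plain`, `call_generic` and `call_erased` are hypothetical. A captureless lambda converts to an ordinary function pointer, while a capturing lambda has to travel as its own closure type (through a template) or be wrapped in a `std::function`.

```cpp
#include <functional>

// Takes an ordinary function pointer: there is no room for captured state.
int call_plain(int (*fn)(int), int x) { return fn(x); }

// Takes any callable by its own (possibly unnamed closure) type.
template <typename F>
int call_generic(F f, int x) { return f(x); }

// Takes a type-erased callable; std::function stores a copy of the closure.
int call_erased(const std::function<int(int)>& f, int x) { return f(x); }

int times_eleven_plain(int x) {
    auto stateless = [](int v) { return 11 * v; };
    return call_plain(stateless, x);  // implicit conversion to int (*)(int)
}

int times_factor_generic(int factor, int x) {
    auto stateful = [factor](int v) { return factor * v; };
    // call_plain(stateful, x);  // would not compile: a capturing lambda
    //                           // has no conversion to a function pointer
    return call_generic(stateful, x);
}

int times_factor_erased(int factor, int x) {
    auto stateful = [factor](int v) { return factor * v; };
    return call_erased(stateful, x);
}
```

In all three cases the caller knows it is dealing with a callable object, so the context is passed explicitly and no trampoline is needed.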

2 Answers

  1. It's the body (implementation) of the lambda function you declared in compute.
  2. Yes, every time you call compute, conceptually and in practice (at this optimization level [1]) a new closure is created on the stack, and the associated lambda function is called with a pointer to that closure (passed in rdi, i.e. as the first argument, in the same way as the this pointer for a member function).

[1] The "at this optimization level" part is very important. Nothing here actually requires the compiler to generate the closure or a separate lambda function at all. At -O2, for example, clang optimizes all this away and just returns the answer as a constant directly in main(). gcc does the same optimization even at -O1.
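
As a rough illustration of what "closure" means here, the lambda in the question behaves much like a small hand-written class with one reference member and a const operator(). The names `ComputeClosure` and `LambdaByHand` are hypothetical and only used for this sketch; the real closure type is unnamed and its layout is up to the compiler.

```cpp
// Hypothetical hand-written stand-in for the closure type generated for
//     [&value]() -> int { return 11 * value; }
struct ComputeClosure {
    int& value;  // by-reference capture; typically stored as a pointer

    // Plays the role of {lambda()#1}::operator()() const in the listing.
    int operator()() const { return 11 * value; }
};

class LambdaByHand {
public:
    int compute(int& value) {
        ComputeClosure get{value};  // a fresh closure object on every call
        return get();               // &get is passed as the hidden `this`
    }
};
```

This matches the -O1 listing in the question: compute stores the address of value into a stack slot (the closure), passes the slot's address in rdi, and operator() then loads twice, first the stored pointer and then the int it points to.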

BeeOnRope
  1. Yes, calling a lambda function requires a closure to be generated [unless the compiler can reason that it's not actually being used].

  2. At this optimisation level, each call is a call to compute, which in turn calls the internal function get, a lambda defined inside your compute function. Letting the compiler optimise to a higher degree will, for THIS case, optimise the call away: in my attempt, -O2 completely removed the entire call and just returned the pre-calculated constant 847, as you'd expect. For a more complex case, it may or may not inline the lambda part but keep the outer call, or vice versa. It very much depends on the exact details of what is going on inside the functions involved.

    Just to be clear, the compiler is doing exactly what you asked for: calling the function compute, which in turn calls the function get.
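
The reason 847 can be precomputed is that every input is known at compile time. A minimal sketch of that idea follows, with some assumptions: `eleven_times` and `folded_result` are hypothetical names, and a plain constexpr function stands in for the lambda so the example also compiles as C++11. Note that constexpr evaluation is a language guarantee, whereas the -O2 folding described above is an optimisation the compiler is allowed, but not required, to perform.

```cpp
// Hypothetical constexpr stand-in for the lambda body in the question.
constexpr int eleven_times(int value) { return 11 * value; }

// Checked by the compiler itself, so no call survives to run time here.
static_assert(eleven_times(77) == 847, "11 * 77 should be 847");

// An optimiser will typically reduce this to returning the constant 847.
int folded_result() { return eleven_times(77); }
```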

Adding

    int value2 = 88;
    int tmp = lambda.compute(value2);

into the main function in the original question produces essentially this change to the generated code (using clang++ on Linux):

main:                                   # @main
    pushq   %rbx
    subq    $16, %rsp
    movl    $77, 12(%rsp)
    ## new line to set value2
    movl    $88, 8(%rsp)
    movq    %rsp, %rbx
    ## New line, passing reference of `value2` to lambda.compute
    leaq    8(%rsp), %rsi
    movq    %rbx, %rdi
    ## Call lambda.compute
    callq   _ZN6Lambda7computeERi
    ## Same as before.
    leaq    12(%rsp), %rsi
    movq    %rbx, %rdi
    callq   _ZN6Lambda7computeERi
    addq    $16, %rsp
    popq    %rbx
    retq

The code generated to

Mats Petersson