5

Compile with g++.exe -m64 -std=c++17 at any optimization level, and run:

#include <iostream>

int main() {
    const auto L1 = [&](){};
    std::cout << sizeof(L1) << std::endl;
    const auto L2 = [&](){L1;};
    std::cout << sizeof(L2) << std::endl;
    const auto L3 = [&](){L1, L2;};
    std::cout << sizeof(L3) << std::endl;
    const auto L4 = [&](){L1, L2, L3;};
    std::cout << sizeof(L4) << std::endl;
}

The output is 1, 8, 16, 24, which means that L2 stores 1 reference (an 8-byte pointer on this 64-bit target), L3 stores 2, and L4 stores 3.

However, for the same lambda [&](){L1, L2;} defined in main(), the offset &L1 - &L2 is a compile-time constant, so L1 could be reached through a pointer to L2 using x86 base-plus-displacement addressing, i.e. [rbx+const] assuming rbx = &L2. Why does GCC still store a separate reference for every captured variable in the lambda object?
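For illustration, here is a rough sketch of the two layouts being contrasted (the struct and member names are made up, and the pointer arithmetic between unrelated locals is for demonstration only, not strictly portable C++):

#include <cstddef>
#include <iostream>

int main() {
    int a = 1, b = 2;

    // Roughly what GCC emits today for [&](){a, b;}:
    // one 8-byte pointer per captured variable, 16 bytes total.
    struct TwoPointers { int* pa; int* pb; } today{&a, &b};

    // The layout the question suggests: a single base pointer plus a
    // compile-time-constant offset, since a and b both live in main()'s
    // stack frame at fixed positions relative to each other.
    struct OneBase { char* base; } proposed{reinterpret_cast<char*>(&b)};
    const std::ptrdiff_t offset_a =
        reinterpret_cast<char*>(&a) - reinterpret_cast<char*>(&b);

    // Reading a through the single base pointer, analogous to [rbx+const].
    int* a_via_base = reinterpret_cast<int*>(proposed.base + offset_a);

    std::cout << sizeof(today) << ' ' << sizeof(proposed) << '\n';  // 16 8
    std::cout << *today.pa << ' ' << *a_via_base << '\n';           // 1 1
}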

Boann
  • 48,794
  • 16
  • 117
  • 146
l4m2
  • 1,157
  • 5
  • 17
  • 1
    You compiled with optimization disabled (the default is `-O0`), so wondering why G++ didn't make smarter choices is a bit silly. Although even with optimization enabled, we still get the same sizes. https://godbolt.org/z/a46Nq4 – Peter Cordes Jun 07 '19 at 11:22
  • @PeterCordes Edited to show optimizing level doesn't matter and for 64-bit – l4m2 Jun 07 '19 at 11:25
  • 2
    You could report this as a missed-optimization GCC bug (https://gcc.gnu.org/bugzilla/) and see if it would be plausible for gcc to capture a range of locals instead of storing pointers to each one individually. – Peter Cordes Jun 07 '19 at 11:32
  • 1
    Maybe ABI concerns are relevant here as well. – Vittorio Romeo Jun 07 '19 at 11:39
  • I would suggest adding the `language-lawyer` tag – Mike Lui Jun 07 '19 at 12:57
  • 2
    @MikeLui: Why? I'm pretty sure the ISO C++ standard isn't going to have anything to say about implementation details like this. It's not a question about language rules. If there are relevant parts of the C++ standard that constrain the implementation choices, the C++ tag covers that. – Peter Cordes Jun 07 '19 at 12:59
  • @PeterCordes Because I wasn’t sure the standard wouldn’t have some buried text requiring a standard layout of lambdas :). – Mike Lui Jun 07 '19 at 13:05
  • @MikeLui: I'm pretty sure they don't; you can't even pass them to functions other than template functions, so as I argued in my answer I think the implementation has total freedom over how it gets the captures to the place where the lambda body expands into machine code. – Peter Cordes Jun 07 '19 at 13:12
  • I don't think the optimization helps as much as you think it does: in the same scenario where the compiler knows the relative locations on the stack (which it would need in order to optimize the capture down to a single pointer), it also sees into the lambda body and is able to inline and optimize the calls. So you can expose the implementation with `sizeof()`, sure, but when does this lead to worse code generation? – BeeOnRope Jun 08 '19 at 02:23

1 Answer

1

I think this is a missed optimization, so you could report it as a gcc bug on https://gcc.gnu.org/bugzilla/. Use the missed-optimization keyword.

A capturing lambda isn't a function on its own, and can't decay/convert to a function pointer, so I don't think there's any required layout for the lambda object. (See Use a lambda as a parameter for a C++ function.) The code that reads the lambda object is always generated in the same compilation unit that defined it, so it sounds plausible that it would only need one base pointer for all the captured locals, with compile-time-constant offsets from that.
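To make the first point concrete, here is a minimal sketch (variable names are mine) of the fact that only a captureless lambda converts to a plain function pointer, which is why the object layout of a capturing lambda isn't pinned down by any calling convention:

#include <iostream>

int main() {
    int x = 42;
    auto no_capture = [](){ return 1; };
    auto captures   = [&](){ return x; };

    int (*fp)() = no_capture;     // OK: a captureless lambda decays to a function pointer
    // int (*fp2)() = captures;   // error: a capturing lambda does not convert

    std::cout << fp() << ' ' << captures() << '\n';  // prints: 1 42
}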

Captures of variables with a storage class other than automatic might still need separate pointers, if their offsets from each other aren't compile-time (or at least link-time) constants. (Or handling those could be a separate optimization.)


You can actually get the compiler to allocate the space and create a lambda object in memory by passing the lambda to an __attribute__((noinline)) template function: https://godbolt.org/z/Pt0SCC.
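The idea is along these lines (a sketch; the exact code at the link may differ):

#include <iostream>

// noinline prevents the call from being inlined, so the lambda object has to
// be materialized in memory and passed by reference.
template <class F>
__attribute__((noinline)) void use_lambda(const F& f) { f(); }

int main() {
    int a = 0, b = 0;
    auto lam = [&]() { a = b + 1; };
    use_lambda(lam);                    // the capture object must actually exist here
    std::cout << sizeof(lam) << '\n';   // 16 with g++ on x86-64: one pointer per capture
}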

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • 1
    There is admittedly a blurry line between a "missed" optimization and a transformation that's never likely to occur with current compiler technology, but I think this one falls into the latter category. One might also ask why compilers don't reorganize objects in cases where a more compact representation is possible, or perform other deep data-swizzling transformations. Compilers basically just don't. The main thing you have is scalarization, and a lot of clever-looking stuff falls out of that, but I don't think something like what's suggested here is going to happen. – BeeOnRope Jun 08 '19 at 02:20