I'm programming with C++ lambdas. For performance reasons, I want to make sure that calls to a lambda are inlined by the compiler. For example, I have this simplified piece of code:

template <typename T>
auto gen_fn1(T x1, T x2) {
    auto fn1 = [x1, x2]() {
        return x1 + x2;
    };
    return fn1;
}

template <typename T>
auto gen_fn2(T x1, T x2) {
    auto fn2 = [x1, x2]() {
        auto fn1 = gen_fn1(x1, x2);
        return fn1() * fn1();
    };
    return fn2;
}

int test_1() {
    auto fn2 = gen_fn2(1, 2);
    return fn2();
}

I want to make sure that no extra cost is introduced by the lambda generation and invocation in test_1(). I can manually check the assembly code generated by the compiler. With '-O2' optimization in clang++ 8, I see the desired result: pretty much just a 'return 9' in the generated code. So my question is: is there a way to automatically check that I always get the desired result? In particular, I want to check:

  1. No method invocation for generating the lambdas in 'test_1()', including 'gen_fn2()' and 'gen_fn1()'.
  2. No lambda invocation cost in 'test_1()' or 'gen_fn2()', such as for 'fn1()' and 'fn2()'. I expect these calls to be inlined. How can I identify them and check that they are inlined?

Question 2 is more interesting to me. Being able to check it programmatically would be most appreciated, with something like 'assert(gen_fn2(1, 2) == []() { return 9; })'. If that is not possible, checking the compiler's intermediate files or the assembly file would also be helpful. But how?

Long
  • Lambdas are generated during compile-time, those are just functional objects on steroids. So no, no method invocation in runtime for generating lambdas whatsoever. – vtronko Apr 11 '20 at 05:37
  • @vtronko OK. I should have asked more precisely. So if 'gen_fn1()' and 'gen_fn2()' are inlined, the compile-time generated function objects will be directly assigned to the variables holding their return values, right? So I'd like 'gen_fn1()' and 'gen_fn2()' to be guaranteed to be inlined. – Long Apr 11 '20 at 07:23
  • Performance is a tricky thing. The fact that you are asking this question suggests that you don't even know whether you should care about inlining of those functions or not. A much better approach is to profile your code first and then focus on the performance bottlenecks you identify while profiling. Only if your profiling points you to areas of code in which those lambdas are used will you be able to play with code and compilation options to see if you can get a better result (and it may even lead you to prevent inlining to get better performance in some cases). – Slimak Apr 11 '20 at 09:47
  • Maybe I should use "for some reason" instead of "for performance reason". – Long Apr 11 '20 at 12:40

3 Answers

TL;DR: Not without looking at the compilation output.

First, as other answers point out, C++ lambdas are basically anonymous classes with an operator() method; so, your question is no different than "is there a way to check that a certain invocation of an object's method gets inlined?"

Whether your method invocation is inlined or not is a choice of the compiler, and is not mandated by the language specification (although in some cases it's impossible to inline). This fact is therefore not represented in the language itself (nor by compiler extensions of the language).

What you can do is one of two things:

  • Externally examine the compilation output. The easiest way is to compile without assembling, e.g. gcc -S or clang++ -S, plus whatever optimization flags and other compilation options you normally use. Bear in mind, though, that even if inlining has not happened during compilation, it may still theoretically occur at link time.
  • Internally, try to determine side effects of the inlining choice. For example, you could have a function which takes the address of a function you want to check; then you read, at run time, the instructions of that function to see whether it contains any function calls, look up the called addresses in the symbol table, and see whether the symbol name comes from some lambda. This is already rather difficult, error-prone, platform-specific and brittle, and there's the fact that you might have two lambdas used in the same function. So I obviously wouldn't recommend doing something like that.
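As a rough illustration of the first option, the question's code can be put in a single file whose entry point has an unmangled name, so that it is easy to locate in the assembly listing. The file name lambda_inline.cpp and the extern "C" wrapper are assumptions made for this sketch; the flags shown in the comments are Clang's, and the exact call mnemonic to look for depends on the target.

```cpp
// lambda_inline.cpp (hypothetical file name)
//
// Inspect the optimized assembly without assembling:
//   clang++ -std=c++14 -O2 -S -o - lambda_inline.cpp
// In the listing, find the label `test_1` and check that its body contains
// no call instruction (`call` on x86-64; other targets use e.g. `bl`).
//
// Clang can also report inlining decisions as optimization remarks:
//   clang++ -std=c++14 -O2 -c lambda_inline.cpp -Rpass=inline -Rpass-missed=inline

template <typename T>
auto gen_fn1(T x1, T x2) {
    return [x1, x2]() { return x1 + x2; };
}

template <typename T>
auto gen_fn2(T x1, T x2) {
    return [x1, x2]() {
        auto fn1 = gen_fn1(x1, x2);
        return fn1() * fn1();
    };
}

// extern "C" (an assumption of this sketch) disables name mangling, so the
// function shows up under the plain label `test_1` in the assembly.
extern "C" int test_1() {
    auto fn2 = gen_fn2(1, 2);
    return fn2();
}
```

A build script could search the listing of test_1 for call instructions and fail the build if any are found, but such a check is tied to one compiler, one target and one set of flags.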
einpoklum
  • If not possible without looking at the compilation output, what I want is an easy approach for what you've mentioned as 'internally'. – Long Apr 11 '20 at 09:54
  • @Long: But there is no easy approach to this... perhaps GCC or clang people know someone who has written such a runtime self-introspection mechanism. I doubt anyone has. – einpoklum Apr 11 '20 at 10:09
  • It's extremely important here to also mention that the compilation should be optimized with `-O3` (max) as well, since optimization of the code is what we're concerned with here anyways... Also, it's helpful to insert assembly comments in the code so that you can more easily keep track of what c++ belongs to what assembly. For that, see here: https://stackoverflow.com/a/67717002/1599699 – Andrew May 31 '21 at 10:22
  • @Andrew: See edit re the compilation flags. – einpoklum May 31 '21 at 12:16

First of all, a lambda expression is not actually a function; it produces an object of a class type. The compiler generates a unique class for every lambda expression, which you can see by using the typeid operator:

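#include <iostream>
#include <typeinfo>

int main() {
    auto temp = []() { return true; };
    // Prints the implementation-defined name of the closure type.
    std::cout << typeid(temp).name() << "\n";
}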

[] is the capture clause: the compiler adds a private data member to the class for every captured variable. () is the parameter list: the compiler overloads the function call operator, operator(), for the class. For the code above it writes something like this:

class Temp12343786 {
public:
    // const, because the lambda is not declared mutable
    auto operator()() const {
        return true;
    }
};

As you can see, operator() is defined inside the class body, so it is implicitly an inline member function of the class.
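To make the correspondence concrete for a lambda with captures, here is a hand-written sketch of roughly what the compiler generates for [x1, x2]() { return x1 + x2; }. The names AddClosure, make_add and make_add_lambda are made up for illustration; real closure types are unnamed and their layout is up to the implementation.

```cpp
// Hand-written stand-in for the closure type of
//     [x1, x2]() { return x1 + x2; }
// AddClosure, make_add and make_add_lambda are hypothetical names.
template <typename T>
class AddClosure {
public:
    AddClosure(T x1, T x2) : x1_(x1), x2_(x2) {}

    // Defined inside the class, so implicitly inline; const because the
    // lambda is not declared mutable.
    auto operator()() const { return x1_ + x2_; }

private:
    T x1_;  // one data member per captured variable
    T x2_;
};

template <typename T>
AddClosure<T> make_add(T x1, T x2) {
    return AddClosure<T>(x1, x2);
}

// The lambda version, for comparison.
template <typename T>
auto make_add_lambda(T x1, T x2) {
    return [x1, x2]() { return x1 + x2; };
}
```

Calling make_add(1, 2)() and make_add_lambda(1, 2)() gives the same value. Note that being implicitly inline only concerns the language's linkage rules; whether a particular call is expanded at the call site is still decided by the optimizer.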

  • This is good information for somebody new to using lambdas, but I'm not sure it answers the question. – Stephen Newell Apr 11 '20 at 06:03
  • This doesn't really answer my question. I'd like to know whether the operator()() call is inlined or not, at least for my particular build. – Long Apr 11 '20 at 09:40
  • I think the OP is referring to `inline` in the sense that the compiler pastes the code at the call site, rather than the implementation existing in class scope. C++ actually has an `inline` keyword, but it's only a suggestion that the compiler is free to ignore. – George Apr 11 '20 at 09:41

Whether something has been inlined or not (whatever that means exactly) can of course be detected only by looking at the generated code. For example, with g++ you could compile with -S and then grep the generated assembly source for what you are looking for.

However, if you really care about performance, then you need to look at performance and not at inlining.

Sometimes inlining is a bad choice because it may trash branch prediction or the code cache; if you want to know whether the code is fast, you should not look at the code itself but measure its speed on real data. As a general rule, inlining a big function called in many places is a bad idea, but the truth can only be found by actually measuring the speed.
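A minimal sketch of such a measurement, assuming std::chrono::steady_clock is an acceptable timer. The helper name time_calls and its parameters are placeholders for this example, and a serious benchmark would also need warm-up runs, repetition and representative input data.

```cpp
#include <chrono>
#include <cstdint>

// Calls `fn` `iterations` times and returns the elapsed time in nanoseconds.
// The sum of the results is written to `checksum`.
// `time_calls` is a hypothetical helper written for this sketch.
template <typename Fn>
std::int64_t time_calls(Fn fn, std::int64_t iterations, long long& checksum) {
    // Accumulate through a volatile so the optimizer cannot drop the loop.
    volatile long long sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i) {
        sink = sink + fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    checksum = sink;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

It could be used as time_calls(fn2, 1000000, sum), where fn2 is the lambda from the question and sum is a long long variable, comparing builds with different inlining settings on the same machine.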

Unfortunately, CPUs today are so complex that execution speed, despite being formally deterministic, is from a practical point of view more of a black box that must be studied experimentally. Moreover, what is faster and what is slower depends on the exact CPU model and the exact machine setup (that is why, for some time-critical operations, there are OSes that try different alternatives at boot time to measure which approach is best on the specific computer).

6502
  • As you said, measuring the execution time is not deterministic. So I'd like to monitor the code generation statically. In my case I don't want to bring in extra function calls. Otherwise I need to reimplement the code without using lambdas. – Long Apr 11 '20 at 09:47
  • @Long: then it's clear you don't want your code to be fast, you want it to be inlined even when that means it will be slower. I can understand that the problem of writing fast code is complex... solving a different, easier problem (forcing inlining) is not going to help that much, however. – 6502 Apr 11 '20 at 11:12
  • I agree with you that forced inlining does not guarantee good performance. Sometimes leaving the choice to the compiler may be best. But in my particular case, a function call brings an unacceptable cost (imagine I'm programming for a processor that does not support calls). So I have to be cautious. – Long Apr 11 '20 at 12:10
  • @Long: when I had the same problem recently, the solution was to actually write a C code generator in Python that unrolled the code the way I had in mind. Fighting with C++ template monsters, hoping that after building up the castle and letting the optimizer destroy it you end up with exactly what you want, is IMO not the best path. Of course you also need to measure it later, but I can confirm that "just letting the optimizer do the magic" does not always work best (even after profile-guided optimization). – 6502 Apr 11 '20 at 12:17