My question expands on this one: Why can lambdas be better optimized by the compiler than plain functions?
To reiterate, the conclusion there is that each lambda has a unique type, so the compiler creates a distinct specialization per lambda which it can trivially inline, whereas function pointers are harder to inline because there is only a single specialization for a given function prototype. Considering that, would function pointer templates be as fast as, or faster than, lambdas?
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

template <class F>
int operate(int a, int b, F func)
{
    return func(a, b);
}

template <int func(int, int)>
int operateFuncTemplate(int a, int b)
{
    return func(a, b);
}
int main()
{
    // hard to inline: the compiler can't statically determine whether
    // operate's func is add or sub, since it's just a function pointer
    auto addWithFuncP = operate(1, 2, add);
    auto subWithFuncP = operate(1, 2, sub);

    // easy to inline: each lambda has a unique type, so two
    // specializations are made, each trivially inlinable
    auto addWithLambda = operate(1, 2, [](int a, int b) { return a + b; });
    auto subWithLambda = operate(1, 2, [](int a, int b) { return a - b; });

    // also easy to inline? the template parameter means two specializations
    // are made, instead of one definition calling through indirection
    auto addWithFuncT = operateFuncTemplate<add>(1, 2);
    auto subWithFuncT = operateFuncTemplate<sub>(1, 2);
}
So if I were to rank these on a performance scale:
operateFuncTemplate
>= operate<LAMBDA>
>= operate<FUNCTIONPTR>
Are there instances where this relation could fail in non-trivial examples?