Why would a C++ compiler fail to inline a lambda passed to a function template?

Question

My question is the inverse of "Why can lambdas be better optimized by the compiler than plain functions?" The accepted answer there says:

The reason is that lambdas are function objects so passing them to a function template will instantiate a new function specifically for that object. The compiler can thus trivially inline the lambda call.
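
The mechanism in that quote can be sketched as follows. This is only an illustration: `apply_twice`, `add_one`, `via_pointer` and `via_lambda` are hypothetical names, not taken from the linked answer. With a function pointer, `F` is `int (*)(int)` and the callee is known only through the pointer's value; with a lambda, `F` is a unique closure type, so the instantiation already names its call target. Optimizers can often still inline the pointer case through constant propagation, so this shows the language-level difference, not a guarantee about generated code.

```cpp
#include <type_traits>

// Hypothetical higher-order function: accepts any callable by value.
template <typename F>
int apply_twice(F f, int x) {
    return f(f(x));
}

int add_one(int x) { return x + 1; }

// F is deduced as int (*)(int): every function with this signature
// shares the same instantiation of apply_twice.
int via_pointer(int x) {
    return apply_twice(add_one, x);
}

// F is deduced as the lambda's own closure type: apply_twice is
// instantiated specifically for this one call operator.
int via_lambda(int x) {
    return apply_twice([](int v) { return v + 1; }, x);
}

// Two textually identical lambdas still have distinct closure types.
auto lambda_a = [](int v) { return v + 1; };
auto lambda_b = [](int v) { return v + 1; };
static_assert(!std::is_same<decltype(lambda_a), decltype(lambda_b)>::value,
              "each lambda expression has its own closure type");
```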

Thus, the question is: what circumstances would cause a compiler not to inline a lambda passed to an inline function template? Consider the following setup:

template <typename F>
static inline void
hof(const F fun) {
    ...
    fun(a, b, c);     // a, b, c are int.
    ...
}
void
caller() {
    ...
    hof([&](int a, int b, int c) { ... });
    ...
}

Further assume a recent GCC or Clang with all relevant optimization flags turned on.

The question (which is in the form of a challenge) is to fill in the ... parts with code so that the compiler fails to inline either the call to hof or the call to fun. You may use loops or anything else to call fun multiple times, but hof may be called only once.

My claim is that (excluding "funny business" like exceptions, longjmp, reflection, etc.) it can't be done. Please try to prove me wrong. I'll accept any answer for which I can verify on godbolt.org that the lambda isn't inlined.

This really depends on the compiler, and there is nothing in the standard that requires things to turn out one way or another. The most you can do here is make general assumptions based on some reasonably sophisticated compiler, but this is still going to be not much more than an educated guess. - Sam Varshavchik, Nov 04 '19 at 01:14
"*excluding "funny business" like exceptions, longjmp, reflection, etc*" Who decides exactly what "funny business" entails? Also, saying "I declare that X is true; prove me wrong" isn't really what this platform is for. At least, not in this way. - Nicol Bolas, Nov 04 '19 at 01:15
This comes down to the quality of implementation of the compiler: every compiler has different criteria for inlining a function or not. `inline` is generally a hint to the compiler, and the compiler is free to ignore that hint. Conversely, it may inline functions in some circumstances even if the programmer doesn't ask for it. Given that by "funny business" you exclude the situation where "the developers of at least one compiler have not implemented inlining in that case", since you are assuming inlining is completely determined by source code, your question is pointless. - Peter, Nov 04 '19 at 01:27
@BjörnLindqvist: "*Since I posed the question and this isn't a court of law, I decide what funny business is.*" Then your question is too opinion based, since you can declare any answer to be valid or invalid for reasons which are not known up-front. That means nobody can provide a right or wrong answer without first consulting you to know whether it is right or wrong. This makes the question decidedly less useful to anyone who doesn't share your opinion of what "funny business" is. - Nicol Bolas, Nov 04 '19 at 01:50
*"so that the compiler **fails** to inline"*. It is not necessarily a "failure" but a decision, since inlining can also hurt performance. - Jarod42, Nov 04 '19 at 08:54

walnut · Accepted Answer · 2019-11-04T02:32:11.707

It is only a matter of filling enough stuff into the lambda and using it at least twice (otherwise there is no good reason not to inline).

Here is an example for GCC 9.2 and Clang 9 with -O3:

#include <iostream>

int a, b, c;

template <typename F>
static inline void
hof(const F fun) {
    fun(a, b, c);
    fun(a, b, c);
}
void caller() {
    hof([&](int a, int b, int c) {
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
        std::cout << "Hello!";
    });
}

And the assembly for caller looks like this:

GCC:

caller():
        sub     rsp, 8
        call    caller()::{lambda(int, int, int)#1}::operator()(int, int, int) const [clone .isra.0]
        call    caller()::{lambda(int, int, int)#1}::operator()(int, int, int) const [clone .isra.0]
        add     rsp, 8
        ret

Clang:

caller():                             # @caller()
        push    rax
        call    caller()::$_0::operator()(int, int, int) const
        pop     rax
        jmp     caller()::$_0::operator()(int, int, int) const # TAILCALL

See godbolt here.

This is exactly the number of repetitions in the lambda that I needed to convince GCC that inlining it twice is not worth it.

Clang already stopped inlining with fewer repetitions.
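
To probe the "at least twice" condition, one could compile a single-call variant next to a two-call variant and compare the assembly. The following is a minimal sketch under stated assumptions: `hof_once`, `hof_twice`, `caller_once` and `caller_twice` are hypothetical names, the stream is passed as a parameter so the behavior can be checked, and the lambda body is kept short here and would have to be lengthened by hand to reach a given compiler's threshold. Whether either call is inlined depends on the compiler version and flags; the sketch makes no claim about the resulting assembly.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical single-call variant of hof.
template <typename F>
static inline void hof_once(const F fun) {
    fun(1, 2, 3);
}

// Hypothetical two-call variant, mirroring the example above.
template <typename F>
static inline void hof_twice(const F fun) {
    fun(1, 2, 3);
    fun(1, 2, 3);
}

// The lambda body is intentionally short; repeat the statements
// to grow it when experimenting with inlining heuristics.
void caller_once(std::ostream& out) {
    hof_once([&](int a, int b, int c) {
        out << "Hello!" << a << b << c;
        out << "Hello!" << a << b << c;
    });
}

void caller_twice(std::ostream& out) {
    hof_twice([&](int a, int b, int c) {
        out << "Hello!" << a << b << c;
        out << "Hello!" << a << b << c;
    });
}
```

Both callers behave the same regardless of the inlining decision; only the emitted code for `caller_once` and `caller_twice` can differ.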
