1

At compile-time, I want to pass a (stateless) function as efficiently as possible. I see two options:

template<typename Functor>
bool PassAsArgument(int arg1, int arg2, Functor functor);

and

template<bool (*Function)(int, int)>
bool PassAsTemplateParameter(int arg1, int arg2);

The second option seemed like the obvious choice for compile-time computation, however, Effective STL #39, the C++ standard library (std::sort, std::make_heap), the C++ Core Guidelines (T.40), and this stackoverflow answer disagree. I tested the both versions on godbolt and found that the optimizations I wanted were applied (i.e., std:less<int> was inlined).

#include <functional>

template<typename Functor>
bool PassAsArgument(int arg1, int arg2, Functor functor){
    return functor(arg1, arg2);
}

template bool PassAsArgument(int, int, std::less<int>);

results in

bool PassAsArgument<std::less<int> >(int, int, std::less<int>):
        cmp     edi, esi
        setl    al
        ret

and

inline bool LessThan(int a, int b){
    return a < b;
}

template<bool (*Function)(int, int)>
bool PassAsTemplateParameter(int arg1, int arg2){
    return Function(arg1, arg2);
}

template bool PassAsTemplateParameter<LessThan>(int, int);

results in

bool PassAsTemplateParameter<&(LessThan(int, int))>(int, int):
        cmp     edi, esi
        setl    al
        ret

I want to follow the guidelines, but I'm worried that in more complicated code, I may make a mistake that disables some optimizations. For example,

  1. What if Functor is not the last argument?
  2. What if I call PassAsArgument with a stateful Functor?
  3. What if I need to pass multiple Functors?
  4. What if I need to pass Functor to func2 in the body of PassAsArgument?
  5. What if I save a copy of Functor as a data member of a class?

As a starting point, I need to know what optimizations the compiler is using. The question is: What optimization allows Functors passed-by-value to be inlined?

mbang
  • 855
  • 2
  • 9
  • 1
    There is no specific optimization that allows it. There is the "as-if rule" principal which allows any optimization that doesn't alter the observable behavior. C++ code describes a behavior, it does not describe steps that the CPU executes. The compiler must generate assembly that has that behavior. It can choose to generate assembly using any optimization it wants, as long at is respects the original requirement that the observable behavior is unchanged. The results of your test may vary depending on the compiler and flags you try. – François Andrieux Oct 24 '22 at 13:27
  • @FrançoisAndrieux are questions about compiler optimizations off-topic on stack-overflow? I don't think so. – mbang Oct 24 '22 at 13:33
  • Questions about compiler optimizations are not off-topic. But it is site policy that when a question is substantially answered by another existing question to close one of them and to link it to the duplicate. If you think the duplicate is inaccurate, you should post a comment explaining why the duplicate doesn't answer your question. – François Andrieux Oct 24 '22 at 13:36
  • @FrançoisAndrieux **someone:** "What compiler optimization allows us to return by value?" **you:** "the as-if rule" **everyone else:** "RVO." A question about how RVO works would be on-topic for stackoverflow. It would not be considered a duplicate of a question about the "as-if" rule. My post describes a pretty well-defined optimization that should come with clearly defined limitations. – mbang Oct 24 '22 at 13:40
  • 3
    The T.40 rule you linked (*In general, passing function objects gives better performance than passing pointers to functions.*), about using lambdas instead of raw function pointers, is only true when things inline. They do inline just as easily as function pointers when visible at compile time, with constant-propagation of the functor or function pointer, and then inlining. And other parts of the as-if rule in general. – Peter Cordes Oct 24 '22 at 13:54
  • 2
    The ISO C++ standard doesn't need to give special permission since the type being returned doesn't have a copy-constructor to optimize away; no visible effect. IDK if that makes it a duplicate, though, @FrançoisAndrieux, since this question is asking about the details of when this optimization can happen in practice. Their list of options clearly shows a lack of understanding of which factors are actually important (constant propagation). – Peter Cordes Oct 24 '22 at 13:57
  • 1
    "What optimization allows Functors passed-by-value to be inlined?" The fact that the definition is available at the call site *allows* it to be inlined. The compilers heuristics of what "better" output looks like is the determination of *if* it will be inlined – Caleth Oct 24 '22 at 13:59
  • 1
    @PeterCordes Thanks! I had not heard of constant propagation. I think the answer to this question ("What optimization?") is just "constant propagation", and then I may ask a follow question in a different post on the limits of constant propagation (related to 1-5 in the OP), if I can't find my answers elsewhere. – mbang Oct 24 '22 at 14:02
  • 1
    Related: [C++ Lambda Overhead](https://stackoverflow.com/q/69319320) - Jérôme Richard says at least some compilers might fail to inline if you wrap a lambda in a `std::function`. I'd assume that's based on practical experience looking at the asm, although I'm surprised; I'd expect GCC and clang at least to not have a problem inlining. But I'd easily believe MSVC did a worse job. Still looking for a better duplicate about constprop and inlining. – Peter Cordes Oct 24 '22 at 14:03

1 Answers1

2

Constant propagation is the key optimization.

When the lambda or functor is called, the compiler still knows which code runs, and thus it's as straightforward as inlining a function by name. If not, it's like a runtime-variable function pointer; the compiler can't inline1.

After inlining PassAsArgument(1, 2, LessThan) into the caller that passed functor = LessThan, the functor(arg1, arg2); can be seen to be calling LessThan. So the compiler can inline your LessThan as well.

The T.40 rule you linked (In general, passing function objects gives better performance than passing pointers to functions), about using lambdas instead of raw function pointers, is only true when things inline. They do inline just as easily as function pointers when visible at compile time. But if they don't inline, a bare function pointer may be cheapest.

It doesn't need any special permission from the ISO C++ standard beyond the as-if rule since there's no change in observable behaviour.

What if I save a copy of Functor as a data member of a class?

That's likely to defeat constant-propagation, unless the compiler can prove that every member of that class could only ever had the same value assigned (this lambda, not some other in a different compilation unit it can't see).


Footnote 1:

If a compiler doesn't know the value of a function pointer for sure, but it has a good guess, it might compare/branch on the function pointer having that value. If it matches, it can run an inlined version of that function, with constant propagation and whatnot. Otherwise just call through the function pointer. You could call this speculative inlining.

When the function pointer in question is in a vtable, this is called "speculative devirtualization". AFAIK, that's the only time real compilers will actually attempt this, since that's when it's common to have function pointers that only ever have one value.

(Constant propagation into the function being inlined can be great if it turns runtime branches into if(false), or otherwise greatly simplifies way beyond just avoiding the call/ret and calling convention overhead.)


Related:

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847