37

EDIT: I took the "if/else" case as an example that can sometimes be resolved at compile time (e.g. when static values are involved, cf. <type_traits>). Adapting the answers below to other types of static branching (e.g., multiple branches or multi-criteria branches) should be straightforward. Note that compile-time branching using template metaprogramming is not the topic here.


In typical code like this

#include <type_traits>

template <class T>
T numeric_procedure( const T& x )
{
    if ( std::is_integral<T>::value )
    {
        // Integral types
    }
    else
    {
        // Floating point numeric types
    }
}

will the compiler optimize out the if/else statement when I later instantiate the template with specific types?

A simple alternative would be to write something like this:

#include <type_traits>

template <class T>
inline T numeric_procedure( const T& x )
{
    return numeric_procedure_impl( x, std::is_integral<T>() );
}

// ------------------------------------------------------------------------

template <class T>
T numeric_procedure_impl( const T& x, std::true_type const )
{
    // Integral types
}

template <class T>
T numeric_procedure_impl( const T& x, std::false_type const )
{
    // Floating point numeric types
}

Is there a difference in terms of performance between these solutions? Are there any non-subjective grounds for saying that one is better than the other? Are there other (possibly better) solutions to deal with compile-time branching?

Jonathan H
  • "*Are there any non-subjective grounds for saying that one is better than the other?*" The former will likely produce a warning, the latter will not. They will compile to the same machine code in any implementation I'm aware of. – ildjarn May 31 '14 at 13:24
  • @ildjarn Thanks, I would say this sounds like an answer; would you care to elaborate a bit? – Jonathan H May 31 '14 at 13:30
  • It's a real optimization technique even for dynamic parameters that can only take a few values (bools, enums) that the compiler generates separate functions and dispatches them based on the argument. E.g. `void foo(bool b) { if (b) __foo_true(); else __foo_false(); }`. – Kerrek SB May 31 '14 at 13:32
  • @KerrekSB :) Same as for ildjarn, this sounds like an answer to me! – Jonathan H May 31 '14 at 13:47
  • We have great answers, suitable for an FAQ. But I think the question should use `if` as an example of *all* branches which may be resolved at compile-time, instead of asking only about `if`. Also, the term "static if" ought to be avoided, since it is loaded with meaning that is contrary to the actual usage here. – Ben Voigt Jun 01 '14 at 18:26
  • @BenVoigt Thanks for the comment! Please see the edited post and feel free to edit further if you want. I think the answer from TemplateRex below is outstanding. I am not sure what you mean about "static if"s though; what I mean is "compile-time branching". – Jonathan H Jun 02 '14 at 09:34

5 Answers

55

TL;DR

There are several ways to get different run-time behavior depending on a template parameter. Performance is usually equal, so flexibility and maintainability are the main concerns. In all cases, the various thin wrappers and constant conditional expressions will be optimized away on any decent compiler for release builds. Below is a small summary of the various tradeoffs (inspired by this answer by @AndyProwl).

Run-time if

Your first solution is the simple run-time if:

template<class T>
T numeric_procedure(const T& x)
{
    if (std::is_integral<T>::value) {
        // valid code for integral types
    } else {
        // valid code for non-integral types,
        // must ALSO compile for integral types
    }
}

It is simple and effective: any decent compiler will optimize away the dead branch.

There are several disadvantages:

  • on some platforms (MSVC), a constant conditional expression yields a spurious compiler warning which you then need to ignore or silence.
  • Worse, on all conforming platforms, both branches of the if/else statement must actually compile for all types T, even if one of the branches is known not to be taken. If T has different member types depending on its nature, you will get a compiler error as soon as you try to access them.
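
For instance, this hypothetical parity function (the name and the use of % are mine, for illustration only) compiles for int but fails to even instantiate for double, because the integral branch uses %:

```cpp
#include <type_traits>

template <class T>
T parity(const T& x)
{
    if (std::is_integral<T>::value) {
        return x % 2;                         // ill-formed for T = double
    } else {
        return x - static_cast<long long>(x); // fractional part, valid for int too
    }
}

// parity(5) compiles fine, but a call like parity(5.0) fails to compile:
// the integral branch with % must still be valid for double, even though
// it is provably dead.
```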

Tag dispatching

Your second approach is known as tag-dispatching:

template<class T>
T numeric_procedure_impl(const T& x, std::false_type)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}    

template<class T>
T numeric_procedure_impl(const T& x, std::true_type)
{
    // valid code for integral types
}

template<class T>
T numeric_procedure(const T& x)
{
    return numeric_procedure_impl(x, std::is_integral<T>());
}

It works fine, without run-time overhead: the temporary std::is_integral<T>() and the call to the one-line helper function will both be optimized away on any decent platform.

The main (minor, IMO) disadvantage is some boilerplate: three functions instead of one.

SFINAE

Closely related to tag-dispatching is SFINAE (Substitution failure is not an error)

template<class T, typename std::enable_if<!std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}

template<class T, typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}

(Note that the condition has to appear in the template parameter's type, not merely as a default argument: two overloads that differ only in their default template arguments declare the same template and fail to compile as a redefinition.)

This has the same effect as tag-dispatching but works slightly differently. Instead of using argument-deduction to select the proper helper overload, it directly manipulates the overload set for your main function.

The disadvantage is that it can be fragile and tricky if you don't know exactly what the entire overload set is (e.g. with template-heavy code, ADL could pull in more overloads from associated namespaces you didn't think of). And compared to tag-dispatching, selection based on anything other than a binary decision is a lot more involved.

Partial specialization

Another approach is to use a class template helper with a function application operator and partially specialize it

template<class T, bool> 
struct numeric_functor;

template<class T>
struct numeric_functor<T, false>
{
    T operator()(T const& x) const
    {
        // valid code for non-integral types,
        // CAN contain code that is invalid for integral types
    }
};

template<class T>
struct numeric_functor<T, true>
{
    T operator()(T const& x) const
    {
        // valid code for integral types
    }
};

template<class T>
T numeric_procedure(T const& x)
{
    return numeric_functor<T, std::is_integral<T>::value>()(x);
}

This is probably the most flexible approach if you want to have fine-grained control and minimal code duplication (e.g. if you also want to specialize on size and/or alignment, but say only for floating point types). The pattern matching given by partial template specialization is ideally suited for such advanced problems. As with tag-dispatching, the helper functors are optimized away by any decent compiler.

The main disadvantage is the slightly larger boiler-plate if you only want to specialize on a single binary condition.
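
As a hypothetical illustration of that fine-grained control (the marker return values +1/+2/+3 are mine, purely to show which specialization fires, and this assumes sizeof(float) != sizeof(double), as on mainstream platforms), the non-integral case can be refined further by size:

```cpp
#include <cstddef>

// primary template: integral types
template<class T, bool IsIntegral, std::size_t Size = sizeof(T)>
struct numeric_functor
{
    T operator()(T const& x) const { return x + 1; }  // marker: +1
};

// partial specialization: any non-integral type
template<class T, std::size_t Size>
struct numeric_functor<T, false, Size>
{
    T operator()(T const& x) const { return x + 2; }  // marker: +2
};

// more specialized still: 8-byte non-integral types (e.g. double)
template<class T>
struct numeric_functor<T, false, sizeof(double)>
{
    T operator()(T const& x) const { return x + 3; }  // marker: +3
};
```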

If constexpr (C++1z proposal)

This is a reboot of failed earlier proposals for static if (a feature of the D programming language):

template<class T>
T numeric_procedure(const T& x)
{
    if constexpr (std::is_integral<T>::value) {
        // valid code for integral types
    } else {
        // valid code for non-integral types,
        // CAN contain code that is invalid for integral types
    }
}

As with your run-time if, everything is in one place, but the main advantage here is that the untaken branch will be dropped entirely by the compiler, so it does not even have to compile for the given T. Another advantage is that you keep all code local and do not have to use little helper functions as in tag dispatching or partial template specialization.

Concepts-Lite (C++1z proposal)

Concepts-Lite is an upcoming Technical Specification that is scheduled to be part of the next major C++ release (C++1z, with z==7 as the best guess).

template<Non_integral T>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}    

template<Integral T>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}

This approach replaces the class or typename keyword inside the template< > brackets with a concept name describing the family of types the code is supposed to work for. It can be seen as a generalization of the tag-dispatching and SFINAE techniques. Some compilers (gcc, Clang) have experimental support for this feature. The Lite adjective refers to the failed Concepts proposal for C++11.
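
For reference, the two concept names assumed above could be defined roughly like this in the Concepts-Lite TS syntax of the time (a sketch in proposal syntax, not valid standard C++):

```
// Concepts-Lite TS sketch (proposal-era syntax, not standard C++):
template<class T>
concept bool Integral = std::is_integral<T>::value;

template<class T>
concept bool Non_integral = !std::is_integral<T>::value;
```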

TemplateRex
  • Wow, sorry I didn't see this earlier, that's an amazing answer! – Jonathan H Jun 02 '14 at 09:04
  • @Sh3ljohn thanks, and you couldn't have seen it earlier because I posted it yesterday :-) – TemplateRex Jun 02 '14 at 09:16
  • NOT c++11/14 or any c++ for that matter!! Wouldn't it make much more sense to do **template<... with std::is_integral()>**, since then one can make the subtle difference between **template()>** and **template()>**? (As compared to template). Moreover, one can also introduce multiple conditions to which a template argument should adhere. Great answer though. – Herbert Nov 23 '14 at 15:51
13

Note that although the optimizer may well be able to prune statically-known tests and unreachable branches from the generated code, the compiler still needs to be able to compile each branch.

That is:

int foo() {
  #if 0
    return std::cout << "this isn't going to work\n";
  #else
    return 1;
  #endif
}

will work fine, because the preprocessor strips out the dead branch before the compiler sees it, but:

int foo() {
  if (std::is_integral<double>::value) {
    return std::cout << "this isn't going to work\n";
  } else {
    return 1;
  }
}

won't. Even though the optimizer can discard the first branch, it will still fail to compile. This is where using enable_if and SFINAE help, because you can select the valid (compilable) code, and the invalid (un-compilable) code's Failure to compile Is Not An Error.
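
A minimal sketch of that fix (foo_impl is my name for the helper): move each branch into its own overload selected by a tag, so only the taken branch is ever compiled for the given type:

```cpp
#include <type_traits>

int foo_impl(std::true_type)  { return 0; }  // integral-only code could go here
int foo_impl(std::false_type) { return 1; }  // code for everything else

int foo() {
    // std::is_integral<double>{} is a std::false_type value, so overload
    // resolution selects the second helper and the other body is never needed
    return foo_impl(std::is_integral<double>{});
}
```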

Useless
4

To answer the title question about how compilers handle if(false):

They optimize away constant branch conditions (and the dead code)

The language standard does not of course require compilers to not be terrible, but the C++ implementations that people actually use are non-terrible in this way. (So are most C implementations, except for maybe very simplistic non-optimizing ones like tinycc.)

One of the major reasons C++ is designed around if(something) instead of the C preprocessor's #ifdef SOMETHING is that they're equally efficient. Many C++ features (like constexpr) only got added after compilers already implemented the necessary optimizations (inlining + constant propagation). (The reason we put up with all the undefined-behaviour pitfalls and gotchas of C and C++ is performance, especially with modern compilers that aggressively optimize on the assumption of no UB. The language design typically doesn't impose unnecessary performance costs.)
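
I.e., these two toy functions (my example; the names are illustrative) compile to the same code in an optimized build, one pruned by the preprocessor and one by the optimizer:

```cpp
#define USE_FAST 1

int with_preprocessor(int x) {
#if USE_FAST
    return x * 2;      // the other branch never reaches the compiler
#else
    return x + x;
#endif
}

constexpr bool use_fast = true;
int with_if(int x) {
    if (use_fast) return x * 2;   // dead-code elimination removes the else
    else return x + x;
}
```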


But if you care about debug-mode performance, the choice can be relevant depending on your compiler (e.g. for a game or other program with real-time requirements, where even a debug build has to run fast enough to be testable).

e.g. clang++ -O0 ("debug mode") still evaluates an if(constexpr_function()) at compile time and treats it like if(false) or if(true). Some other compilers only evaluate it at compile time if they're forced to (e.g. by template matching).


There is no performance cost for if(false) with optimization enabled. (Barring missed-optimization bugs, which might depend on how early in the compile process the condition can be resolved to false and dead-code elimination can remove it before the compiler "thinks about" reserving stack space for its variables, or that the function may be non-leaf, or whatever.)

Any non-terrible compiler can optimize away dead code behind a compile-time-constant condition (Wikipedia: Dead Code Elimination). This is part of the baseline expectations people have for a C++ implementation to be usable in the real world; it's one of the most basic optimizations and all compilers in real use do it for simple cases like a constexpr.

Often constant-propagation (especially after inlining) will make conditions compile-time constants even if they weren't obviously so in the source. One of the more obvious cases is optimizing away the compare before the first iteration of a for (int i=0 ; i<n ; i++) so it can turn into a normal asm loop with a conditional branch at the bottom (like a do{}while loop in C++) if n is constant or provably > 0. (Yes, real compilers do value-range optimizations, not just constant propagation.)


Some compilers, like gcc and clang, remove dead code inside an if(false) even in "debug" mode, at the minimum level of optimization that's required for them to transform the program logic through their internal arch-neutral representations and eventually emit asm. (But debug mode disables any kind of constant-propagation for variables that aren't declared const or constexpr in the source.)

Some compilers only do it when optimization is enabled; for example MSVC really likes to be literal in its translation of C++ to asm in debug mode and will actually create a zero in a register and branch on it being zero or not for if(false).

For gcc debug mode (-O0), constexpr functions aren't inlined if they don't have to be. (In some places the language requires a constant, like an array size inside a struct. GNU C++ supports C99 VLAs, but does choose to inline a constexpr function instead of actually making a VLA in debug mode.)

But non-function constexprs do get evaluated at compile time, not stored in memory and tested.

But just to reiterate: at any level of optimization, constexpr functions are fully inlined and optimized away, and then the if() condition becomes a compile-time constant whose dead branch is eliminated.


Examples (from the Godbolt compiler explorer)

#include <type_traits>
void baz() {
    if (std::is_integral<float>::value) f1();  // optimizes for gcc
    else f2();
}

All compilers with -O2 optimization enabled (for x86-64):

baz():
        jmp     f2()    # optimized tailcall

Debug-mode code quality, normally not relevant

GCC with optimization disabled still evaluates the expression and does dead-code elimination:

baz():
        push    rbp
        mov     rbp, rsp          # -fno-omit-frame-pointer is the default at -O0
        call    f2()              # still an unconditional call, no runtime branching
        nop
        pop     rbp
        ret

To see gcc *not* inline something with optimization disabled:

static constexpr bool always_false() { return sizeof(char)==2*sizeof(int); }
void baz() {
    if (always_false()) f1();
    else f2();
}
;; gcc9.1 with no optimization chooses not to inline the constexpr function
baz():
        push    rbp
        mov     rbp, rsp
        call    always_false()
        test    al, al              # the bool return value
        je      .L9
        call    f1()
        jmp     .L11
.L9:
        call    f2()
.L11:
        nop
        pop     rbp
        ret

MSVC's braindead literal code-gen with optimization disabled:

void foo() {
    if (false) f1();
    else f2();
}
;; MSVC 19.20 x86-64  no optimization
void foo(void) PROC                                        ; foo
        sub     rsp, 40                             ; 00000028H
        xor     eax, eax                     ; EAX=0
        test    eax, eax                     ; set flags from EAX (which were already set by xor)
        je      SHORT $LN2@foo               ; jump if ZF is set, i.e. if EAX==0
        call    void f1(void)                          ; f1
        jmp     SHORT $LN3@foo
$LN2@foo:
        call    void f2(void)                          ; f2
$LN3@foo:
        add     rsp, 40                             ; 00000028H
        ret     0

Benchmarking with optimization disabled is not useful

You should always enable optimization for real code; the only time debug-mode performance matters is when that's a pre-condition for debuggability. Disabling optimization is not a useful way to stop a benchmark from optimizing away, either; different code gains more or less from debug mode depending on how it's written.

Unless that's a really big deal for your project, and you just can't find enough info about local vars or something with minimal optimization like g++ -Og, the headline of this answer is the full answer. Ignore debug mode, only bother thinking about quality of the asm in optimized builds. (Preferably with LTO enabled, if your project can enable that to allow cross-file inlining.)

Peter Cordes
2

The compiler may be smart enough to see that it can replace the if statement body with two different function implementations, and just choose the right one. But as of 2014 I doubt there is any compiler that is smart enough to do that. I may be wrong though. On second thought, std::is_integral is simple enough that I think it will be optimized away.

Your idea of overloading on the result of std::is_integral is one possible solution.

Another and IMHO cleaner solution is to use std::enable_if (together with std::is_integral).
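
A minimal sketch of that enable_if approach, here with the condition placed in the return type (the bodies are illustrative placeholders of mine):

```cpp
#include <type_traits>

// selected only when T is an integral type
template<class T>
typename std::enable_if<std::is_integral<T>::value, T>::type
numeric_procedure(const T& x)
{
    return x % 2;   // integral-only code
}

// selected only when T is NOT an integral type
template<class T>
typename std::enable_if<!std::is_integral<T>::value, T>::type
numeric_procedure(const T& x)
{
    return x / 2;   // floating-point code
}
```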

Cheers and hth. - Alf
  • Thanks, enable-ifs and SFINAE are two things that I rarely touch, because I don't know exactly how they work. But that's good to know :) – Jonathan H May 31 '14 at 13:26
  • You do realize that the function is a template (hence different `T` generate different code anyway) and for any fixed `T`, `std::is_integral::value` is a compile-time constant? Removing the branch that's not applicable should be a simple matter of inlining, constant folding and dead code elimination. In fact all template metaprogramming relies on those optimizations to be anywhere near efficient. – delnan May 31 '14 at 13:29
  • @delnan: re "hence different `T` generate different code anyway", no it does not generate different specializations. apparently the OP wants different code for floating point versus integral type. code for integral type, e.g. using `%`, may not even compile for floating point type. it is a mystery why in just an eyeblink 4 supporters have upvoted your comment, which seems designed to mislead and is otherwise technically meaningless. – Cheers and hth. - Alf May 31 '14 at 13:59
  • @Cheersandhth.-Alf Different `T` *do* generate different code, if they generate code at all. They might also not work, which is a separate issue (which your answer does not mention either btw). But certainly each invocation with a different `T` creates a new instantiation that is analyzed, optimized and codegen'd separately. My comment is neither misleading nor meaningless, it points out that (as you have since edited in) the code is fully optimizable as-is. – delnan May 31 '14 at 14:09
1

Credit to @MooingDuck and @Casey

template<class FN1, class FN2, class ...Args>
decltype(auto) if_else_impl(std::true_type, FN1 &&fn1, FN2 &&, Args&&... args)
{
    return fn1(std::forward<Args>(args)...);
}

template<class FN1, class FN2, class ...Args>
decltype(auto) if_else_impl(std::false_type, FN1 &&, FN2 &&fn2, Args&&... args)
{
    return fn2(std::forward<Args>(args)...);
}

#define static_if(...) if_else_impl(__VA_ARGS__, *this)

And usage is as simple as:

static_if(do_it,
    [&](auto& self){ return 1; },
    [&](auto& self){ return self.sum(2); }
);

This works like a static if: the compiler instantiates only the lambda for the "true" branch, since a generic lambda's body is only instantiated when it is actually called.


P.S. You need to take self = *this and make member calls through it, due to a gcc bug. If you have nested lambda calls you can't use this-> instead of self.
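
Since the macro expands *this, it only works inside a member function; here is a self-contained sketch in context (the Calc class and its sum member are hypothetical, just to make the fragment compile and run):

```cpp
#include <type_traits>
#include <utility>

template<class FN1, class FN2, class... Args>
decltype(auto) if_else_impl(std::true_type, FN1&& fn1, FN2&&, Args&&... args)
{
    return fn1(std::forward<Args>(args)...);
}

template<class FN1, class FN2, class... Args>
decltype(auto) if_else_impl(std::false_type, FN1&&, FN2&& fn2, Args&&... args)
{
    return fn2(std::forward<Args>(args)...);
}

#define static_if(...) if_else_impl(__VA_ARGS__, *this)

struct Calc {
    int sum(int x) const { return x + x; }

    template<class T>
    int run() const {
        // the tag std::is_integral<T>{} picks the overload at compile time,
        // and only the chosen generic lambda body is ever instantiated
        return static_if(std::is_integral<T>{},
            [&](auto& self) { return 1; },
            [&](auto& self) { return self.sum(2); });
    }
};
```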

tower120