9

I would like to use templates for optimization as described here. However, with a growing number of bool template arguments, instantiating the template might have too many branches. And it gets even more branchy if you use larger enums instead of bools.

#include <iostream>
using namespace std;

template <bool b1, bool b2>
int HeavyLoop_impl(int arg)
{
    for (int i = 0; i < 10000000; i++)
    {
        // b1 is known at compile-time, so this branch will be eliminated
        if (b1) { arg += 1; }
        else    { arg += 2; }

        // b2 is known at compile-time, so this branch will be eliminated
        if (b2) { arg += 10; }
        else    { arg += 20; }
    }
    return arg;
}

// This function could be generated automatically
void HeavyLoop(bool b1, bool b2, int arg)
{
    int res;
    if (b1) {
        if (b2) { res = HeavyLoop_impl<true, true>(arg); }
        else    { res = HeavyLoop_impl<true, false>(arg); }
    } else {
        if (b2) { res = HeavyLoop_impl<false, true>(arg); }
        else    { res = HeavyLoop_impl<false, false>(arg); }
    }
    cout << "res: "<<res<<endl;
}

int main(int argc, char**argv)
{
    bool b1 = true;
    bool b2 = false;
    int arg = 0;
    HeavyLoop(b1, b2, arg);
    return 0;
}

Is there any way to automatically generate the HeavyLoop function? I would like something like this:

vars_to_template_function<bool, bool>(HeavyLoop_impl, b1, b2, arg);

Would that be possible somehow? Thanks for any hints.

Note: this is only a very simplified example. The actual loop is is of course more complicated :o)

Community
  • 1
  • 1
Michal Fapso
  • 1,242
  • 1
  • 15
  • 21
  • 1
    There are several disturbing things in your question. First b1 and b2 have hardcoded values in main, so even if non template, the code is going to be optimized away and the right branched are going to be called directly (assuming you have a descent compiler). On the opposite, if b1 and b2 in main were dynamic values, then using template function would have absolutely no advantage, since branching is done anyway in HeavyLoop... – log0 Oct 07 '13 at 10:03
  • This doesn't really match the link since it has totally different code depending on the flags. – doctorlove Oct 07 '13 at 10:05
  • 2
    @log0 Except that in `HeavyLoop`, the branching is done of *template arguments,* which are known at compile time, so no branching will actually occur. That's the whole point of the question - transform the runtime knowledge (two bools) into compile-time (selection from one of 4 functions with branches hard-coded). – Angew is no longer proud of SO Oct 07 '13 at 10:06
  • It should be possible to generate such a "driver" function with [Boost.preprocessor](http://www.boost.org/doc/libs/1_54_0/libs/preprocessor/doc/index.html), but I don't have time to try this myself now, sorry. – Angew is no longer proud of SO Oct 07 '13 at 10:29
  • 1
    Is `HeavyLoop_impl` so heavy that the compiler refuses to inline it into `main`? You shouldn't really have to do any of this, the optimizer will know the values of `b1` and `b2` *for that call* once the call is inlined, since the arguments' values are known at compile time (it's a trivial bit of data-flow analysis to show that they aren't modified after initialization, and that the initialization is from a compile-time constant). Then it can eliminate dead code from the code inlined into `main`. – Steve Jessop Oct 07 '13 at 11:13
  • Thanks for all your hints. The variables b1 and b2 are really dynamic = not known at compile time. Therefore I actually need to the compiler to create 4 different implementation of the HeavyLoop_impl, one for each combination of b1 and b2 values. The optimizer will automatically throw away the unused branches in HeavyLoop_impl. – Michal Fapso Oct 07 '13 at 11:30
  • @log0 Whoops, I mixed up the functions. The idea is that branching occurs once in `HeavyLoop`, which will then delegate to the correct branchless `HeavyLoop_impl`, inside which the actual "heavy loop" happens. – Angew is no longer proud of SO Oct 07 '13 at 11:40
  • What is the actual application of this code? What is the set of actual concrete functions that you want to use this for? Can you give us a concrete example? – willj Oct 07 '13 at 15:32
  • @Angew It took me a while but I finally got it right. Of course HeavyLoop avoid multiple branching in _Impl with one test... – log0 Oct 09 '13 at 14:12

6 Answers6

7

I decided to have some more fun with the code, here's an improved version over my first attempt which has the following benefits:

  • Supports enum types
  • Explicitly specify how many parameters should be converted
  • Generic implementation for the complicated part, one small helper for each function that uses it.

The code:

#include <iostream>
#include <utility>
#include <type_traits>

// an enum we would like to support
enum class tribool { FALSE, TRUE, FILE_NOT_FOUND };

// declare basic generic template
// (independent of a specific function you'd like to call)
template< template< class > class CB, std::size_t N, typename = std::tuple<> >
struct var_to_template;

// register types that should be supported
template< template< class > class CB, std::size_t N, typename... Cs >
struct var_to_template< CB, N, std::tuple< Cs... > >
{
    // bool is pretty simple, there are only two values
    template< typename R, typename... Args >
    static R impl( bool b, Args&&... args )
    {
        return b
          ? var_to_template< CB, N-1, std::tuple< Cs..., std::true_type > >::template impl< R >( std::forward< Args >( args )... )
          : var_to_template< CB, N-1, std::tuple< Cs..., std::false_type > >::template impl< R >( std::forward< Args >( args )... );
    }

    // for each enum, you need to register all its values
    template< typename R, typename... Args >
    static R impl( tribool tb, Args&&... args )
    {
        switch( tb ) {
        case tribool::FALSE:
          return var_to_template< CB, N-1, std::tuple< Cs..., std::integral_constant< tribool, tribool::FALSE > > >::template impl< R >( std::forward< Args >( args )... );
        case tribool::TRUE:
          return var_to_template< CB, N-1, std::tuple< Cs..., std::integral_constant< tribool, tribool::TRUE > > >::template impl< R >( std::forward< Args >( args )... );
        case tribool::FILE_NOT_FOUND:
          return var_to_template< CB, N-1, std::tuple< Cs..., std::integral_constant< tribool, tribool::FILE_NOT_FOUND > > >::template impl< R >( std::forward< Args >( args )... );
        }
        throw "unreachable";
    }

    // in theory you could also add int, long, ... but
    // you'd have to switch on every possible value that you want to support!
};

// terminate the recursion
template< template< class > class CB, typename... Cs >
struct var_to_template< CB, 0, std::tuple< Cs... > >
{
    template< typename R, typename... Args >
    static R impl( Args&&... args )
    {
        return CB< std::tuple< Cs... > >::template impl< R >( std::forward< Args >( args )... );
    }
};

// here's your function with the template parameters
template< bool B, tribool TB >
int HeavyLoop_impl( int arg )
{
    for( int i = 0; i < 10000000; i++ ) {
        arg += B ? 1 : 2;
        arg += ( TB == tribool::TRUE ) ? 10 : ( TB == tribool::FALSE ) ? 20 : 30;
    }
    return arg;
}

// a helper class, required once per function that you'd like to forward
template< typename > struct HeavyLoop_callback;
template< typename... Cs >
struct HeavyLoop_callback< std::tuple< Cs... > >
{
    template< typename R, typename... Args >
    static R impl( Args&&... args )
    {
        return HeavyLoop_impl< Cs::value... >( std::forward< Args >( args )... );
    }
};

// and here, everything comes together:
int HeavyLoop( bool b, tribool tb, int arg )
{
    // you provide the helper and the number of arguments
    // that should be converted to var_to_template<>
    // and you provide the return type to impl<>
    return var_to_template< HeavyLoop_callback, 2 >::impl< int >( b, tb, arg );
}

int main()
{
    bool b = true;
    tribool tb = tribool::FALSE;
    int arg = 0;
    int res = HeavyLoop( b, tb, arg );
    std::cout << "res: " << res << std::endl;
    return 0;
}

And here's a live example in case you want to play with it.

Daniel Frey
  • 55,810
  • 13
  • 122
  • 180
  • @MichalFapso I updated my answer with a much improved version, hope you like it. :) – Daniel Frey Oct 07 '13 at 23:11
  • This construct is ingenious. But how would I modify it to expand type template parameters? Currently, it only works with non-type template parameters. – rohitsan Feb 13 '19 at 21:17
  • @rohitsan Thank you! For type parameters: How would you specify them as runtime *values*? You can't call `HeavyLoop( b, tb, int );`. – Daniel Frey Feb 13 '19 at 21:46
  • As an example, I would like to specify type template parameters in the definition of HeavyLoop_impl. I am trying to map a set of enum values to a set of type template parameters by using a separate impl registration static function in var_to_template. When I tried that, it didn't compile. But I also want the expansion of non-type template parameters like bool. In my case, I am passing the type template parameter as a parameter to a template function that is called inside of my equivalent of HeavyLoop_impl. I hope this makes sense. – rohitsan Feb 13 '19 at 21:51
  • Or is there some other way I can pass the type template parameter to my version of Heavyloop_impl without using var_to_template? – rohitsan Feb 13 '19 at 21:52
  • You could just pass them as additional template parameters. Of course all functions need to be adapted, maybe there is a smarter solution, but here's a modified example: http://coliru.stacked-crooked.com/a/0d056db918656c4f – Daniel Frey Feb 13 '19 at 22:15
  • Thanks! I elected to change my callback class implementation to return HeavyLoop_impl< typename Cs... >( std::forward< Args >( args )... ); and then for all the bools in int HeavyLoop_impl( int arg ), I just added the ::value suffix to each instance. – rohitsan Feb 13 '19 at 22:44
  • It would be great to not have to make the registration functions members of var_to_template. Instead, if there was a way for users to supply those as specializations of var_to_template (like the recursion terminating case), then it would be much less intrusive (e.g. people wouldn't have to include application specific headers in the var_to_template header. – rohitsan Feb 13 '19 at 22:47
  • Please explain how you can simply write var_to_template< CB, N-1, std::tuple< Cs..., std::true_type > >:://. In other words, how can you build up a tuple of types during a compile time recursion by simply specifying the parameter pack followed by a concrete type like std::true_type? I am just trying to understand the C++ language theoretic reason for why this is possible. – rohitsan Feb 19 '19 at 06:58
  • 1
    Cs... is expanded into a template parameter list, then another type std::true_type is passed as a parameter. Those are the types forming std::tuple. – Daniel Frey Feb 19 '19 at 08:47
  • Thank you! I have one more follow up (sorry): In the var_to_template declaration, the third template argument is defaulted to the empty tuple. However, in the var_to_template specializations, that argument is specified as a parameter pack and the tuple is actually used the specialization specification. This seems inconsistent from a language syntactical standpoint. Please help me understand this. – rohitsan Feb 19 '19 at 14:58
  • 1
    The default is used when `HeavyLoop` is calling `var_to_template` with only two parameters, the other uses specify a tuple with more arguments. And the specialisations are independent of the default. – Daniel Frey Feb 19 '19 at 15:18
  • Well, that part I understood. Why isn't the tuple specified in the template argument list itself (in the specializations)? Why do we only specify the parameter pack there? – rohitsan Feb 19 '19 at 16:12
  • I'm not sure I understand your problem. The implementation needs the parameter pack, so it needs to be deduced. And that's the syntax that C++ uses for this. – Daniel Frey Feb 19 '19 at 16:26
  • My confusion stems from the fact that we specify a default empty tuple in the template declaration but a parameter pack in the same template argument slot in the specialization definitions. That seems inconsistent - I would have expected that the tuple would be specified there as well (consistent with the template struct declaration). I assume (perhaps incorrectly) that the template parameters in the declaration and specializations must be consistent. – rohitsan Feb 19 '19 at 16:29
  • 1
    You are missing an intermediate step. The template declaration's list must be matched by the specialization's list *after* the template class name. The specialization's template parameters have nothing to do with the template class declaration's parameters except that they must be deduced from the specialization's types. So: `template< LIST1 > struct A { ... };` and `template< LIST2 > struct A< MATCH1 > { ... };` - here, MATCH1 must be accepted by LIST1, while LIST2 is deduced from MATCH1. – Daniel Frey Feb 19 '19 at 16:43
  • I am too stupid, but what parts are generic, and what parts I have to write by myself? – Bogi Mar 23 '21 at 09:39
  • @Bogi Mostly everything is generic, except the part with `// for each enum, you need to register all its values` and `// a helper class, required once per function that you'd like to forward`. – Daniel Frey Mar 23 '21 at 19:39
4

Here's how you can do it:

#include <iostream>
using namespace std;

template <bool b1, bool b2>
struct HeavyLoopImpl
{
    static int func(int arg)
    {
        for (int i = 0; i < 10000000; i++) {
            arg += b1 ? 1 : 2;
            arg += b2 ? 10 : 20;
        }
        return arg;
    }
};

template <template<bool...> class Impl,bool...Bs>
struct GenericJump
{
    template<typename... Args>
    static int impl(Args&&... args)
    {
        return Impl<Bs...>::func(std::forward<Args>(args)...);
    }

    template<typename... Args>
    static int impl(bool b, Args&&... args)
    {
        return b
            ? GenericJump<Impl,Bs...,true >::impl(std::forward<Args>(args)...)
            : GenericJump<Impl,Bs...,false>::impl(std::forward<Args>(args)...);
    }
};

int HeavyLoop(bool b1, bool b2, int arg)
{
    return GenericJump<HeavyLoopImpl>::impl(b1,b2,arg);
}

int main()
{
    bool b1 = true;
    bool b2 = false;
    int arg = 0;
    int res = HeavyLoop(b1, b2, arg);
    cout << "res: "<<res<<endl;
    return 0;
}

This is basically Daniels solution, but it allows you to use functions other than HeavyLoop_impl() as implementation. Only being able to call a single template function kind of defeats the purpose of being a generic solution. The GenericJump template class can call other functions also. You only have to change the HeavyLoop_impl() template function into a template class with a static function func(). It works marvellously. It compiles with gcc 4.7.3 and gives the correct output.

Ralph Tandetzky
  • 22,780
  • 11
  • 73
  • 120
  • Thanks a lot, Daniel and Ralph! Now if the HeavyLoop is a member method of some class A, using its member variables, what would be the best way to use the HeavyLoopImpl functor? I can declare it as a friend of A, but then I have to also change the functor's code slightly adding a prefix to all member variables (a.memberInt instead of memberInt). And also declaring the template functor as a friend requires first declaring it and then declaring it second time as a friend. Is there a cleaner solution? – Michal Fapso Oct 07 '13 at 14:17
  • I believe there is not enough information about the design problem here. Different approaches have different pros and cons. But consider the following: Hide `HeavyLoopImpl` as a nested struct privately inside your class A. You can even forward declare it inside `A`, if the code becomes to bloated. Adding a prefix `a.` does not sound so bad to me. If you really want to avoid that declare a member template function in `A` and copy the functionality of `HeavyLoopImpl` in there. Then call this function from your nested functor struct. – Ralph Tandetzky Oct 07 '13 at 14:48
0

Have you considered passing a function as template argument? This way you can tailor your inner loop to your own wishes.

Function passed as template argument

However, the function call will have a small overhead.

Community
  • 1
  • 1
Enigma
  • 1,699
  • 10
  • 14
  • 1
    A _function_ cannot be inlined, so it will have overhead. But a _functor can_ and will _not_ have any overhead (provided the `operator()` is properly declared inline and it's definition visible). – Jan Hudec Oct 07 '13 at 11:23
  • @JanHudec Not true. A call through a pointer-to-function with an address known at compile time can be inlined. – willj Oct 07 '13 at 13:16
  • @willj: True, it can provided the pointer is known and etc. and especially that the compiler will bother. The functor is significantly more likely to get inlined in practice. – Jan Hudec Oct 07 '13 at 13:43
0

The ideal generic solution really depends on what you want to vary. The possibilities for variation are:

  • The number of branches, and the types of the variables being branched on.
  • The operation to be performed and the number and types of its arguments.

I would recommend not making an completely generic solution unless you really need to vary all of those things. Consider only the things you want to vary and your life will be much easier.

Assuming that the total number of branches is less than 2^64, you can use a switch statement to do the dispatch. The following solution demonstrates how this could work:

template<unsigned permutation>
struct Permutation
{
    static_assert(permutation < 4, "permutation must be in the range [0, 4)");
    static const bool b1 = permutation & (1 << 0);
    static const bool b2 = permutation & (1 << 1);
};

unsigned makePermutation(bool b1, bool b2)
{
    return (b1 << 0) | (b2 << 1);
}

template<unsigned p>
int HeavyLoop_impl(int arg)
{
    return HeavyLoop_impl<Permutation<p>::b1, Permutation<p>::b2>(arg);
}

int HeavyLoop_impl(unsigned permutation, int arg)
{
    switch(permutation)
    {
    case 0: return HeavyLoop_impl<0>(arg);
    case 1: return HeavyLoop_impl<1>(arg);
    case 2: return HeavyLoop_impl<2>(arg);
    case 3: return HeavyLoop_impl<3>(arg);
    }
}

[Note: It would be trivial to use Boost.Preprocessor to generate the above switch statement.]

void HeavyLoop(bool b1, bool b2, int arg)
{
    int res = HeavyLoop_impl(makePermutation(b1, b2), arg);
    cout << "res: "<<res<<endl;
}
willj
  • 2,991
  • 12
  • 24
0

I think the best answer to your question is actually not to generate it automatically and leave it how you already have it in the question.

Making an automatic template function to generate the middle ground just obfuscates the invariant switching you're doing in the first place.

I much prefer to try to understand how the middle layer works in your question, than any of the answers people have offered for you.

I have a similar example. In my case I can apply a number of different operations between an array of values. The arrays are equal size. However I also have a structure that maps sub-ranges of the array with weight values that affect my operations. So for instance, I might be working with arrays of 100 values, and have ranges with weights like this:

[0,25] rangeWeight = 0
[26,35] rangeWeight = 0.25
[36,50] rangeWeight = 0.5
[51,99] rangeWeight = 1.0

So each operation looks something like (pseudo):

for each subrange:
    alias to the dst buffer
    alias to the src buffer
    determine the number of elements in the range
    if there's any
        weight = weightPassedIn * rangeWeight;

        Op(dst, src, weight, numElements);

For me, there were several optimizations involving whether or not the destination is touched or not (if it was still at the cleared value, some assumptions can be made to simplify the math per operation), Also if the weight happens to be full, 1.0, there are other shortcuts.

At first I was writing the same loop over and over with all of the same setup, and once I refactored all the loops around each op into functions, I pretty much naturally had the form of your invariant wrapper. There actually turned out to be several wrappers which mainly just wrap the high level operation happening inside the loop, and otherwise just handle the individual optimizations like this:

if (weight == 1.0f)
{
    if ( arrayIsCleared )
        Blend<BlendOpSet, true, false>(otherBuff, subRangesMask, 1.0f);
    else
        Blend<BlendOpAccumulate, true, false>(otherBuff, subRangesMask, 1.0f);
}
else
{
    if ( arrayIsCleared )
        Blend<BlendOpSet, false, false>(otherBuff, subRangesMask, weight);
    else
        Blend<BlendOpAccumulate, false, false>(otherBuff, subRangesMask, weight);
}
johnb003
  • 1,821
  • 16
  • 30
  • Sure, you're right, when you have only 2 boolean parameters, it might be more readable to do the branching by yourself. But with more variables and an enum with 10 possible values, it would become too "branchy" and much less readable. – Michal Fapso Feb 09 '14 at 08:06
  • @MichalFapso Well, in this example the wrapper is an "Accumulate" operation, there are many operations, and they all have similar wrappers but it's at least easy to follow the logic. – johnb003 Feb 09 '14 at 19:27
0

Here is another solution with boost::hana, which can handle also enums:

#include <cstdio>
#include <type_traits>
#include <boost/hana.hpp>

namespace hana = boost::hana;

template <typename F, typename TArgs, typename TIn, typename TOut>
void fun_arg_combinations_impl(F&& f, TArgs targs, TIn tin, TOut tout) {
    if constexpr (hana::is_empty(tin)) {
        hana::unpack(tout, f);
    } else {
        hana::for_each(hana::front(tin), [&](auto v){
            if (v == hana::front(targs)) {
                fun_arg_combinations_impl(f, hana::drop_front(targs), hana::drop_front(tin), hana::append(tout, v));
            }
        });
    }
}

template <typename F, typename TArgs, typename TIn>
void fun_arg_combinations(F&& f, TArgs targs, TIn tin) {
    fun_arg_combinations_impl(f, targs, tin, hana::tuple<>());
}

enum Shape {LINE, CIRCLE, SQUARE};

int main()
{
    auto f_heavy_loop = [](auto b1t, auto b2t, auto st) {
        constexpr bool b1 = decltype(b1t)::value;
        constexpr bool b2 = decltype(b2t)::value;
        constexpr Shape s = decltype(st )::value;

        printf("out:%d %d %d\n", b1, b2, s);
    };

    //constexpr auto bools = hana::make_tuple(std::true_type{}, std::false_type{});
    constexpr auto bools = hana::tuple<std::true_type, std::false_type>{};
    constexpr auto shapes = hana::tuple<
        std::integral_constant<Shape, LINE>,
        std::integral_constant<Shape, CIRCLE>,
        std::integral_constant<Shape, SQUARE>>{};

    // Using volatile to not allow the compiler to optimize for hard-coded values
    volatile bool b1 = true;
    volatile bool b2 = false;
    volatile Shape s = SQUARE;
    fun_arg_combinations(
        f_heavy_loop,
        hana::make_tuple(b1   , b2   , s     ),
        hana::make_tuple(bools, bools, shapes));
}

b1, b2 and s inside the f_heavy_loop() lambda are all constexpr, so we can use if constexpr on them.

output:

out:1 0 2

Have a look at the generated assembly here: https://godbolt.org/z/nsF2l5

Michal Fapso
  • 1,242
  • 1
  • 15
  • 21