question

Is there a way to write a templated function that accepts a callable either as a template value parameter (for statically known values) or as a classic function argument (for dynamic values)? I imagine something callable like this, where the "static" version can be optimised by the compiler.

static_dynamic_fun<T>(a, b, default_fun<T>);   // Use function argument
static_dynamic_fun<T, default_fun<T>>(a, b);   // Use template parameter

Note: we can use C++20 (although a solution using C++11 or even C++14 would be useful to a wider audience).

context

I have a templated function, one of whose parameters is a statically known function:

  template<typename T, auto dist = default_fun<T>>
  T static_fun(const T* a, const T* b){...}

The dist parameter is heavily used (called billions of times) in several loops - having it in the template allows the compiler to inline the function (it is usually small). We also need a version of the code with a dynamic "dist" function (e.g. made from a lambda):

  template <typename T>
  using funtype = std::function<T(const T* a, const T* b)>;

  template<typename T>
  T dynamic_fun(const T* a, const T* b, const funtype<T>& dist = default_fun<T>){...}

The problem is that this duplicates the code. I checked the assembly output of the above: the call is not inlined, even though the default function is statically known.

I tried this, hoping that the compiler would "see" the optimisation opportunity, but to no avail.

  template<typename T, auto d = default_fun<T>>
  T static_dynamic_fun(const T* a, const T* b, const funtype<T>& dist=d){...}

I tracked the call in the ASM ("-fverbose-asm" GCC flag). Because we go through a std::function, the overhead is even greater than an indirect call through a function pointer (a cost that we are happy to pay when we use something like a lambda with capture).

# /usr/include/c++/11.1.0/bits/std_function.h:560:  return _M_invoker(_M_functor, std::forward<_ArgTypes>(__args)...);
    leaq    216(%rsp), %rsi #, tmp490
    movq    %r12, %rdx  # tmp666,
    movq    %r14, %rdi  # tmp650,
    call    *552(%rsp)  # MEM[(struct function *)_922]._M_invoke

** Edit: adding a minimal reproducible example **

Header mre.hpp defining:

  • a function type, funtype
  • a function of that type, argfun
  • two "users" of argfun (see mre.cpp below for usage)
    • static_fun, using argfun through a template parameter
    • dynamic_fun, using argfun through a function argument

#include <cstddef>     // size_t
#include <functional>  // std::function

template <typename T>
using funtype = std::function<T(const T* a, const T* b)>;

template <typename T>
T argfun(const T*a, const T*b){
    T d = a[0] - b[0];
    return d*d;
}

template <typename T, auto static_callme>
T static_fun(size_t s, const T* a, const T* b){
    T acc=0;
    for(size_t i=0; i<s; ++i){
        acc+=static_callme(a+i, b+i);
    }
    return acc;
}

template <typename T>
T dynamic_fun(size_t s, const T* a, const T* b, const funtype<T>& dynamic_callme){
    T acc=0;
    for(size_t i=0; i<s; ++i){
        acc+=dynamic_callme(a+i, b+i);
    }
    return acc;
}

File mre.cpp

#include "mre.hpp"
#include <algorithm>  // std::generate
#include <iostream>
#include <random>
#include <vector>


int main(){
    // Init random
    std::random_device rd;
    unsigned seed = rd();
    std::mt19937_64 prng(seed);

    // Random size for vectors
    size_t size = std::uniform_int_distribution<std::size_t>(10, 35)(prng);

    // Random vectors a and b of doubles [0, 1[
    std::uniform_real_distribution<double> udist{0.0, 1.0};
    auto generator = [&prng, &udist]() { return udist(prng); };
    std::vector<double> a(size);
    std::generate(a.begin(), a.end(), generator);
    std::vector<double> b(size);
    std::generate(b.begin(), b.end(), generator);

    // Static call
    double res_sta = static_fun<double, argfun<double>>(size, a.data(), b.data());
    std::cout << "size = " << size << " static call = " << res_sta << std::endl;

    // Dynamic call (T given explicitly so that argfun<double> can convert to funtype<double>)
    double res_dyn = dynamic_fun<double>(size, a.data(), b.data(), argfun<double>);
    std::cout << "size = " << size << " dynamic call = " << res_dyn << std::endl;

    // Just to be sure:
    if(res_sta != res_dyn){
        std::cerr << "Error!" << std::endl;
        return 1;
    } else {
        std::cout << "Ok" << std::endl;
    }

    return 0;
}

I compile this with g++ -std=c++20 mre.cpp -O3 -S -fverbose-asm

In the asm file mre.s, search for

  • static_callme: see that the call to argfun is inlined
  • dynamic_callme: see that the call goes through std::function

I would like to have both behaviours without code duplication. The bodies of static_fun and dynamic_fun are the same.

Mat
  • It would be useful to have a [mre] here. I'm finding the narrative hard to follow. – Paul Sanders Aug 17 '21 at 00:37
  • @PaulSanders Thank you for your comment; I added a MRE. – Mat Aug 17 '21 at 01:22
  • Try https://stackoverflow.com/a/39087660/1774667 to replace std function in dynamic version, then have static call dynamic, and -O3 it. function view is a touch more transparent to compilers than std function is in my experience. – Yakk - Adam Nevraumont Aug 17 '21 at 02:38
  • @Yakk-AdamNevraumont Thanks! I tried, and it indeed does generate a "lighter" ASM code. However, the function `argfun` remains called and not inlined. – Mat Aug 17 '21 at 03:03

1 Answer

Since you already use templates, just use a template type parameter (typename F) instead of std::function<>:

template <typename T, typename F>
T static_dynamic_fun(size_t s, const T* a, const T* b, F f)
{
    T acc = 0;
    for (size_t i = 0; i != s; ++i) {
        acc += f(a + i, b + i);
    }
    return acc;
}
// Optionally, an overload for the default callable
template <typename T>
T static_dynamic_fun(size_t s, const T* a, const T* b)
{
    return static_dynamic_fun(s, a, b, default_f<T>{});
}

Note: you might prefer passing a lambda rather than a function pointer. A lambda has its own type, so the call target is known statically and is easier to inline.
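
A minimal sketch of how one generic body could serve both call styles from the question. The names sq_dist and usage_example are hypothetical, and the second overload (taking the function as an `auto` template parameter, so C++17 or later) is an assumption layered on top of the answer rather than part of it. It only wraps the function in a captureless lambda and forwards to the generic version. Whether the call actually gets inlined still depends on the compiler and the optimisation level.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical distance function, mirroring argfun from the question.
template <typename T>
T sq_dist(const T* a, const T* b) {
    T d = a[0] - b[0];
    return d * d;
}

// Single generic body: F is deduced from whatever callable is passed.
template <typename T, typename F>
T static_dynamic_fun(std::size_t s, const T* a, const T* b, F f) {
    T acc = 0;
    for (std::size_t i = 0; i != s; ++i) {
        acc += f(a + i, b + i);
    }
    return acc;
}

// "Static" entry point (assumption, C++17): the function is a template
// parameter, wrapped in a captureless lambda so it travels as a type.
template <typename T, auto fn>
T static_dynamic_fun(std::size_t s, const T* a, const T* b) {
    return static_dynamic_fun(s, a, b,
        [](const T* x, const T* y) { return fn(x, y); });
}

// Usage sketch: three ways of driving the same loop body.
inline void usage_example() {
    const double a[] = {1.0, 2.0, 3.0};
    const double b[] = {0.0, 0.0, 1.0};

    // Template parameter, as in the question's second call form.
    double r1 = static_dynamic_fun<double, sq_dist<double>>(3, a, b);

    // Lambda with a capture, passed as a function argument.
    double scale = 2.0;
    double r2 = static_dynamic_fun(3, a, b,
        [scale](const double* x, const double* y) { return scale * sq_dist(x, y); });

    // std::function still works when type erasure is really needed.
    std::function<double(const double*, const double*)> f = sq_dist<double>;
    double r3 = static_dynamic_fun(3, a, b, f);

    assert(r1 == 9.0);   // 1 + 4 + 4
    assert(r2 == 18.0);  // 2 * (1 + 4 + 4)
    assert(r3 == 9.0);
}
```

The two overloads do not conflict: the explicit argument sq_dist<double> is not a type, so only the second overload is viable for the first call, while the four-argument calls can only match the first.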

Jarod42
  • Indeed, thank you! It is (maybe not too) surprising that being less specific is actually better here - I guess that "let the compiler figure it out" is the way to go. Does the compiler consider that each 'f' has its own type F? I tried several calls with different functions sharing the same signature, and the code is always specialized. – Mat Aug 17 '21 at 22:54
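
On the question in the last comment, a small sketch of what F is deduced to. The names add, sub, call_with, wrap_add, wrap_sub and check_calls are hypothetical. Function pointers with the same signature share a single type, so they share one instantiation. The specialised code observed in that case most likely comes from the generic function being inlined at each call site, where the pointer is a known constant. Each lambda, by contrast, has its own closure type and therefore its own instantiation.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical functions sharing one signature.
inline double add(const double* a, const double* b) { return *a + *b; }
inline double sub(const double* a, const double* b) { return *a - *b; }

// Generic caller in the style of the answer: F is deduced from the argument.
template <typename F>
double call_with(const double* a, const double* b, F f) { return f(a, b); }

// Function pointers with the same signature have the same type, so passing
// add or sub uses the same instantiation of call_with.
static_assert(std::is_same<decltype(&add), decltype(&sub)>::value,
              "same pointer type");

// Each lambda expression has its own closure type, so each wrapper below
// gets a separate instantiation whose call target is fixed by the type.
const auto wrap_add = [](const double* a, const double* b) { return add(a, b); };
const auto wrap_sub = [](const double* a, const double* b) { return sub(a, b); };
static_assert(!std::is_same<decltype(wrap_add), decltype(wrap_sub)>::value,
              "distinct closure types");

inline void check_calls() {
    const double x = 3.0, y = 2.0;
    assert(call_with(&x, &y, add) == 5.0);       // F is a function pointer type
    assert(call_with(&x, &y, sub) == 1.0);       // same F as the line above
    assert(call_with(&x, &y, wrap_add) == 5.0);  // F is the closure type of wrap_add
    assert(call_with(&x, &y, wrap_sub) == 1.0);  // F is the closure type of wrap_sub
}
```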