4

Say you have a function like:

double do_it(int m)
{
   double result = 0;

   for(int i = 0; i < m; i++)
      result += i;

   return result;
}

If you know m at compile time you can do:

template<size_t t_m>
double do_it()
{
   double result = 0;

   for(size_t i = 0; i < t_m; i++)
      result += i;

   return result;   
}

This gives a possibility for things like loop unrolling when optimizing. But sometimes you might know some cases at compile time and others only at run time. Or perhaps you have defaults which a user could change, and it would be nice to optimize the default case.

I'm wondering if there is any way to provide both versions without basically duplicating the code or using a macro?

Note that the above is a toy example to illustrate the point.

user109078
  • 3
    Unrelated: can't you simply `return m*(m-1)*0.5;` ? – JeJo Aug 06 '19 at 16:46
  • 2
    This is a toy example to illustrate the point. – user109078 Aug 06 '19 at 16:48
  • 5
    [constexpr](https://en.cppreference.com/w/cpp/language/constexpr)? – Jesper Juhl Aug 06 '19 at 16:49
  • 2
    `constexpr` is not enough. The compiler can do a lot more optimizations when it knows the code can only be called at compile time than when it could be both compile and run time. – NathanOliver Aug 06 '19 at 16:49
  • I knew someone would bring that up :) So that might work for this toy example, but not in general. – user109078 Aug 06 '19 at 16:50
  • 2
    @NathanOliver : could you kindly elaborate and/or provide references? – Daniel Kamil Kozar Aug 06 '19 at 16:51
  • 1
    @DanielKamilKozar In the template code, the compiler knows exactly the loop limit, so it can unroll the loop for small ones and do something else for larger ones. In the non-template code, the compiler can't do that because it doesn't know what the limit is. If you make the non-template code `constexpr`, it is still in the same boat because you can call a `constexpr` function at run time. – NathanOliver Aug 06 '19 at 16:52
  • 1
    @NathanOliver : A brainy compiler is simply going to emit two (or more, if needed) versions of this function - one with the constants folded into the loop's body, potentially eliminating the need for calculating anything altogether, and another one with all stuff evaluated at runtime. [The Compiler Explorer](https://gcc.godbolt.org/z/wf7DX6) seems to agree. – Daniel Kamil Kozar Aug 06 '19 at 16:57
  • I just noticed that the runtime version of the function in the compiled code even includes an early return if the runtime parameter's value is the same as the compile-time calculated one. – Daniel Kamil Kozar Aug 06 '19 at 17:07
  • @DanielKamilKozar I think if you want guaranteed compile-time evaluation (also without `-O2`), you need to assign the result to a `constexpr` before returning: https://gcc.godbolt.org/z/VGhFHn – chtz Aug 06 '19 at 17:09
  • @chtz : Worrying about performance when compiling without optimisation is kind of pointless, no? :-) (the question isn't tagged `language-lawyer`) – Daniel Kamil Kozar Aug 06 '19 at 17:10
  • @DanielKamilKozar true, but your example also does not evaluate at compile time with clang for bigger numbers even with `-O3`: https://gcc.godbolt.org/z/k71W1T – chtz Aug 06 '19 at 17:14
  • @BiagioFesta exactly my point ... – chtz Aug 06 '19 at 17:18
  • @chtz Oops... I didn't notice I hadn't expanded all comments :) – BiagioF Aug 06 '19 at 17:20

3 Answers

5

In terms of the language specification, there's no general way to have a function that works in the way you desire. But that doesn't mean compilers can't do it for you.

> This gives a possibility for things like loop unrolling when optimizing.

You say this as though the compiler cannot unroll the loop otherwise.

The compiler can unroll the template loop because two conditions hold at once:

  1. The compiler has the definition of the function. In this case the definition is provided (it's a function template, so its definition has to be visible at the point of use).

  2. The compiler has the compile-time value of the loop bound. In this case, through the template parameter.

But neither of these factors explicitly requires a template. If the compiler has the definition of a function, and it can determine the compile-time value of the loop bound, then it has 100% of the information needed to unroll that loop.

How it gets this information is irrelevant. It could be an inline function (you have to provide the definition) which you call given a compile-time constant as an argument. It could be a constexpr function (again, you have to provide the definition) which you call given a compile-time constant as an argument.
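As a rough illustration of that point, here is a minimal sketch with no template at all. The definition is visible and the argument is a literal, so the optimizer has everything it needs; whether it actually folds or unrolls the call in `default_case` is still up to the implementation. The name `default_case` is hypothetical and only there for illustration.

```cpp
// Definition is visible to every caller; no template parameter involved.
constexpr double do_it(int m)
{
   double result = 0;

   for(int i = 0; i < m; i++)
      result += i;

   return result;
}

// The argument is a constant, so the call is usable in a constant expression.
static_assert(do_it(5) == 10.0, "0+1+2+3+4");

// Hypothetical caller for the "default" case: the bound is known here, so the
// compiler is free (but not required) to unroll or fold the loop.
inline double default_case()
{
   return do_it(5);
}
```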

This is a matter of quality of implementation, not of language. If compile-time parameters are to ever be a thing, it would be to support things you cannot do otherwise, not to support optimization (or at least, not compiler optimizations). For example, you can't have a function which returns a std::array whose length is specified by a regular function parameter rather than a template parameter.
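To make the `std::array` case concrete, here is a small sketch. The function name `first_n` is made up for illustration. The length has to be a template parameter because it is part of the return type.

```cpp
#include <array>
#include <cstddef>

// Works: N is a template parameter, so std::array<int, N> is a valid type.
template<std::size_t N>
std::array<int, N> first_n()
{
   std::array<int, N> out{};

   for(std::size_t i = 0; i < N; i++)
      out[i] = static_cast<int>(i);

   return out;
}

// Does not compile: a regular function parameter is not a constant
// expression, so it cannot appear in the return type.
// std::array<int, n> first_n(std::size_t n);
```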

Nicol Bolas
  • I marked the integral_constant answer as the chosen answer as it more directly answers the question, but I appreciate the insight of your answer as well and think it forms a more complete picture. – user109078 Aug 06 '19 at 18:04
3

Yes you can, with std::integral_constant. Specifically, the following function will work with an int, as well as specializations of std::integral_constant.

template<class Num>
constexpr double do_it(Num m_unconverted) {
  double result = 0.;
  int m_converted = static_cast<int>(m_unconverted);
  for(int i = 0; i < m_converted; i++){ result += i; }
  return result;
}

If you want to call do_it with a compile-time constant, then you can use

constexpr double result = do_it(std::integral_constant<int, 5>{});

Otherwise, it's just

double result = do_it(some_number);
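If you would rather keep the two call styles from the question (`do_it(m)` and `do_it<M>()`), one possible arrangement is to write the body once and add two thin wrappers. This is only a sketch: `do_it_impl` is a hypothetical helper name, and whether the `integral_constant` instantiation ends up faster still depends on the optimizer.

```cpp
#include <type_traits>

// Single shared body. Bound is either int or std::integral_constant<int, N>.
template<class Bound>
constexpr double do_it_impl(Bound bound)
{
   double result = 0;
   const int m = bound;  // integral_constant converts implicitly to its value

   for(int i = 0; i < m; i++)
      result += i;

   return result;
}

// Run-time entry point: one instantiation of the body, with Bound = int.
constexpr double do_it(int m)
{
   return do_it_impl(m);
}

// Compile-time entry point: a separate instantiation of the body per M.
template<int M>
constexpr double do_it()
{
   return do_it_impl(std::integral_constant<int, M>{});
}

static_assert(do_it<5>() == 10.0, "0+1+2+3+4");
```

The loop is written only once; the wrappers just decide which type the bound has.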
hegel5000
  • 1
    Why use `integral_constant` here? It doesn't achieve anything, since you cast it straight to an integer. The only useful part of this answer is `constexpr` – Eric Aug 06 '19 at 17:25
  • @Eric labeling the function `constexpr` doesn't particularly matter, and is merely an added bonus. What matters is that the template parameter can *force* the compiler to create a specialization of `do_it` for every specialization of `std::integral_constant` provided. In rare situations (specifically, when each specialization is infrequently called, avoiding caching issues), this method improves performance. – hegel5000 Aug 06 '19 at 17:29
  • Specifically, this is useful if there is a large function with several runtime arguments, but where one argument might additionally need to be specialized for a small handful of compile-time values. – hegel5000 Aug 06 '19 at 17:32
1

Use constexpr (needs at least C++14, which allows loops in constexpr functions):

#include <iostream>

constexpr double do_it(int m)
{
   double result = 0;

   for(int i = 0; i < m; i++)
      result += i;

   return result;
}

constexpr double it_result = do_it(10) + 1;  // the whole initializer, including `+ 1`, is evaluated at compile time

int main() {
    int x;
    std::cin >> x;
    do_it(x);  // runtime
}

If you want to force a constexpr value to be inlined as part of a runtime expression, you can use the FORCE_CT_EVAL macro from this comment:

// Identifiers containing a double underscore are reserved, so use a plain name.
#define FORCE_CT_EVAL(func) [](){ constexpr auto ct_value = func; return ct_value; }()

double it_result = FORCE_CT_EVAL(do_it(10));  // compile time
Eric
  • 1
    have you read the comment thread? People have said that `constexpr` is not enough to force the compiler to create a specialization – phuclv Aug 07 '19 at 16:18
  • @phuclv: If the return value is knowable at compile-time, a specialization is unnecessary - better to inline the return value. Updated with a macro that forces the compiler to evaluate an arbitrary constexpr. – Eric Aug 12 '19 at 17:57