c++11 fast constexpr integer powers

Question

Beating the dead horse here. A typical (and fast) way of doing integer powers in C is this classic:

int64_t ipow(int64_t base, int exp){
  int64_t result = 1;
  while(exp){
    if(exp & 1)
      result *= base;
    exp >>= 1;
    base *= base;
  }
  return result;
}

However I needed a compile time integer power so I went ahead and made a recursive implementation using constexpr:

constexpr int64_t ipow_(int base, int exp){
  return exp > 1 ? ipow_(base, (exp>>1) + (exp&1)) * ipow_(base, exp>>1) : base;
}
constexpr int64_t ipow(int base, int exp){
  return exp < 1 ? 1 : ipow_(base, exp);
}

The second function is only to handle exponents less than 1 in a predictable way. Passing exp<0 is an error in this case.

The recursive version is 4 times slower

I generate a vector of 10E6 random valued bases and exponents in the range [0,15] and time both algorithms on the vector (after doing a non-timed run to try to remove any caching effects). Without optimization the recursice method is twice as fast as the loop. But with -O3 (GCC) the loop is 4 times faster than the recursice method.

My question to you guys is this: Can any one come up with a faster ipow() function that handles exponent and bases of 0 and can be used as a constexpr?

(Disclaimer: I don't need a faster ipow, I'm just interested to see what the smart people here can come up with).

`constexpr` functions may be evaluated at runtime too. The wanted optimization is for the case where runtime evaluation is performed. If the recursive method is comparably fast to the loop in runtime, the recursive method may be used for both compile time and runtime without needing a separate definition for a runtime version. See @hivert's answer. — Emily L., Jul 18 '13 at 12:16
@MikeDunlavey She did write a small program to do it: that `ipow` function. ;) — Casey, Jul 18 '13 at 16:05
FYI, C++1y SUBSTANTIALLY expands what you can do in a `constexpr` function: [see N3652 for details](http://isocpp.org/files/papers/N3652.html). Your first implementation is a valid `constexpr` function in C++1y. — Casey, Jul 18 '13 at 16:15
@MikeDunlavey: As I said in OP, this question is purely for theoretical/nerd interest, I'm not in the *need* of a faster code. You're not being very constructive. — Emily L., Jul 19 '13 at 08:13
The rules for what is allowed in a `constexpr` function have been relaxed in the C++1y CD so that you can do almost anything in a `constexpr` function. To my knowledge no released compiler implements those rules yet (although IIRC Richard Smith - the author of the paper for that change - has already implemented it on an experimental Clang branch). — Casey, Jul 19 '13 at 14:24
Revisiting this question: now that clang 3.4 has been released with c++1y support, [the original function from the OP works perfectly at compile time by simply adding the `constexpr` keyword to its declaration](http://coliru.stacked-crooked.com/a/5f0d0fa84afb29df). — Casey, Feb 04 '14 at 21:11

Casey · Accepted Answer · 2013-07-18T16:26:58.113

A good optimizing compiler will transform tail-recursive functions to run as fast as imperative code. You can transform this function to be tail recursive with pumping. GCC 4.8.1 compiles this test program:

#include <cstdint>

constexpr int64_t ipow(int64_t base, int exp, int64_t result = 1) {
  return exp < 1 ? result : ipow(base*base, exp/2, (exp % 2) ? result*base : result);
}

int64_t foo(int64_t base, int exp) {
  return ipow(base, exp);
}

into a loop (See this at gcc.godbolt.org):

foo(long, int):
    testl   %esi, %esi
    movl    $1, %eax
    jle .L4
.L3:
    movq    %rax, %rdx
    imulq   %rdi, %rdx
    testb   $1, %sil
    cmovne  %rdx, %rax
    imulq   %rdi, %rdi
    sarl    %esi
    jne .L3
    rep; ret
.L4:
    rep; ret

vs. your while loop implementation:

ipow(long, int):
    testl   %esi, %esi
    movl    $1, %eax
    je  .L4
.L3:
    movq    %rax, %rdx
    imulq   %rdi, %rdx
    testb   $1, %sil
    cmovne  %rdx, %rax
    imulq   %rdi, %rdi
    sarl    %esi
    jne .L3
    rep; ret
.L4:
    rep; ret

Instruction-by-instruction identical is good enough for me.

Interesting! I knew tail-call optimization could be transformed into iteration by the compiler but I wasn't expecting instruction-by-instruction identity with the actual loop. Also I probably need to look into the conditions that are necessary for tail-call optimization to occur. Because I can't tell *why* it applies in your `ipow` but not in mine. — Emily L., Jul 19 '13 at 08:18
Tail calls return the result of a function call without performing any additional computation with that result. So, e.g, `return function(x+2)'` is a tail call, but `return 2+function(x);` is not. — Casey, Jul 19 '13 at 14:18

score 3 · Answer 2 · edited Jul 18 '13 at 16:44

3

It seems that this is a standard problem with constexpr and template programming in C++. Due to compile time constraints, the constexpr version is slower than a normal version if executed at runtime. But overloading doesn't allows to chose the correct version. The standardization committee is working on this issue. See for example the following working document http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3583.pdf

edited Jul 18 '13 at 16:44

Ben Voigt

277,958
43
419
720

answered Jul 18 '13 at 09:45

hivert

10,579
3
31
56

1

Thanks for the correction ! In French (my native language) it is spelled "comité". So it was very unnatural for me to have two "m", "t" and "e" and my spell checker was ok ;-) – hivert Jul 18 '13 at 19:59
Discrimination between runtime and compile time is not necessary to solve the problem of selecting between equivalent implementations. C++14 relaxes the requirements of constexpr functions to allow most runtime algorithms. – Potatoswatter Mar 09 '14 at 12:00
Wait, I thought it was comédie? But seriously, the main problem is no so much the limitation to what a `constexpr` may do, but the unhealthy combination of this limitation _and_ allowing the compiler to evaluate the function at runtime without the programmer being able to do much about it. Even if you force compiletime-evaluation by assigning to an `enum`, the compiler is still allowed to evaluate a subsequent call at runtime _again_ (e.g. GCC will just do that!). This is dumb, but perfectly legal. – Damon Mar 09 '14 at 14:17

c++11 fast constexpr integer powers

The recursive version is 4 times slower

2 Answers2

Linked