g++ c++11 constexpr evaluation performance

Question

g++ 4.7.2 and similar versions seem to evaluate constexpr functions surprisingly fast at compile time; on my machines, in fact, much faster than the compiled program computes the same values at runtime.

Is there a reasonable explanation for that behavior? Are there optimization techniques that apply only at compile time and that run faster than the actual compiled code? If so, which?

Here's my test program and the observed results.

#include <iostream>

constexpr int mc91(int n)
{
    return (n > 100) ? n - 10 : mc91(mc91(n + 11));
}

constexpr double foo(double n)
{
   return (n>2)? (0.9999)*((unsigned int)(foo(n-1)+foo(n-2))%100):1;
}

constexpr unsigned ack( unsigned m, unsigned n )
{
    return m == 0
        ? n + 1
        : n == 0
        ? ack( m - 1, 1 )
        : ack( m - 1, ack( m, n - 1 ) );
}

constexpr unsigned slow91(int n) {
   return mc91(mc91(foo(n))%100);
}

int main(void)
{
   constexpr unsigned int compiletime_ack=ack(3,14);
   constexpr int compiletime_91=slow91(49);
   static_assert( compiletime_ack == 131069, "Must be evaluated at compile-time" );
   static_assert( compiletime_91  == 91,     "Must be evaluated at compile-time" );
   std::cout << compiletime_ack << std::endl;
   std::cout << compiletime_91  << std::endl;
   std::cout << ack(3,14) << std::endl;
   std::cout << slow91(49) << std::endl;
   return 0;
}

Compile time:

time g++ constexpr.cpp -std=c++11 -fconstexpr-depth=10000000 -O3 

real    0m0.645s
user    0m0.600s
sys     0m0.032s

Run time:

time ./a.out 

131069
91
131069
91

real    0m43.708s
user    0m43.567s
sys     0m0.008s

Here mc91 is the usual McCarthy 91 function (as can be found on Wikipedia), and foo is just a useless function returning real values between about 1 and 100, with Fibonacci-like (exponential) runtime complexity.
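To make the "fib runtime complexity" remark concrete, here is a minimal sketch that counts how many calls a naive evaluation of foo(n) performs. The helper name foo_calls is hypothetical and not part of the original program; it only mirrors foo's recursion pattern (one call plus the calls made for n-1 and n-2).

```cpp
// Hypothetical helper: number of calls a naive (unmemoized) evaluation
// of foo(n) performs. It follows the same recursion shape as foo:
// for n > 2, one call for foo(n) itself plus the calls for n-1 and n-2.
constexpr unsigned long long foo_calls(int n)
{
    return (n > 2) ? 1 + foo_calls(n - 1) + foo_calls(n - 2) : 1;
}

// Small arguments keep the checks within default constexpr limits.
static_assert(foo_calls(3) == 3, "foo(3) calls foo(2) and foo(1)");
static_assert(foo_calls(10) == 109, "call count grows like Fibonacci");
```

By the same recurrence, foo(49) should need on the order of 10^10 calls when nothing is cached, which is consistent with a runtime measured in tens of seconds.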

Both the slow calculation of 91 and the Ackermann function are evaluated with the same arguments by the compiler and by the compiled program.

Surprisingly, the program would even finish sooner if it merely generated code and ran it through the compiler instead of executing the computation itself.

How do you know the compiler calculates any of these expressions? — n. m. could be an AI, Apr 25 '13 at 18:27
Same issue during [this](http://stackoverflow.com/a/7570551/952747) benchmark. Read last paragraph. — masoud, Apr 25 '13 at 18:31
Edit: Added static_assert as suggested by Drew Dormann to make the example more convincing (does not change results.) — i_want_to_learn, Apr 25 '13 at 18:45

Drew Dormann · Answer 1 · 2013-04-25T20:07:33.237

At compile time, redundant (identical) constexpr calls can be memoized, whereas recursive calls executed at run time are not memoized automatically.

If you change every recursive function such as...

constexpr unsigned slow91(int n) {
   return mc91(mc91(foo(n))%100);
}

... to a form that isn't constexpr, but does remember past calculations at runtime:

#include <unordered_map>
#include <boost/optional.hpp>

std::unordered_map< int, boost::optional<unsigned> > results4;
//     parameter(s) ^^^           result ^^^^^^^^

unsigned slow91(int n) {
     boost::optional<unsigned> &ret = results4[n];
     if ( !ret )
     {
         ret = mc91(mc91(foo(n))%100);
     }
     return *ret;
}

You will get less surprising results.

Compile time:

time g++ test.cpp -std=c++11 -O3

real    0m1.708s
user    0m1.496s
sys     0m0.176s

Run time:

time ./a.out

131069
91
131069
91

real    0m0.097s
user    0m0.064s
sys     0m0.032s
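The same idea can be sketched with only the standard library, applied to the Fibonacci-like foo, where most of the redundant work sits. This is an illustration under assumptions, not the code that produced the timings above: the names foo_memo and slow91_memo are hypothetical, and the cache is a function-local static, which is not thread-safe.

```cpp
#include <cassert>
#include <unordered_map>

// Plain runtime version of the McCarthy 91 function from the question.
int mc91(int n)
{
    return (n > 100) ? n - 10 : mc91(mc91(n + 11));
}

// Hypothetical memoized variant of foo: each argument is computed once
// and then looked up, so the call count becomes linear in n.
double foo_memo(double n)
{
    static std::unordered_map<double, double> cache; // argument -> result
    if (!(n > 2))
        return 1;
    const auto it = cache.find(n);
    if (it != cache.end())
        return it->second;
    const double result =
        0.9999 * ((unsigned int)(foo_memo(n - 1) + foo_memo(n - 2)) % 100);
    cache.emplace(n, result);
    return result;
}

unsigned slow91_memo(int n)
{
    const double v = foo_memo(n);
    assert(v >= 0 && v < 100); // foo stays below 100 by construction
    return mc91(mc91(static_cast<int>(v)) % 100);
}
```

Calling slow91_memo(49) returns 91, as in the question, but each foo_memo argument is evaluated only once.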

edited Apr 25 '13 at 20:07

answered Apr 25 '13 at 18:29

Assigning to a `constexpr` variable is one of those times when runtime evaluation is not permitted. – bames53 Apr 25 '13 at 18:52
@bames53 Yes, I'm wondering if gcc is skirting the standard in this case. I suspect that testing the two lines in this answer would tell. – Drew Dormann Apr 25 '13 at 18:56
See my answer; the values really are evaluated at compile time (as proven by the assembler output), and sure, a *static_assert* would prove the same thing. – Filip Roséen - refp Apr 25 '13 at 19:13
Yeah, evaluating an expression with the same arguments only once (memoization) was my theory in the first place, hence the real-valued, Fibonacci-like foo function, for which this technique is not reasonably applicable. Regarding the recursion depth: that's right, we are outside the standard here, but using such large values better demonstrates the observed effect. – i_want_to_learn Apr 25 '13 at 19:47
Also, since simple techniques like memoization can be ruled out for real-valued functions, which still get evaluated faster at compile time, might this class of problems be a point in favour of interpreted languages, for which source-level transformations are possible at runtime? – i_want_to_learn Apr 25 '13 at 20:28

Filip Roséen - refp · Answer 2 · 2013-04-28T11:29:07.330

Memoization

This is a very interesting "discovery", but the answer is probably simpler than you think.

Something declared constexpr can be evaluated at compile time if all values involved are known at compile time (and if the variable where the value is supposed to end up is declared constexpr as well). With that said, imagine the following pseudo-code:

f(x)   = g(x)
g(x)   = x + h(x,x)
h(x,y) = x + y

Since every value is known at compile time, the compiler can rewrite the above into the equivalent form below:

f(x) = x + x + x

In other words, every function call has been removed and replaced with the expression it computes. Also applicable is a technique called memoization, where the results of previously calculated expressions are stored away so that the hard work only needs to be done once.

If you know that g(5) = 15, why calculate it again? Instead, just replace g(5) with 15 every time it is needed. This is possible because a function declared constexpr isn't allowed to have side effects.
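The pseudo-code translates directly into C++11. The following is a small sketch using the same names f, g and h; the static_assert lines only compile if the compiler can fold the whole call chain into a constant.

```cpp
// The pseudo-code from above as C++11 constexpr functions.
constexpr int h(int x, int y) { return x + y; }
constexpr int g(int x)        { return x + h(x, x); }
constexpr int f(int x)        { return g(x); }

// Both checks are resolved during compilation: f(5) == 5 + 5 + 5.
static_assert(g(5) == 15, "g(5) is a compile-time constant");
static_assert(f(5) == 15, "f(5) folds to x + x + x");
```

Whether a given compiler additionally caches g(5) between uses is an implementation detail; the language only guarantees that the result is a constant.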

Runtime

At runtime this does not happen (since we didn't tell the code to behave this way). The little guy running through your code needs to jump from f to g to h, then jump back from h to g before jumping from g back to f, all while storing the return value of each function and passing it along to the next one.

Even if this guy is very, very tiny and doesn't need to jump very far, he still doesn't like jumping back and forth all the time; it takes a lot of effort for him to do this, and with that, it takes time.

But in the OP's example, is it really calculated at compile time?

Yes. For those who don't believe that the compiler actually calculates this and puts the results as constants in the finished binary, here are the relevant assembly instructions from the OP's code (output of g++ -S -Wall -pedantic -fconstexpr-depth=1000000 -std=c++11):

main:
.LFB1200:
  .cfi_startproc
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset 6, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register 6
  subq  $16, %rsp
  movl  $131069, -4(%rbp)
  movl  $91, -8(%rbp)
  movl  $131069, %esi               # one of the values from constexpr
  movl  $_ZSt4cout, %edi
  call  _ZNSolsEj
  movl  $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
  movq  %rax, %rdi
  call  _ZNSolsEPFRSoS_E
  movl  $91, %esi                   # the other value from our constexpr
  movl  $_ZSt4cout, %edi
  call  _ZNSolsEi
  movl  $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
  movq  %rax, %rdi

  # ...
  # a lot of jumping is taking place down here
  # see the full output at http://codepad.org/Q8D7c41y
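Besides reading the assembly, compile-time evaluation can also be checked from within the language, by using the result where only a constant expression is allowed. The following is a minimal sketch with deliberately small arguments so that it stays within default constexpr recursion limits; the typedef names buffer_type and ninety_one are hypothetical.

```cpp
#include <type_traits>

constexpr int mc91(int n)
{
    return (n > 100) ? n - 10 : mc91(mc91(n + 11));
}

constexpr unsigned ack(unsigned m, unsigned n)
{
    return m == 0 ? n + 1
         : n == 0 ? ack(m - 1, 1)
         : ack(m - 1, ack(m, n - 1));
}

// An array bound must be a constant expression: ack(2, 3) == 9.
typedef int buffer_type[ack(2, 3)];

// So must a non-type template argument: mc91(99) == 91.
typedef std::integral_constant<int, mc91(99)> ninety_one;

static_assert(sizeof(buffer_type) / sizeof(int) == 9, "array has 9 elements");
static_assert(ninety_one::value == 91, "template argument is 91");
```

If either call could not be evaluated during compilation, the translation unit would fail to compile instead of falling back to a runtime call.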
