48

The code below computes Fibonacci numbers using an exponentially slow algorithm:

#include <cstdlib>
#include <iostream>

#define DEBUG(var) { std::cout << #var << ": " << (var) << std::endl; }

constexpr auto fib(const size_t n) -> long long
{
    return n < 2 ? 1 : fib(n - 1) + fib(n - 2);
}

int main(int argc, char *argv[])
{
    const long long fib91 = fib(91);

    DEBUG( fib91 );
    DEBUG( fib(45) );

    return EXIT_SUCCESS;
}

I am calculating the 45th Fibonacci number at run time, and the 91st one at compile time.

The interesting fact is that GCC 4.9 compiles the code and computes fib91 in a fraction of a second, but it takes a while to spit out fib(45).

My question: if GCC is smart enough to optimize the fib(91) computation and not take the exponentially slow path, what stops it from doing the same for fib(45)?

Does the above mean GCC produces two compiled versions of the fib function, one fast and the other exponentially slow?

The question is not how the compiler optimizes the fib(91) calculation (yes! it does use a sort of memoization), but: if it knows how to optimize the fib function, why does it not do the same for fib(45)? And are there two separate compilations of the fib function, one slow and the other fast?

– behzad.nouri
  • this question requires knowing at least the flags used to compile the program – Jack Jul 11 '14 at 23:43
  • About the first part: `const long long fib91 = fib(91);` is computed at compile time, and the whole call is replaced by the actual value; not sure why it doesn't do it for fib(45) – OneOfOne Jul 11 '14 at 23:44
  • 2
    Probably `45` exceeds recursion limit. play with `-fconstexpr-depth=Max_Recursion_Limit` to see if this limit is exceeded. – 101010 Jul 11 '14 at 23:44
  • @OneOfOne: that was the obvious answer; I was curious to know the optimization flags, since it's strange that both don't get optimized. – Jack Jul 11 '14 at 23:45
  • Have you actually checked if the value for fib91 gets computed at compile time? Might be a gcc bug. – Cubic Jul 11 '14 at 23:45
  • @Jack `$ g++ -O2 -std=c++11 -o test{,.cpp}` – behzad.nouri Jul 11 '14 at 23:45
  • 1
    For `constexpr`, the compiler probably generates _two_ versions. One for constexpr as if it were `template long long fib();`, and one as a normal function as you typed. The `template` one is coincidentally memoized via the template machinery, but that machinery doesn't exist for regular functions. (Although since 45 is hardcoded, there's no reason it couldn't do that at compile time too) – Mooing Duck Jul 12 '14 at 00:13
  • 4
    +1 for `return EXIT_SUCCESS;` in an experimental program. – Potatoswatter Jul 12 '14 at 00:21
  • A similar question has been discussed in this SO post: http://stackoverflow.com/q/22645551/2724703 – Mantosh Kumar Jul 12 '14 at 02:51
  • 5
    @Potatoswatter that's about as useful as doing `#define BITWISE_AND_OPERATOR &` – M.M Jul 12 '14 at 06:05
  • 1
    @MattMcNabb Well, it's a standard facility, and there's no other equally explicit alternative. Not saying anything is alright, but not explicit. The literal `0` is the worst kind of magic number. That said, I do prefer `&` to its "expressive" equivalent `bitand`. – Potatoswatter Jul 12 '14 at 07:07
  • 2
    falling of the end of main is the time-honoured and standard supported way of denoting successful completion :P – sp2danny Jul 12 '14 at 07:28
  • 2
    FYI, here's a version of the patch (with some description) https://gcc.gnu.org/ml/gcc-patches/2009-11/msg01504.html – Steve Jessop Jul 12 '14 at 12:39

3 Answers

38

GCC is likely memoizing constexpr functions (enabling a Θ(n) computation of fib(n)). That is safe for the compiler to do because constexpr functions are purely functional.

Compare the Θ(n) "compiler algorithm" (using memoization) to your Θ(φ^n) run-time algorithm (where φ is the golden ratio) and suddenly it makes perfect sense that the compiler is so much faster.
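To see why the recursive version is Θ(φ^n): its running time satisfies T(n) = T(n-1) + T(n-2) + O(1), the same recurrence as fib itself, so the work grows like fib(n), which is Θ(φ^n). For illustration, here is a sketch of the memoization idea written as ordinary run-time C++. This is not GCC's internal implementation, just the strategy it can safely apply when evaluating a purely functional constexpr call; the function name and the cache type are my choices:

#include <cstddef>
#include <unordered_map>

// Each fib(k) is computed once and then reused, so the double
// recursion collapses from Θ(φ^n) to Θ(n) distinct computations.
long long fib_memo(std::size_t n)
{
    static std::unordered_map<std::size_t, long long> cache;
    if (n < 2)
        return 1;  // same base case as the question's fib
    auto it = cache.find(n);
    if (it != cache.end())
        return it->second;  // cache hit: constant-time lookup
    const long long result = fib_memo(n - 1) + fib_memo(n - 2);
    cache.emplace(n, result);
    return result;
}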

From the constexpr page on cppreference (emphasis added):

The constexpr specifier declares that it is _possible_ to evaluate the value of the function or variable at compile time.

The constexpr specifier does not declare that it is required to evaluate the value of the function or variable at compile time. So one can only guess what heuristics GCC is using to choose whether to evaluate at compile time or run time when a compile time computation is not required by language rules. It can choose either, on a case-by-case basis, and still be correct.

If you want to force the compiler to evaluate your constexpr function at compile time, here's a simple trick that will do it.

#include <cstddef>  // for std::size_t

constexpr auto compute_fib(const std::size_t n) -> long long
{
    return n < 2 ? n : compute_fib(n - 1) + compute_fib(n - 2);
}

template <std::size_t N>
struct fib
{
    static const long long value = compute_fib(N);  // a constant expression: computed at compile time
};

In the rest of your code you can then access fib<45>::value or fib<91>::value with the guarantee that they'll be evaluated at compile time.
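For example, plugged into the question's program (assuming the DEBUG macro and the headers from the question are in scope):

int main()
{
    const long long fib91 = fib<91>::value;  // guaranteed compile-time evaluation
    const long long fib45 = fib<45>::value;  // likewise

    DEBUG( fib91 );
    DEBUG( fib45 );

    return EXIT_SUCCESS;
}

Both values are baked into the binary as constants, and the run-time cost is just printing them.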

– Timothy Shields
  • I know how the compiler changes it to an `O(n)` algorithm. My question is why it does not do the same for `fib(45)`, i.e. is it compiling two versions of the `fib` function? – behzad.nouri Jul 11 '14 at 23:51
  • 2
    @behzad.nouri The semantics of `constexpr` applied to a function mean that the function return value *may* be treated as a compile-time constant. They don't *require* it be treated as a compile-time constant. I updated the answer with that extra information. – Timothy Shields Jul 11 '14 at 23:56
  • 1
    I guess that the recursive calculation of fib(n) would take approximately O(1.62^n) time. – Maarten Hilferink Jul 12 '14 at 00:50
  • 1
    @MaartenHilferink You are absolutely correct. In my haste I wrote Θ(2^n), but the base should be the golden ratio φ instead of 2. This is because the complexity is actually Θ(fib(n)) where fib(n) is just Θ(φ^n). – Timothy Shields Jul 12 '14 at 06:13
16

At compile time the compiler can memoize the result of the function. This is safe because the function is constexpr and hence will always return the same result for the same inputs.

At run time it could in theory do the same. However, most C++ programmers would frown at optimization passes that result in hidden memory allocations.
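Nothing stops you from writing that cache yourself at run time, of course; the objection is only to the compiler inserting one behind your back. A minimal hand-written sketch (the function name and the bottom-up std::vector cache are my choices, not anything a compiler would generate):

#include <cstddef>
#include <vector>

// Explicit, programmer-controlled memoization: the cache, its type,
// and its growth policy are all visible in the source.
long long fib_cached(std::size_t n)
{
    static std::vector<long long> cache{1, 1};  // fib(0), fib(1) per the question's definition
    while (cache.size() <= n)
        cache.push_back(cache[cache.size() - 1] + cache[cache.size() - 2]);
    return cache[n];
}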

– pdw
  • 3
    There should totally be a `runtime_memoize` attribute for `constexpr` functions. In the case of `fib`, you know pretty well that it will only ever be called for a small number of inputs, and therefore the cache will never be large. But quite aside from being afraid of frowns from programmers, I expect the compiler-writers are afraid of not knowing how big the cache will need to grow. – Steve Jessop Jul 12 '14 at 12:35
  • 1
    ... naturally in using it the programmer would also have to accept some behaviour along the lines of aborting on failure to allocate memory, which not everyone wants. – Steve Jessop Jul 12 '14 at 12:42
  • 2
    @SteveJessop: The problem is that the compiler don't know what the runtime inputs will be, so it cannot allocate (enough) memory upfront. It would be required to have something like `std::vector>` under the convers, which then runs into exception safety concerns during resizing.... Better to make that all explicit by making the programmer write it. (now if all inputs have small known ranges, _that_ could be possible) – Mooing Duck Jul 12 '14 at 20:12
2

When you ask for fib(91) to give a value to your const fib91 in the source code, the compiler is forced to compute that value from your constexpr function. It does not run the compiled function (as you seem to think); it just sees that to compute fib91 it needs fib(90) and fib(89), to compute those it needs fib(88) and fib(87), and so on until it reaches fib(1) and fib(0), which are given. This is an O(n) algorithm and the result is computed quickly.

However, when you ask it to evaluate fib(45) at run time, the compiler has to choose between calling the actual function and precomputing the result. Eventually it decides to use the compiled function. Now, the compiled function must execute exactly the exponential algorithm that you wrote: there is no way the compiler could implement memoization to optimize an already-compiled recursive function (think about the need to allocate a cache, to decide how many values to keep, and to manage them between function calls).
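If you want the run-time call to be fast, the fix is to restructure the algorithm yourself, for example iteratively. A sketch (the function name is mine; it uses the question's base cases):

#include <cstddef>

// Iterative Θ(n) rewrite the programmer supplies by hand; the compiler
// will not derive this from the exponential recursive version.
long long fib_iter(std::size_t n)
{
    long long a = 1, b = 1;  // fib(0) and fib(1), as in the question
    for (std::size_t i = 2; i <= n; ++i) {
        const long long next = a + b;
        a = b;
        b = next;
    }
    return b;
}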

– Emanuele Paolini
  • I am confused by this answer. At compile time it must calculate the necessary values less than `fib(91)` to assign the value `fib91 = fib(91)`. But the compiler apparently calculated this in a very short time. Then, at run time, it must do this again to get the value of `fib45 = fib(45)`, but there are considerably fewer calculations needed to find `fib45 = fib(45)` than to find `fib91 = fib(91)`, so `fib45 = fib(45)` should be faster; yet according to the question it was considerably slower. Aside from the possibility that behzad was wrong about the timing, what explains the time difference? – Kevin Fegan Jul 25 '14 at 22:52