2

I'm using C++ constexpr to evaluate, at compile time, the largest term of the calculated Fibonacci sequence that this code can correctly evaluate. To do this, I use this code:

constexpr unsigned long long fibo(unsigned n) {
    return (n < 2) ? 1 : fibo(n-1)+fibo(n-2);
}

constexpr unsigned max_fibo(unsigned n) {
    return (fibo(n+1) >= fibo(n)) ? max_fibo(n+1) : n;
}

const unsigned MAX_FIBO_TERM = max_fibo(0);

This works just fine and quickly constructs the correct value for MAX_FIBO_TERM (92 on my machine). However, using that fibo function to actually calculate terms at runtime is extremely slow (it takes almost 90 seconds to calculate only the first 48 terms).

I can use an alternative function to calculate the terms which is much faster:

unsigned long long fiboiter(unsigned n, unsigned long long &prev, 
                            unsigned long long &prevprev) {
    return prev = (n < 2) ? prevprev = 1 : 
        (std::swap(prev, prevprev), prev + prevprev);
}

However, I can't seem to make this into a valid constexpr function. (This is why it's written in that strange way.) When I run this version, it takes 0.005 seconds on my machine.

When I compile with g++ -O2 -std=c++11 fibo.cpp -o fibo it only takes 0.302 seconds.

My question is: How can I use the same efficient function both at runtime and compile time?

I'm using g++ 4.8.3 on a 64-bit Linux machine, but what I would like is a cross-platform universal answer rather than simply one specific to this compiler and machine. And no, C++14 is not yet an option in this situation.

Also, my question is similar to, but not the same as this question because I'm asking about how to use the same runtime-efficent code at compile time and not why or how the compile-time code is made efficient.

Here's the full program.

fibo.cpp

#include <iomanip>
#include <iostream>
#include <locale>

constexpr unsigned long long fibo(unsigned n) {
    return (n < 2) ? 1 : fibo(n-1)+fibo(n-2);
}   
unsigned long long fiboiter(unsigned n, unsigned long long &prev, 
                            unsigned long long &prevprev) {
    return prev = (n < 2) ? prevprev = 1 : 
        (std::swap(prev, prevprev), prev + prevprev);
}    
constexpr unsigned max_fibo(unsigned n) {
    return (fibo(n+1) >= fibo(n)) ? max_fibo(n+1) : n;
}    
const unsigned MAX_FIBO_TERM = max_fibo(0);

int main(int argc, char **)
{
    // format numeric output according to user's preferred locale
    std::cout.imbue(std::locale{""});
    unsigned long long prev, prevprev;

    if (argc > 1) {
        for (unsigned i = 0; i <= MAX_FIBO_TERM; ++i) 
            std::cout << "fib(" << std::setw(2) << i << ") = " 
                      << std::setw(26) << fiboiter(i, prev, prevprev) << '\n';
    } else {
        for (unsigned i = 0; i <= MAX_FIBO_TERM; ++i) 
            std::cout << "fib(" << std::setw(2) << i << ") = " 
                      << std::setw(26) << fibo(i) << '\n';
    }
}

Sample output

fib( 0) =                          1
fib( 1) =                          1
fib( 2) =                          2
fib( 3) =                          3
fib( 4) =                          5
fib( 5) =                          8
 ...
fib(89) =  2,880,067,194,370,816,120
fib(90) =  4,660,046,610,375,530,309
fib(91) =  7,540,113,804,746,346,429
fib(92) = 12,200,160,415,121,876,738
Community
  • 1
  • 1
Edward
  • 6,964
  • 2
  • 29
  • 55

2 Answers2

8

Good compilers optimize tail recursion well, try a tail-recursive constexpr implementation:

constexpr unsigned long long fibo(unsigned n,
                                  unsigned long long prev = 1,
                                  unsigned long long prevprev = 0) {
    return n == 0 ? prev : fibo(n - 1, prev + prevprev, prev);
}   

According to the clang on Coliru, both constexpr and non-constexpr versions take 5-6 milliseconds to complete (and that's probably I/O dominated).

Casey
  • 41,449
  • 7
  • 95
  • 125
1
template<size_t max, class F, size_t...is, class R=std::result_of_t<F(unsigned)>>
R memoize(std::index_sequence<is...>,unsigned val){
  static R cache[]={ F{}(is)...};
  if (val<max)return cache[val];
  return F{}(val);
}
template<size_t max, class F, class R=std::result_of_t<F(unsigned)>>
R memoize(unsigned val){
  return memoize<max,F>( std::make_index_sequence<max>{}, val );
}
struct fib_t{
  constexpr auto operator()(unsigned val)const{
    return fibo(val);
  }
  constexpr fib_t() {}
};

then memoize<100, fib_t>(x) will be fast at runtime up to 100.

A faster fibo will make it faster post 100.

While index_sequence is C++14, here is a replacement that should work for up to a few 100 elements:

template<std::size_t...>struct indexes{using type=indexes;};
template<std::size_t M, std::size_t...Is>struct make_indexes_t:make_indexes_t<M-1,M-1,Is...>{};
template<std::size_t...Is>struct make_indexes_t<0,Is...>:indexes<Is...>{};
template<std::size_t M>using make_indexes=typename make_indexes_t<M>::type;

and replace std::make_index_sequence with make_indexes, and std::index_sequence with indexes.

If you need answers past a few 100, you'll need a more robust make_indexes than the above. There are many implementations in the wild, they mainly consist of divide-and-conquer (write concatenate, and write a make_indexes that goes from min to max via recursively splitting the range and concatenating. This gives logarithmic template recursion depth).

std::result_of_t is C++14, but template<class Sig>using result_of_t=typename std::result_of<Sig>::type is a drop-in replacement.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • Thanks, this is interesting, but aren't `result_of` and `index_sequence` C++14 constructs? – Edward Nov 04 '14 at 00:39
  • @edward `result_of` is C++11. `index_sequence` is C++14, but easy to generate a "good enough" implemention in C++11. [Here](http://loungecpp.wikidot.com/tips-and-tricks%3aindices) are [two](http://codereview.stackexchange.com/questions/39648/improved-indices-trick) (with slightly different names for the types, as they predate C++14). The first one is simple and "good enough", the second is required if you have on the order of 1000 indexes to reduce template recursion depth. I'll add a simple one to the answer above. – Yakk - Adam Nevraumont Nov 04 '14 at 14:20
  • @edward typo fixed, and `result_of_t` implementation added. – Yakk - Adam Nevraumont Nov 04 '14 at 14:42