8

I wrote the program Fibonacci number calculation in compile time (constexpr) problem using the template metaprogramming techniques supported in C++11. The purpose of this is to calculate the difference in the run-time between the template metaprogramming approach and the old conventional approach.

// Template Metaprograming Approach
template<int  N>
constexpr int fibonacci() {return fibonacci<N-1>() + fibonacci<N-2>(); }
template<>
constexpr int fibonacci<1>() { return 1; }
template<>
constexpr int fibonacci<0>() { return 0; }



// Conventional Approach
 int fibonacci(int N) {
   if ( N == 0 ) return 0;
   else if ( N == 1 ) return 1;
   else
      return (fibonacci(N-1) + fibonacci(N-2));
} 

I ran both programs for N = 40 on my GNU/Linux system and measured the time and found that that conventional solution (1.15 second) is around two times slower than the template-based solution (0.55 second). This is a significant improvement as both approaches are based on the recursion.

To understand it more I compiled the program (-fdump-tree-all flag) in g++ and found that compiler actually generated the 40 different functions (like fibonacci<40>, fibonacci<39>...fibonacci<0>).

constexpr int fibonacci() [with int N = 40] () {
  int D.29948, D.29949, D.29950;
  D.29949 = fibonacci<39> ();
  D.29950 = fibonacci<38> ();
  D.29948 = D.29949 + D.29950;
  return D.29948;
}

constexpr int fibonacci() [with int N = 39] () {
  int D.29952, D.29953, D.29954;
  D.29953 = fibonacci<38> ();
  D.29954 = fibonacci<37> ();
  D.29952 = D.29953 + D.29954;
  return D.29952;
}
...
...
...
constexpr int fibonacci() [with int N = 0] () {
  int D.29962;
  D.29962 = 0;
  return D.29962;
}

I also debugged the program in GDB and found that all the above functions are executed an equal number of times as with the conventional recursive approach. If both versions of the program are executing the function an equal number of times (recursive), then how is this achieved by template metaprogramming techniques? I would also like to know your opinion about how and why a template metaprogramming based approach is taking half time compared to the other version? Can this program be made faster than the current one?

Basically my intention here is to understand what's going on internally as much as possible.

My machine is GNU/Linux with GCC 4.8.1, and I used the optimization -o3 for both programs.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Mantosh Kumar
  • 5,659
  • 3
  • 24
  • 48
  • 1
    Don't use simple `f(n) = f(n - 1) + f(n - 2)` formula to calculate Fibonacci numbers. – Constructor Mar 25 '14 at 20:30
  • 1
    Are the times you reported from executing debug code or non-debug code? – R Sahu Mar 25 '14 at 20:40
  • 1
    @Constructor which method do you prefer? I personally like raising a {1,1 \n 1, 0} matrix to power n-1, and taking the top-left variable for the answer. I found it takes to perform extremely fast, if done properly. You can also add the matrix-to-2^k hack for even more speed. Anyway, unto the question, it's generally a bad idea to template-of-template, too much space. –  Mar 25 '14 at 20:42
  • @Shingetsu Yes, it is the fastest solution (with logarithmic time complexity). But I meant that simple formula is not appropriate due to numerous and meaningless recursive calls. – Constructor Mar 25 '14 at 20:57

4 Answers4

13

Try this:

template<size_t N>
struct fibonacci : integral_constant<size_t, fibonacci<N-1>{} + fibonacci<N-2>{}> {};

template<> struct fibonacci<1> : integral_constant<size_t,1> {};
template<> struct fibonacci<0> : integral_constant<size_t,0> {};

With clang and -Os, this compiles in roughly 0.5s and runs in zero time for N=40. Your "conventional" approach compiles in roughly 0.4s and runs in 0.8s. Just for checking, the result is 102334155 right?

When I tried your own constexpr solution the compiler run for a couple of minutes and then I stopped it because apparently memory was full (computer started freezing). The compiler was trying to compute the final result and your implementation is extremely inefficient to be used at compile time.

With this solution, template instantiations at N-2, N-1 are re-used when instantiating N. So fibonacci<40> is actually known at compile time as a value, and there is nothing to do at run-time. This is a dynamic programming approach and of course you can do the same at run time if you store all values at 0 through N-1 before computing at N.

With your solution, the compiler can evaluate fibonacci<N>() at compile time but is not required to. In your case, all or part of computation is left for run time. In my case, all computation is attempted at compile time, hence never ending.

iavr
  • 7,547
  • 1
  • 18
  • 53
  • I think it can be done shorter using the code from the question: `template struct Fibonacci : std::integral_constant()> {}; – Constructor Mar 25 '14 at 20:59
  • @Constructor Well, this is meant as an alternative implementation. As I said, the `constexpr` solution of the question never managed to compile for `N=40`. It appears template instantiations are memoized, while results of `constexpr` function calls are not. – iavr Mar 25 '14 at 21:01
  • I think that there is a problem (possibly a bug) in clang. Compare: [gcc `N = 40`](http://rextester.com/HAREZS6005), [clang `N = 26`](http://rextester.com/JAGA61120) and [clang `N = 27`](http://rextester.com/LAA38843) (the result for clang is same for all `N > 26`). – Constructor Mar 25 '14 at 21:14
  • Not really, for me at least. `121393`, `196418`, `102334155` for `N=26,27,40` for both gcc and clang. – iavr Mar 25 '14 at 21:20
  • It is strange. I think that the maximum step of constexpr evaluation mentioned by clang (`N = 27` in my example) should be specified in the its setting. – Constructor Mar 25 '14 at 21:44
  • These results are with my solution, **not** a `constexpr` function. – iavr Mar 25 '14 at 22:01
  • @iavr: Yes you are right and I also ran your solution and it was zero time. Now I wanted to see where 102334155 is being calculated by compiler. Do you have idea about which internal files would contain that information. In my case I had checked the *.gimple file using(fdump-tree-all) and saw how template is being instantiated. – Mantosh Kumar Mar 26 '14 at 02:30
  • @tmp No I don't know, if you find out please let me know as well :-) – iavr Mar 26 '14 at 02:34
  • @iavr: Ok sure. To explore it I have asked one new question http://stackoverflow.com/q/22650761/2724703 . You may want to track this question to understand it. – Mantosh Kumar Mar 26 '14 at 03:11
5

The reason is that your runtime solution is not optimal. For every fib number, functions are called several times. The fibonacci sequence, has overlapping subproblems, so for example fib(6) calls fib(4), and fib(5) also calls fib(4).

The template based approach, uses (inadvertently) a Dynamic Programming approach, meaning that it stores values for previously calculated numbers, avoiding repetition. So, when fib(5) calls fib(4), the number was already calculated when fib(6) did.

I recommend looking up "dynamic programming fibonacci" and trying that, it should speed things up dramatically.

imreal
  • 10,178
  • 2
  • 32
  • 48
  • 1
    Thanks for your explanation and suggestion.But I had mentioned in my question that in both cases number of function calls were same so you still think that template based uses dynamic programming approach. Yes I am aware about the dynamic programming approach would speed up things but I wanted to check the difference in two by using recursive(slower) approach. – Mantosh Kumar Mar 26 '14 at 02:12
  • 1
    @tmp are you sure it has the same number of calls? It doesnt make sense that a template with a given parameter set gets instantiated more than once. It is not even a function call to begin with. – imreal Mar 26 '14 at 02:22
0

Adding -O1 (or higher) to GCC4.8.1 will make fibonacci<40>() a compile time constant and all the template generated code will disappear from your assembly. The following code

int foo()
{
  return fibonacci<40>();
}

will result in the assembly output

foo():
    movl    $102334155, %eax
    ret

This gives the best runtime performance.

However, it looks like you are building without optimizations (-O0) so you get something quite a bit different. The assembly output for each of the 40 fibonacci functions look basically identical (except for the 0 and 1 cases)

int fibonacci<40>():
    pushq   %rbp
    movq    %rsp, %rbp
    pushq   %rbx
    subq    $8, %rsp
    call    int fibonacci<39>()
    movl    %eax, %ebx
    call    int fibonacci<38>()
    addl    %ebx, %eax
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    ret

This is straight forward, it sets up the stack, calls the two other fibonacci functions, adds the value, tears down the stack, and returns. No branching, and no comparisons.

Now compare that with the assembly from the conventional approach

fibonacci(int):
    pushq   %rbp
    pushq   %rbx
    subq    $8, %rsp
    movl    %edi, %ebx
    movl    $0, %eax
    testl   %edi, %edi
    je  .L2
    movb    $1, %al
    cmpl    $1, %edi
    je  .L2
    leal    -1(%rdi), %edi
    call    fibonacci(int)
    movl    %eax, %ebp
    leal    -2(%rbx), %edi
    call    fibonacci(int)
    addl    %ebp, %eax
    .L2:
    addq    $8, %rsp
    popq    %rbx
    popq    %rbp
    ret

Each time the function is called it needs to do check if N is 0 or 1 and act appropriately. This comparison is not needed in the template version because it is built into the function via the magic of templates. My guess is that the un-optimized version of the template code is faster because you avoid those comparisons and would also not have any missed branch predictions.

Rastaban
  • 881
  • 6
  • 8
  • 1
    I did tried all the optimization flag(o1 o2 o3) and did not find any such thing on my terminal. Did you see it in your case which you have explained or it is just your assumption. – Mantosh Kumar Mar 26 '14 at 02:00
  • 2
    I verified it using http://gcc.godbolt.org/ and all of the assembly output I show is a cut and paste from there. I made no assumptions. – Rastaban Mar 26 '14 at 17:26
  • I don't believe this is making use of c++ constexpr, simply the compiler optimizer doing something which is good, but not guaranteed. – xaxxon Apr 30 '17 at 09:36
-1

Maybe just use a more efficient algorithm?

constexpr pair<double, double> helper(size_t n, const pair<double, double>& g)
{
    return n % 2
        ? make_pair(g.second * g.second + g.first * g.first, g.second * g.second + 2 * g.first * g.second)
        : make_pair(2 * g.first * g.second - g.first * g.first, g.second * g.second + g.first * g.first);
}

constexpr pair<double, double> fibonacciRecursive(size_t n)
{
    return n < 2
        ? make_pair<double, double>(n, 1)
        : helper(n, fibonacciRecursive(n / 2));
}

constexpr double fibonacci(size_t n)
{
    return fibonacciRecursive(n).first;
}

My code is based on an idea described by D. Knuth in the first part of his "The Art of Computer Programming". I can't remember the exact place in this book, but I'm sure that the algorithm was described there.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
alexeibs
  • 535
  • 3
  • 7
  • Why do you use `double`? – Constructor Mar 25 '14 at 21:48
  • Numbers in Fibonacci sequence grow very fast so we need to prevent overflowing. – alexeibs Mar 25 '14 at 21:50
  • Not so fast to cause an overflow of `int` for `N = 40`. – Constructor Mar 25 '14 at 21:55
  • How do you measure speed? [Seems to me it doesn't work slowly](http://ideone.com/eXKd0t) – alexeibs Mar 25 '14 at 22:01
  • What speed do you mean? I meant that `f(40) < std::numeric_limits::max()`. – Constructor Mar 26 '14 at 08:20
  • I'm sorry. I've misunderstood you. English is not my native language. I am trying to improve my language skills answering for questions on Stackoverflow. I think that it is easy to write template function to calculate Fibonacci numbers. [http://ideone.com/R0qGpN](http://ideone.com/R0qGpN) I only wanted to show that there are more effective ways to calculate Fibonacci numbers. – alexeibs Mar 26 '14 at 09:32