167

It seems to me that it would work perfectly well to do tail-recursion optimization in both C and C++, yet while debugging I never seem to see a frame stack that indicates this optimization. That is kind of good, because the stack tells me how deep the recursion is. However, the optimization would be kind of nice as well.

Do any C++ compilers do this optimization? Why? Why not?

How do I go about telling the compiler to do it?

  • For MSVC: /O2 or /Ox
  • For GCC: -O2 or -O3

How about checking if the compiler has done this in a certain case?

  • For MSVC, enable PDB output to be able to trace the code, then inspect the code
  • For GCC..?

I'd still take suggestions for how to determine if a certain function is optimized like this by the compiler (even though I find it reassuring that Konrad tells me to assume it)

It is always possible to check if the compiler does this at all by making an infinite recursion and checking if it results in an infinite loop or a stack overflow (I did this with GCC and found out that -O2 is sufficient), but I want to be able to check a certain function that I know will terminate anyway. I'd love to have an easy way of checking this :)


After some testing, I discovered that destructors ruin the possibility of making this optimization. It can sometimes be worth it to change the scoping of certain variables and temporaries to make sure they go out of scope before the return-statement starts.

If any destructor needs to be run after the tail-call, the tail-call optimization can not be done.

Magnus Hoff
  • 21,529
  • 9
  • 63
  • 82

5 Answers5

146

All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade), even for mutually recursive calls such as:

int bar(int, int);

int foo(int n, int acc) {
    return (n == 0) ? acc : bar(n - 1, acc + 2);
}

int bar(int n, int acc) {
    return (n == 0) ? acc : foo(n - 1, acc + 1);
}

Letting the compiler do the optimisation is straightforward: Just switch on optimisation for speed:

  • For MSVC, use /O2 or /Ox.
  • For GCC, Clang and ICC, use -O3

An easy way to check if the compiler did the optimisation is to perform a call that would otherwise result in a stack overflow — or looking at the assembly output.

As an interesting historical note, tail call optimisation for C was added to the GCC in the course of a diploma thesis by Mark Probst. The thesis describes some interesting caveats in the implementation. It's worth reading.

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • ICC would do so, I believe. To the best of my knowledge, ICC produces the fastest code on the market. – Paul Nathan Oct 21 '08 at 04:04
  • 39
    @Paul The question is how much of the speed of ICC code is caused by algorithmic optimizations such as tail call optimizations and how much is caused by the cache and microinstruction optimizations that only Intel, with their intimate knowledge of their own processors, can do. – Imagist Sep 28 '09 at 12:14
  • 9
    `gcc` has more narrow option `-foptimize-sibling-calls` to "optimize sibling and tail recursive calls". This option (according to `gcc(1)` manual pages for versions 4.4, 4.7 and 4.8 targeting various platforms) is enabled at levels `-O2`, `-O3`, `-Os`. – FooF Jan 10 '14 at 06:53
  • 2
    Also, running in DEBUG mode without explicitly requesting optimizations will NOT do ANY optimization at all. You can enable PDB for true release mode EXE and try stepping through that, but note that debugging in Release mode has its complications - invisible / stripped variables, merged variables, variables becoming out of scope in unknown / unexpected scope, variables thet never go in scope and became true constants with stack-level addresses, and - well - Merged or missing stack frames. Usually merged stack frames means callee is inlined, and missing/backmerged frames probably tail call. – Петър Петров Jan 07 '15 at 09:02
24

As well as the obvious (compilers don't do this sort of optimization unless you ask for it), there is a complexity about tail-call optimization in C++: destructors.

Given something like:

   int fn(int j, int i)
   {
      if (i <= 0) return j;
      Funky cls(j,i);
      return fn(j, i-1);
   }

The compiler can't (in general) tail-call optimize this because it needs to call the destructor of cls after the recursive call returns.

Sometimes the compiler can see that the destructor has no externally visible side effects (so it can be done early), but often it can't.

A particularly common form of this is where Funky is actually a std::vector or similar.

hmuelner
  • 8,093
  • 1
  • 28
  • 39
22

gcc 4.3.2 completely inlines this function (crappy/trivial atoi() implementation) into main(). Optimization level is -O1. I notice if I play around with it (even changing it from static to extern, the tail recursion goes away pretty fast, so I wouldn't depend on it for program correctness.

#include <stdio.h>
static int atoi(const char *str, int n)
{
    if (str == 0 || *str == 0)
        return n;
    return atoi(str+1, n*10 + *str-'0');
}
int main(int argc, char **argv)
{
    for (int i = 1; i != argc; ++i)
        printf("%s -> %d\n", argv[i], atoi(argv[i], 0));
    return 0;
}
Tom Barta
  • 1,264
  • 7
  • 3
  • 1
    You can activate link-time optimization though and I guess that even an `extern` method might be inlined then. – Konrad Rudolph Feb 01 '10 at 09:52
  • 5
    Strange. I just tested gcc 4.2.3 (x86, Slackware 12.1) and gcc 4.6.2 (AMD64, Debian wheezy) and **with `-O1`** there is **no inlining** and **no tail-recursion optimization**. You have to **use `-O2` for that** (well, in 4.2.x, which is rather ancient now, it still won't be inlined). BTW It's also worth to add that gcc can optimize recursion even when it is not strictly a tail one (like factorial w/o accumulator). – przemoc Jan 15 '12 at 11:36
12

Most compilers don't do any kind of optimisation in a debug build.

If using VC, try a release build with PDB info turned on - this will let you trace through the optimised app and you should hopefully see what you want then. Note, however, that debugging and tracing an optimised build will jump you around all over the place, and often you cannot inspect variables directly as they only ever end up in registers or get optimised away entirely. It's an "interesting" experience...

Greg Whitfield
  • 5,649
  • 2
  • 30
  • 32
  • 2
    try gcc why -g -O3 and to get opimizations in a debug build. xlC has the same behavior. – g24l Jan 11 '15 at 20:46
  • When you say "most compilers": what collections of compilers do you consider? As pointed out there are at least two compilers that performs optimizations during debug build - and as far as I know VC does that too (except if you're enabling modify-and-continue perhaps). – skyking Oct 23 '17 at 11:38
9

As Greg mentions, compilers won't do it in debug mode. It's ok for debug builds to be slower than a prod build, but they shouldn't crash more often: and if you depend on a tail call optimization, they may do exactly that. Because of this it is often best to rewrite the tail call as an normal loop. :-(

0124816
  • 1,063
  • 6
  • 10