4

Looks like having a local array in your function prevents tail-call optimization on it on all compilers I've checked it on:

int foo(int*);

int tco_test() {
    // int arr[5]={1, 2, 3, 4, 5}; // <-- variant 1
    // int* arr = new int[5]; // <-- variant 2
    int x = foo(arr);
    return x > 0 ? tco_test() : x;
}

When variant 1 is active, there is a true call to tco_test() in the end (gcc tries to do some unrolling before, but it still calls the function in the end). Variant 2 does TCO as expected.

Is there something in local arrays which make it impossible to optimize tail calls?

SergeyA
  • 61,605
  • 5
  • 78
  • 137

1 Answers1

4

If the compiler sill performed TCO, then all of the external foo(arr) calls would receive the same pointer. That's a visible semantics change, and thus no longer a pure optimization.

The fact that the local variable in question is an array is probably a red herring here; it is its visibility to the outside via a pointer that is important.

Consider this program:

#include <stdio.h>

int *valptr[7], **curptr = valptr, **endptr = valptr + 7;

void reset(void)
{
  curptr = valptr;

}

int record(int *ptr)
{
  if (curptr >= endptr)
    return 1;
  *curptr++ = ptr;
  return 0;
}

int tally(void)
{
  int **pp;
  int count = 0;

  for (pp = valptr; pp < curptr; pp++)
    count += **pp;

  return count;
}

int tail_function(int x)
{
  return record(&x) ? tally() : tail_function(x + 1);
}

int main(void)
{
  printf("tail_function(0) = %d\n", tail_function(0));
  return 0;
}

As the tail_function recurses, which it does via a tail call, the record function records the addresses of different instances of the local variable x. When it runs out of room, it returns 1, and that triggers tail_function to call tally and return. tally sweeps through the recorded memory locations and adds their values.

If tally were subject to TCO, then there would just be one instance of x. Effectively, it would be this:

int tail_function(int x)
{
tail:
  if (record(&x))
    return tally();
  x = x + 1;
  goto tail;
}

And so now, record is recording the same location over and over again, causing tally to calculate an incorrect value instead of the expected 21.

The logic of record and tally depends on x being actually instantiated on each activation of the scope, and that outer activations of the scope have a lifetime which endures until the inner ones terminate. That requirement precludes tail_function from recursing in constant space; it must allocate separate x instances.

Kaz
  • 55,781
  • 9
  • 100
  • 149
  • 1
    Not sure if I follow. What do you mean by 'external foo' in this case? May be some illustrating code? – SergeyA Aug 10 '16 at 16:55
  • Calls to `foo` can't be inlined because there is no definition in scope. – Kaz Aug 10 '16 at 17:48
  • @KonradRudolph I have added an illustrative example how how exporting the address of a local variable can create program semantics which breaks under constant space recursion. – Kaz Aug 10 '16 at 18:12
  • I am not following this example. How is an array is different from a single pointer in terms of external visibility? My variant2 could as well be in a form of `int* arr = &static_int_variable` and it would still be TCO-optimized (ill-formed as it is, of course). May be you could reword your example, make it slightly? shorter and emphasize how array is different from the pointer here – SergeyA Aug 10 '16 at 18:31
  • @Kaz Ah, I was confusing `foo` and `tco_test`. The answer now makes sense. But just to be pedantic: [`foo` can be inlined at link time](http://stackoverflow.com/q/5987020/1968). – Konrad Rudolph Aug 10 '16 at 18:34
  • @KonradRudolph However, the example holds up even if `record` and `tally` are completely inlined into `tail_funtion`. Of course, on the other hand, the compiler could see very deeply and just reduce the whole thing to `return 21`. – Kaz Aug 10 '16 at 18:36
  • @Kaz I know. As I said: I simply mistook `foo` for the recursive function in the explanation. – Konrad Rudolph Aug 10 '16 at 18:37