If the compiler sill performed TCO, then all of the external foo(arr)
calls would receive the same pointer. That's a visible semantics change, and thus no longer a pure optimization.
The fact that the local variable in question is an array is probably a red herring here; it is its visibility to the outside via a pointer that is important.
Consider this program:
#include <stdio.h>
int *valptr[7], **curptr = valptr, **endptr = valptr + 7;
void reset(void)
{
curptr = valptr;
}
int record(int *ptr)
{
if (curptr >= endptr)
return 1;
*curptr++ = ptr;
return 0;
}
int tally(void)
{
int **pp;
int count = 0;
for (pp = valptr; pp < curptr; pp++)
count += **pp;
return count;
}
int tail_function(int x)
{
return record(&x) ? tally() : tail_function(x + 1);
}
int main(void)
{
printf("tail_function(0) = %d\n", tail_function(0));
return 0;
}
As the tail_function
recurses, which it does via a tail call, the record
function records the addresses of different instances of the local variable x
. When it runs out of room, it returns 1
, and that triggers tail_function
to call tally
and return. tally
sweeps through the recorded memory locations and adds their values.
If tally
were subject to TCO, then there would just be one instance of x
. Effectively, it would be this:
int tail_function(int x)
{
tail:
if (record(&x))
return tally();
x = x + 1;
goto tail;
}
And so now, record
is recording the same location over and over again, causing tally
to calculate an incorrect value instead of the expected 21
.
The logic of record
and tally
depends on x
being actually instantiated on each activation of the scope, and that outer activations of the scope have a lifetime which endures until the inner ones terminate. That requirement precludes tail_function
from recursing in constant space; it must allocate separate x
instances.