
One reason I have heard for some languages not supporting tail-call optimization (TCO) is that the optimization comes at the cost of obscuring the call stack when debugging requires looking at a stack trace. (I have heard other reasons, such as "the virtual machine does not support it", but let's disregard Java for now.)

It seems, then, that in situations (in some set of languages) where TCO would be possible but is not performed, the only purpose of the stack frames is to hold metadata for any stack trace that might eventually be generated. That is, each stack frame could be minimal, containing only enough information to generate the stack trace.

Question(s): Would it not make sense to minimize the size of such stack frames? That would reduce stack usage, allowing deeper recursion before running out of space. Is this attempted in languages where the idea applies? (I'm thinking in particular of Python.) Or is it a lost cause in terms of actual space saved? (I imagine that the metadata needed to generate nice stack traces is actually quite a lot compared to what is normally in a stack frame.)
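
For concreteness, here is a minimal C sketch of the trade-off (assuming a typical optimizing compiler such as gcc or clang, where -O2 usually enables tail/sibling-call optimization and -O0 does not):

    /* sum_to(n) computes n + (n-1) + ... + 1 via a tail call.
     * Built with optimization (e.g. -O2), the tail call is usually
     * compiled as a jump: constant stack usage, but no per-call
     * frames left to show in a stack trace.  Built without it
     * (-O0), each call keeps its own frame, so a deep enough
     * recursion overflows the stack no matter how small each
     * frame is. */
    #include <stdio.h>

    static unsigned long long sum_to(unsigned long long n,
                                     unsigned long long acc)
    {
        if (n == 0)
            return acc;
        return sum_to(n - 1, acc + n);   /* tail call */
    }

    int main(void)
    {
        /* Large enough to overflow a typical default stack if the
         * tail call is not optimized. */
        printf("%llu\n", sum_to(10000000ULL, 0));
        return 0;
    }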

Idea in short: Minimize the size of stack frames as an alternative to TCO.

PS. My thoughts are not based on any actual benchmarks. I could be way off here.

– sshine
  • O(n) space is O(n) space. It doesn't matter how small the positive constant is, it's still O(n), so minimizing the stack frames doesn't help. Either you have tail-call optimization or you don't; there's no in-between. – user541686 Sep 01 '13 at 19:35
  • As far as I am aware *all* languages have 'efficient stack frames'. They contain what they have to contain: local variables and return addresses. I don't see how that can be 'minimized' further. – user207421 Sep 01 '13 at 23:30
  • @EJP: You didn't read my answer carefully. A compiler can do a lot to minimize the amount of space a stack frame takes, including doing good register allocation (as opposed to poor register allocation thus requiring lots of spill temporaries) and value lifetime management. – Ira Baxter Sep 02 '13 at 03:02

2 Answers


If one is debugging, not all optimizations need to be turned on, unless you are debugging the optimization itself. So tail-recursion optimization (TRO) doesn't have to disable backtraces during debugging unless the development environment is braindead (e.g., doesn't have a "disable optimizations" option).

The space occupied by the metadata, if you did record it, is pretty small; basically it says, "a recursive call was made". A linked list with elements of a small number of bytes would do the trick. But I don't think it would make sense to record backtrace metadata in the face of TRO; if you are optimizing, you probably don't want to pay the extra time cost of recording the metadata either, just because somebody might want to debug later.
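
For illustration, a minimal C sketch of that linked-list idea (the names crumb, trace_head and push_crumb are hypothetical, not from any real runtime):

    /* A "bread crumb" record: instead of keeping a full stack frame
     * per optimized-away tail call, the runtime could keep a tiny
     * node saying "a recursive call to <callee> happened here".
     * A debugger could walk the list to reconstruct an approximate
     * stack trace. */
    #include <stddef.h>

    struct crumb {
        void         *callee;  /* function that was tail-called       */
        unsigned long count;   /* how many collapsed calls it covers  */
        struct crumb *prev;    /* previous crumb in the trace         */
    };

    /* Most recent crumb; NULL when no tail calls have been collapsed. */
    struct crumb *trace_head = NULL;

    void push_crumb(struct crumb *c, void *callee)
    {
        c->callee  = callee;
        c->count   = 1;
        c->prev    = trace_head;
        trace_head = c;
    }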

I'm not sure what you mean by "minimize the size of the stack frame", since you ask in the context of TRO. In general, a good compiler will minimize the size automatically, by assigning only enough space to do the entire computation of the (sub)routine that the stack frame instance represents. One standard optimization is letting variables with non-overlapping lifetimes share the same space in the stack frame. One can do this a couple of different ways: a) nested scopes inside the body of the subroutine which don't overlap can trivially share their space in a stack-like discipline, and b) treating the "stack frame" not so much as a stack, but as a bunch of registers for storing spilled values; a register coloring algorithm can handle this pretty nicely.
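
A minimal C sketch of point (a): the two scratch arrays below have non-overlapping lifetimes, so a compiler laying out the frame is free to give them the same stack slot rather than reserving space for each.

    #include <stdio.h>

    static int demo(int x)
    {
        int result = 0;

        {   /* scope A */
            int scratch_a[64];
            for (int i = 0; i < 64; i++)
                scratch_a[i] = x + i;
            result += scratch_a[63];
        }   /* scratch_a is dead past this point */

        {   /* scope B: its array may reuse scope A's stack space */
            int scratch_b[64];
            for (int i = 0; i < 64; i++)
                scratch_b[i] = x * i;
            result += scratch_b[63];
        }

        return result;  /* frame needs ~64 ints of scratch, not 128 */
    }

    int main(void)
    {
        printf("%d\n", demo(2));  /* prints 191 */
        return 0;
    }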

Sometimes stack frame minimization is hampered by (lack of) support from the OS. See this for a discussion of how Windows traps inflate stack frame size considerably.

– Ira Baxter
  • +1 That's a very interesting link. I find it weird that Windows would push the CPU context on the user-mode stack, though -- shouldn't it be pushed on the kernel-mode stack? What if the stack pointer is bad -- would the system then crash? – user541686 Sep 01 '13 at 19:30
  • @Mehrdad: It makes sense that the full process context is stored in user space, especially since one can write user trap routines to inspect/modify/restart after a trap. (We do this for divide-by-zero traps, etc. in our PARLANSE language). What is stupid is pushing it onto the user stack; my point was that the application could tell the OS where to put trap data (including the stack for backwards compatibility) to avoid this ridiculous demand on stack frame size. ... – Ira Baxter Sep 01 '13 at 20:05
  • @Mehrdad: ... Regardless, if the SP is bad and a trap occurs, presumably the hardware produces a "Double Fault" and Windows simply aborts your process. If the SP is good enough to take the hardware-pushed trap data, but not enough space remains to hold the additional stuff MS pushes, yes, Windows might crash; presumably MS thought about (or experienced) this and has a defense. – Ira Baxter Sep 01 '13 at 20:06
  • Thank you for this answer. I had thought that turning stack traces off and thus enabling TCO/TRO might render some applications relying on recursion hard to run with debugging enabled without exhausting stack memory. Thank you for pointing out that the full context is not necessarily pushed to the stack, but only the parts deemed necessary during static analysis. – sshine Sep 02 '13 at 01:12

You basically have a choice between "optimizing" tail recursion and maintaining an "accurate" call stack. Either you maintain some sort of "bread crumbs" and don't fully optimize tail recursion, or you do optimize tail recursion and lose your stack trace.

Every well-designed language implementation (that is at all interested in runtime performance and "footprint") "minimizes" stack frame size.

But, like virtually everything in computing, there's a trade-off between competing factors: performance, "footprint", simplicity/reliability of the implementation, flexibility (e.g., supporting multiple languages), debuggability, and so on.

– Hot Licks