Catching stack overflow

Question

What's the best way to catch stack overflow in C?

More specifically:

A C program contains an interpreter for a scripting language.

Scripts are not trusted, and may contain infinite recursion bugs. The interpreter has to be able to catch these and smoothly continue. (Obviously this can partly be handled by using a software stack, but performance is greatly improved if substantial chunks of library code can be written in C; at a minimum, this entails C functions running over recursive data structures created by scripts.)

The preferred form of catching a stack overflow would involve longjmp back to the main loop. (It's perfectly okay to discard all data that was held in stack frames below the main loop.)

The fallback portable solution is to use addresses of local variables to monitor the current stack depth, and for every recursive function to contain a call to a stack checking function that uses this method. Of course, this incurs some runtime overhead in the normal case; it also means if I forget to put the stack check call in one place, the interpreter will have a latent bug.

Is there a better way of doing it? Specifically, I'm not expecting a better portable solution, but if I had a system specific solution for Linux and another one for Windows, that would be okay.

I've seen references to something called structured exception handling on Windows, though the references I've seen have been about translating this into the C++ exception handling mechanism; can it be accessed from C, and if so is it useful for this scenario?

I understand Linux lets you catch a segmentation fault signal; is it possible to reliably turn this into a longjmp back to your main loop?

Java seems to support catching stack overflow exceptions on all platforms; how does it implement this?

Presumably Java can do it because it's generated the machine code that implements the function-call mechanism. It is free to insert code that checks for overflow on every stack growth. — Oliver Charlesworth, Aug 15 '11 at 16:01
Can't your interpreter just check that the stack pointer is within an allowed range and abort if not? — Kerrek SB, Aug 15 '11 at 16:02
Yes, __try on Windows. Hints are here: http://stackoverflow.com/questions/7049502/c-and-try-catch-finally/7049836#7049836 — Hans Passant, Aug 15 '11 at 16:06
@Kerrek: Another, less precise, but more portable, method is simply to limit the call depth. — Oliver Charlesworth, Aug 15 '11 at 16:08
@Oli: But you wouldn't know how much stack space each call consumes, so what would be a sensible, universal limit? — Kerrek SB, Aug 15 '11 at 16:15
@Kerrek: Yes, that's a fair point. You could argue, though, based on the OP's wording, that he's really trying to detect infinite recursions. In which case, an acceptable approach might be to limit to some arbitrary heuristic, e.g. a max call depth of 100. — Oliver Charlesworth, Aug 15 '11 at 16:26
Huh. No one noticed that this is a question about `stack overflow` posted on `stackoverflow.com`. :-) — David R Tribble, Oct 17 '11 at 15:36
Do you have a means (such as a garbage collector) that will release the resources the stack fames pointed to? — Demi, Sep 15 '15 at 06:28

score 4 · Accepted Answer · answered Aug 15 '11 at 16:54

Off the top of my head, one way to catch excessive stack growth is to check the relative difference in addresses of stack frames:

#define MAX_ROOM    (64*1024*1024UL)    // 64 MB

static char *   first_stack = NULL;

void foo(...args...)
{
    char    stack;

    // Compare addresses of stack frames
    if (first_stack == NULL)
        first_stack = &stack;
    if (first_stack > &stack  &&  first_stack - &stack > MAX_ROOM  ||
        &stack > first_stack  &&  &stack - first_stack > MAX_ROOM)
        printf("Stack is larger than %lu\n", (unsigned long)MAX_ROOM);

    ...code that recursively calls foo()...
}

This compares the address of the first stack frame for foo() to the current stack frame address, and if the difference exceeds MAX_ROOM it writes a message.

This assumes that you're on an architecture that uses a linear always-grows-down or always-grows-up stack, of course.

You don't have to do this check in every function, but often enough that excessively large stack growth is caught before you hit the limit you've chosen.

Yeah, at the end of the day that's probably the best way to do it - the runtime overhead should only be a handful of instructions, it only has to go in to functions that are actually recursive, and it has the virtue of being mostly portable. — rwallace, Aug 15 '11 at 17:05
Minor: `(64*1024*1024UL)` --> `(64UL*1024*1024)`. Lead with the widest type, else `64*1024` is 0 with 16-bit `unsigned`. Common on embedded products where SO is a concern - although doubtful such a device has 64 MB of memory. — chux - Reinstate Monica, Dec 01 '17 at 18:45

score 2 · Answer 2 · edited May 19 '18 at 17:18

(I won't bother those methods depending on particular platforms for "better" solutions. They make troubles, by limiting the language design and usability, with little gain. For answers "just work" on Linux and Windows, see above.)

First of all, in the sense of C, you can't do it in a portable way. In fact, ISO C mandates no "stack" at all. Pedantically, it even seems when allocation of automatic objects failed, the behavior is literally undefined, as per Clause 4p2 - there is simply no guarantee what would happen when the calls nested too deep. You have to rely on some additional assumptions of implementation (of ISA or OS ABI) to do that, so you end up with C + something else, not only C. Runtime machine code generation is also not portable in C level.

(BTW, ISO C++ has a notion of stack unwinding, but only in the context of exception handling. And there is still no guarantee of portable behavior on stack overflow; though it seems to be unspecified, not undefined.)

Besides to limit the call depth, all ways have some extra runtime cost. The cost would be quite easily observable unless there are some hardware-assisted means to amortize it down (like page table walking). Sadly, this is not the case now.

The only portable way I find is to not rely on the native stack of underlying machine architecture. This in general means you have to allocate the activation record frames as part of the free store (on the heap), rather than the native stack provided by ISA. This does not only work for interpreted language implementations, but also for compiled ones, e.g. SML/NJ. Such software stack approach does not always incur worse performance because they allow providing higher level abstraction in the object language so the programs may have more opportunities to be optimized, though it is not likely in a naive interpreter.

You have several options to achieve this. One way is to write a virtual machine. You can allocate memory and build the stack in it.

Another way is to write sophisticated asynchronous style code (e.g. trampolines, or CPS transformation) in your implementation instead, relying on less native call frames as possible. It is generally difficult to get right, but it works. Additional capabilities enabled by such way are easier tail call optimization and easier first-class continuation capture.

Recently I finished [trampoline code](https://bitbucket.org/FrankHB/yslib/src/f84f8133348ac0c9664429684fa0c1ebcb4e5cc2/YFramework/source/NPL/NPLA1.cpp?at=master&fileviewer=file-view-default#NPLA1.cpp-45) for most paths of my interpreter. It is actually not that hard. The most difficult part is to transform a direct-style implementation to trampolined code _manually_, which is very error-prone. If you have already know what should it be, just start to write a fresh one, it is far simpler. (TCO with side effects is always hard, though.) — FrankHB, Mar 13 '18 at 05:43
Now I think I have reinvented something like [CESK-style machine](http://matt.might.net/articles/cesk-machines/) :) — FrankHB, Mar 14 '18 at 04:57

score 2 · Answer 3 · answered Aug 15 '11 at 16:13

2

AFAIK, all mechanisms for detecting stack overflow will incur some runtime cost. You could let the CPU detect seg-faults, but that's already too late; you've probably already scribbled all over something important.

You say that you want your interpreter to call precompiled library code as much as possible. That's fine, but to maintain the notion of a sandbox, your interpreter engine should always be responsible for e.g. stack transitions and memory allocation (from the interpreted language's point of view); your library routines should probably be implemented as callbacks. The reason being that you need to be handling this sort of thing at a single point, for reasons that you've already pointed out (latent bugs).

Things like Java deal with this by generating machine code, so it's simply a case of generating code to check this at every stack transition.

answered Aug 15 '11 at 16:13

Oliver Charlesworth

267,707
33
569
680

2

When you say already scribbled over something important, doesn't a segmentation fault occur when the MMU has _prevented_ a write outside the stack memory area? i.e. scribbling has been prevented from occurring? What am I missing? – rwallace Aug 15 '11 at 16:27
If the 'process' of the interpreted code is running inside the same process of the interpreter, it could scribble over the interpreter's memory before doing something worse (like the kind of thing that the MMU would complain about). By that point the interpreter code would not be in a good state to deal with the error. – Joe Aug 15 '11 at 16:34
3

In principle yes, but don't modern operating systems put a guard page right after the stack segment to make sure you get the segmentation fault before other stuff starts being overwritten? – rwallace Aug 15 '11 at 16:47

Catching stack overflow

3 Answers3

Linked

Related