How can I catch a stack overflow in a Rust child thread?

Question

I have an recursive algorithm which can be very deeply nested (it is part of a decompiler). I know that normally you wouldn't just increase the stack size, because a stack overflow normally indicates infinite recursion, but in this case the algorithm may sometimes just need a greater stack, so I am running the algorithm in a child thread with a stack size that can be increased with a CLI flag:

fn main() -> Result<(), Box<std::io::Error>> {
    // Process arguments
    let args: Cli = Cli::from_args();

    let child = thread::Builder::new()
        .name("run".into())
        .stack_size(args.stack_size * 1024 * 1024)
        .spawn(move || -> Result<(), Box<std::io::Error>> { run(args)?; Ok(()) })
        .unwrap();

    child.join().unwrap()?;

    Ok(())
}

fn run(args: Cli) -> Result<(), Box<std::io::Error>> {
    ...
}

This works great, pass --stack-size=20 to the app and the run thread will be given a stack of 20MB and as long as that's enough, then it runs happily.

Except, that is, for the first time you run it, with only the default stack of 8MB, and you get this error:

thread 'run' has overflowed its stack
fatal runtime error: stack overflow

I would like to catch this error and instead print a message alerting the user to the fact that they can pass --stack-size=X to give the decompiler a greater stack.

How can I catch the stack overflow of a Rust child thread?

I found something wrt stack overflow panics from [here](https://users.rust-lang.org/t/how-to-diagnose-a-stack-overflow-issues-cause/17320): "stack overflow is an aborting panic, it doesn’t unwind and is not catchable. You’d need the compiler to insert stack probes (sometimes referred to as stack banging) and initiate unwinding if the probe fails; this would necessarily incur a perf penalty." — Todd, Jun 18 '21 at 05:43
Reading more of the thread, "stack probes" may be supported [link](https://github.com/rust-lang/rust/pull/42816). — Todd, Jun 18 '21 at 05:46
@Todd Maybe if you need to catch a stack overflow then using a subprocess rather than a subthread might be better? However even if I ran the algorithm as a separate process, it would still be printing the "thread 'run' has overflowed its stack" message... — curiousdannii, Jun 18 '21 at 05:49
At least it doesn't take down your application, and your parent process could determine if the child exited and print out an error on its behalf.. Also, maybe stderr could be sent through the parent, which can determine whether to print it or not. — Todd, Jun 18 '21 at 06:37
Maybe consider using looping instead of recursion. That would be more memory/performance efficient, just not as easy to write. — Todd, Jun 18 '21 at 06:38
@Todd The algorithm is processing a control flow graph. It would be possible to change the data structure to make some nesting less deep, but not everything. But thanks for all these ideas, I'll keep thinking and working on it. — curiousdannii, Jun 19 '21 at 00:11
Hm.. I was thinking it was AST's you were traversing.. Not the same thing as flow graphs. But anyway, wrt trees, plenty of examples of [looping algorithms for tree traversal](https://www.geeksforgeeks.org/inorder-tree-traversal-without-recursion/) to pilfer. Interestingly, this one creates its own stack. — Todd, Jun 19 '21 at 05:49
I recently played around with converting recursive code to iterative code, and remembered this question. If you're still looking for a way to convert recursive code, maybe [this can be of help](https://stackoverflow.com/a/68872963/7915759). — Todd, Aug 21 '21 at 23:30

Todd · Answer 1 · 2021-08-21T23:27:14.637

"aborting panics":

Stack overflows are in the category of what I've seen referred to as "aborting panics". They can't be caught and recovered from using catch_unwind(). The OP suggested using a child process to isolate the failure from the rest of the application, which seems like a reasonable workaround.

Here's a nice long thread on Reddit talking about "stack probes" (among other things). This might be a way to prevent against thread overflows. Here's the documentation for Module compiler_builtins::probestack, if you want to find out more.

A couple excerpts from this reference:

The purpose of a stack probe is to provide a static guarantee that if a thread has a guard page then a stack overflow is guaranteed to hit that guard page.

Finally it’s worth noting that at the time of this writing LLVM only has support for stack probes on x86 and x86_64.

A caveat is in order. I've seen mention that the stack probe feature isn't entirely secure. This may not matter for most applications, but may for things like compilers available through website automation.

Avoiding recursion

Recursive algorithms can be easier to code, but are less efficient in many cases than looping to iterate over data structures like trees. The looping approach is more difficult to code, but can be faster and use less memory. If tree traversal is the problem being addressed, there are a lot of examples online to pull from. Some algorithms that avoid recursion use their own programatically declared stacks, for instance vectors, lists, or other stack like structure. Other algorithms like Morris Traversal don't require maintaining a stack data structure. Reworking problematic recursive logic is one way to reduce the chances for stack overflows.

Implement your own stack

For a language agnostic example of how to convert recursive functions to iterative in Python, here's a generic approach. I wrote this answer after running into stack issues with recursive multi-key quicksort code I converted to Rust. I was using it to sort suffix arrays which required tens of thousands deep recursive calls. After I converted following the approach described, the application was able to process very large blocks of text with no problem.

For non-fatal recoverable panics:

If a panic is not fatal/unrecoverable, it is possible to catch them and get diagnostic information using the std::panic crate.

More information on controlling panics can be found in the Controlling panics with std::panic section of the Rust Edition Guide. Here's a link to the documentation on std::panic.

use std::env;
use std::thread;
use std::panic::catch_unwind;
use std::panic;

fn main() -> Result<(), Box<std::io::Error>> {
    let args = env::args().collect::<Vec<String>>();

    panic::set_hook(Box::new(move |panic_info| {
        match panic_info.location() {
            Some(loc) => {
                println!("Panic in file {} line {}.", loc.file(), loc.line());
            },
            None => { println!("No panic info provided..."); },
        }
    }));

    let child = thread::spawn(move || {
        catch_unwind(|| {
            run(args);
        });
    });

    child.join();
    
    Ok(())
}

fn run(args: Vec<String>) -> Result<(), Box<std::io::Error>> {
    println!("Args from command line: {:?}", args);
    panic!("uh oh!");
    Ok(())
}

Hm, thanks for the idea, but I just tried this, and it didn't catch the stack overflow. Are stack overflows panics? They seem to act differently to me. — curiousdannii, Jun 18 '21 at 05:32
Oh.. no, I don't know specifically why stack overflows wont work with this. Hopefully someone else can provide some insight @curiousdannii — Todd, Jun 18 '21 at 05:36

How can I catch a stack overflow in a Rust child thread?

1 Answers1