
While trying to write an optimized DSP algorithm, I was wondering about the relative speed of stack allocation versus heap allocation, and about the size limits of stack-allocated arrays. I realize there is a stack frame size limit, but I don't understand why the following code runs and generates seemingly realistic benchmark results with cargo bench, yet fails with a stack overflow when run with cargo test --release.

#![feature(test)]
extern crate test;

#[cfg(test)]
mod tests {
    use test::Bencher;

    #[bench]
    fn it_works(b: &mut Bencher) {
        b.iter(|| { let stack = [[[0.0; 2]; 512]; 512]; });
    }
}
Josh

1 Answer


To put things into perspective, note that your array occupies 8 × 2 × 512 × 512 bytes = 4 MiB, since each f64 is 8 bytes.
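
As a quick sanity check (this snippet is only an illustration, not part of the original question), you can have the compiler confirm that figure with std::mem::size_of:

fn main() {
    // 512 × 512 inner arrays, each holding 2 f64s of 8 bytes: 4 MiB total.
    let bytes = std::mem::size_of::<[[[f64; 2]; 512]; 512]>();
    assert_eq!(bytes, 8 * 2 * 512 * 512); // 4_194_304 bytes
    println!("{} bytes = {} MiB", bytes, bytes / (1024 * 1024));
}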

cargo test crashes but cargo bench doesn't because a "test" calls the function it_works() in a new thread, while "bench" calls it in the main thread.

The default stack size of the main thread is typically 8 MiB, so that array is going to occupy half of the available stack. That's a lot, but there's still room available, so the benchmark runs normally.

The stack size of a new thread, however, is typically much smaller: 2 MiB on Linux, and it can be even smaller on other platforms. Your 4 MiB array therefore easily overflows the thread's stack and causes a stack overflow / segfault.

You can increase the default stack size of new threads by setting the RUST_MIN_STACK environment variable.

$ RUST_MIN_STACK=8388608 cargo test    # 8388608 bytes = 8 MiB
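
If you spawn threads yourself (this is only a sketch of the standard library API, not something the test harness does for you), you can also request a larger stack per thread with std::thread::Builder::stack_size:

use std::thread;

fn main() {
    let handle = thread::Builder::new()
        .stack_size(8 * 1024 * 1024) // 8 MiB instead of the platform default
        .spawn(|| {
            // The 4 MiB array now fits on this thread's stack.
            let stack = [[[0.0f64; 2]; 512]; 512];
            stack[511][511][1]
        })
        .expect("failed to spawn thread");
    handle.join().unwrap();
}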

cargo test runs the tests in parallel threads to reduce total test time, while benchmarks are run sequentially on the same thread to reduce noise.

Due to the limited stack size, it is a bad idea to allocate this array on the stack. You should either store it on the heap (box it) or make it a global static mut.
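
A minimal sketch of the heap route (note that Box::new([[[0.0; 2]; 512]; 512]) is not guaranteed to avoid the overflow, since the array may still be built on the stack before being moved into the box, so a Vec is the more reliable way to get the data onto the heap):

fn main() {
    // Allocates the 512 × 512 × 2 buffer directly on the heap.
    let heap: Vec<[[f64; 2]; 512]> = vec![[[0.0; 2]; 512]; 512];
    assert_eq!(heap.len(), 512);

    // Convert to a boxed slice if a fixed-length, non-growable type is preferred.
    let boxed: Box<[[[f64; 2]; 512]]> = heap.into_boxed_slice();
    assert_eq!(boxed.len(), 512);
}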

kennytm
  • This answer is totally believable, but can you provide any references? I'm a little surprised that benches aren't run in a separate thread, just with concurrency disabled, because threads also provide a nice panicking firewall (and I'd expect there to be code reuse anyway). – Shepmaster Mar 22 '17 at 19:22
  • @Shepmaster I ran it in LLDB and noted the test is run in "thread #1" (main thread) with the --bench flag and "thread #2" with the --test flag, when I crank up the array size to 5120. The relevant code seems to be here: https://github.com/rust-lang/rust/blob/8c4f2c64c6759a82f143e23964a46a65c67509c9/src/libtest/lib.rs#L1337-L1369. --test spawns a new thread, while --bench runs directly. Also, `catch_unwind` does not start a new thread. – kennytm Mar 22 '17 at 19:47
  • Oh right, `catch_unwind` exists now. Back in ye olde days the *only* way to "catch" a panic was via a separate thread. – Shepmaster Mar 22 '17 at 19:48
  • This is great, thanks. I knew it was large for the stack, and also knew that tests were run in parallel, although I didn't realize benchmarks weren't. – Josh Mar 22 '17 at 20:48