Java gives seemingly random StackOverflowError

Question

Here is a Factorial-function from inside one of my projects:

public class Fac {
    public static void main(String[] args) {
        System.out.println(fac(9000));
    }

    public static long fac(long x) {
        if (x <= 1) {
            return 1;
        }
        else {
            return x * fac(x-1);
        }
    }
}

When this function is run with a large parameter, e.g. fac(9000), it sometimes, but not always, causes a StackOverflowError. I've noticed that, when I launch the program calling fac(9000) from inside my IDE (IntelliJ Community Edition 2018.2.1), the error occurs about 2 out of 3 times, however when I launch it from my shell it can be called over 300 times or more whilst crashing only once or twice (I used a short bash script to test this). I'm using openjdk-11 on Ubuntu 18.04.

I've tried setting an explicit stack-size using -Xss1024k (also used some other sizes), but the problem persists.

I know that the result stored in a long won't be correct anymore (when it doesn't crash the result is 0), but I don't know what causes the seemingly random crashing and why it occurs more often when run from inside the IDE.

Try a 4M stack size. Also see this similar issue: https://stackoverflow.com/questions/9276215/stack-overflow-error-occurs-when-using-recursive-fibonacci-function — Steven Spungin, Aug 19 '18 at 19:57
Increasing the stack size to 4M or higher doesn't change anything for me. I just tested it 300 times multiple times with 4M and 8M and for both sizes the error count was arround 4 to 11. The question you linked to sounds similiar, but my question isn't why it crashes at all, but rather why it only crashes sometimes. — Simon, Aug 19 '18 at 20:05
When you launch your app from IDE stack will be already longer due to IDE code launching your app, and this code can be different depending on your project state and/or used options. — GotoFinal, Aug 19 '18 at 20:15
This was my theory as well and this would explain, why the program crashes far more often when run from the IDE than when it is run from the shell, but still it doesn't always crash either way. — Simon, Aug 19 '18 at 20:18
What exactly do you mean? My program only has one thread, all I do is call `fib(9000)` once in a main method (for testing now). — Simon, Aug 19 '18 at 20:32
Your app probably have more threads, like additional GC threads and all that other java overhead. But I think that stack size was per thread anyway — GotoFinal, Aug 19 '18 at 21:25
What's the IDE? Sometimes there are different memory settings for the environment vs. the launched executable. Also have you tried calling to force GC (https://stackoverflow.com/questions/1481178/how-to-force-garbage-collection-in-java) Not that it will help performance, but perhaps it will prevent the crashing. — Steven Spungin, Aug 19 '18 at 21:39
When it doesn’t crash, does it produce a result? Not clear how you are invoking the method - please post a [mcve] — Erwin Bolwidt, Aug 19 '18 at 22:09
By the way, this is not the Fibonacci function. It is the factorial function. — Johannes Kuhn, Aug 20 '18 at 00:41
I've updated the code in my question with a main method and thanks, of course I meant factorial not Fibonacci. — Simon, Aug 20 '18 at 08:51
I saw the same when playing with factorial function a while ago - some calls for large input numbers ended with stack overflow, other did not. I noticed that if I "warm up" the code (run it repeatedly with smaller numbers before testing with a large number), stack overflow did not occur. I concluded that it's likely a JIT compiler optimization that takes place and reduces the stack usage and in the end prevents the SOE. Because JIT activity depends on multiple factors, it does not take place always at the same time, which creates "random" behavior. — david a., Aug 20 '18 at 09:05
This was on Oracle 1.8 JVM btw. It might be interesting to enable JIT logging and try to correlate the behavior to the optimizations that JIT compiler is doing. — david a., Aug 20 '18 at 09:06
When I call the method in a simple `for (int i = 0; i < 100000; i++) { fib(9000)}` one of two cases appears: either the program crashes on its first call of `fib(9000)` or on none of the 100000 calls. — Simon, Aug 20 '18 at 09:23
When I try to "warm it up" as david said `for (int i = 0; i < 100000; i++) { fib(i) }` it always crashes on `fib(39525)` for me. — Simon, Aug 20 '18 at 09:25
I have tried both of the for-loops above with different stack-sizes with the `-Xss` flag, but the results are the same. — Simon, Aug 20 '18 at 09:45
@SimonMerkelbach - what if you warm it up by calling it 10,000 times with a smaller input - e.g. `fac(100)` or so (number small enough it just fits to the stack), then call it with `9000` afterwards? — david a., Aug 20 '18 at 09:45
@davida. I'm calling `fac(100)` 10,000 times before calling `fac(9000)` and it doesn't crash anymore when run from the shell and also doesn't crash anymore when run from the IDE. — Simon, Aug 20 '18 at 09:52
@SimonMerkelbach ok, then it might be interesting what JIT optimizations are taking place in such setup, but do not (or only do occasionally) if calling `fac(9000)` right away. — david a., Aug 20 '18 at 09:59
@davida. I've also tried it with smaller for-loops, e.g. calling `fac(100)` only 100 times before calling `fac(9000)`. This causes the StackOverflowError to happen less often the bigger the number of iterations of the for-loop is. Is there any way to check the JIT optimizations? — Simon, Aug 20 '18 at 10:04
@SimonMerkelbach, `-XX:+PrintCompilation` and/or JITWatch perhaps? Just an idea, not much experience with that. — david a., Aug 20 '18 at 10:19

score 0 · Answer 1 · answered Aug 20 '18 at 14:41

This is almost certainly a JIT problem.

It is worth noting that StackOverflowError's are triggered in a very odd way due to the fact that the "stack" in java is not very conventional. I won't go too deep into the details, but it appears to essentially be a linked list in the heap.

Given some of the symptoms people have reported in the comments, I think it is strongly related to the JIT. To quote Simon

When I call the method in a simple for (int i = 0; i < 100000; ++i) { fib(9000); } one of two cases appears: either the program crashes on its first call of fib(9000) or on none of the 100000 calls.

Generally the JIT should be activated at a consistent point, especially if there is no variance in the run. If you decide to turn on compilation printing, you will notice that a lot of classes are being compiled that don't seem to relate to your program at all. This could be problematic in the sense that it might take a while for your code to get compiled (JIT compilation is handled as a queue). Another issue is that the JIT is non-blocking. What this means is that the JIT will be triggered, then it will compile the method in another thread, then it will inject the method pointer into the program. Because it is non-blocking, it may finish after the function has triggered a StackOverflowError.

There are ways that we could improve the odds of success.

Warm up the method by calling it many times with a sizeable number that never triggers a StackOverflowError. This should give the JIT time to inject the compiled method.
Set the program to run in single core mode. This can be done (I can only confirm how to do this in Win10) by opening Task Manager, right clicking on the Java process, then clicking on Go to details. This will bring you to the Details page with the process selected. You can now right click the process detail and select Set affinity, deselect all cores but one. This should force the program to run in a single core context, which will cause all threads to be blocking, including the JIT. This does have the notable caveat that the GC will now be blocking.
Follow the same steps as before to get to the Details page, right click the process detail and select Set priority, then select High (I have found that selecting Realtime tends to stall the system). This will put that processes thread at the top of the list for other things to run, this will hopefully allow your JIT to get more continuous time on the core for compilation.
Put the program onto a minimal linux server, this will be some what like #3. The goal here is to have nothing else running on the server. This isn't really recommended, but it may work.

For #2 and #3 you may have to have something in your program to stall execution until you set the priority or affinity. Optionally you could use a program like Process Hacker to save the prioritization or affinity and have it set immediately at every launch.

Java gives seemingly random StackOverflowError

1 Answers1