126

The following code snippet starts two threads: one is a simple timer that logs every second, the other is an infinite loop that performs a remainder operation:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TestBlockingThread {
    private static final Logger LOGGER = LoggerFactory.getLogger(TestBlockingThread.class);

    public static final void main(String[] args) throws InterruptedException {
        Runnable task = () -> {
            int i = 0;
            while (true) {
                i++;
                if (i != 0) {
                    boolean b = 1 % i == 0;
                }
            }
        };

        new Thread(new LogTimer()).start();
        Thread.sleep(2000);
        new Thread(task).start();
    }

    public static class LogTimer implements Runnable {
        @Override
        public void run() {
            while (true) {
                long start = System.currentTimeMillis();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    // do nothing
                }
                LOGGER.info("timeElapsed={}", System.currentTimeMillis() - start);
            }
        }
    }
}

This gives the following result:

[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1004
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1003
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=13331
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1006
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1003
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1004
[Thread-0] INFO  c.m.c.concurrent.TestBlockingThread - timeElapsed=1004

I don't understand why the infinite task blocks all other threads for 13.3 seconds. I tried changing thread priorities and other settings; nothing worked.

If you have any suggestions to fix this (including tweaking OS context switching settings) please let me know.

kms333
  • This is a simplified version of a problem I am having in a complex app; this task blocks every single app in a Tomcat container for 13-14 seconds. – kms333 Sep 02 '16 at 18:11
  • what are the OS and java version? – fukanchik Sep 02 '16 at 18:13
  • how many cores do you have? – vsminkov Sep 02 '16 at 18:15
  • OS: I have tried Linux and Mac. – kms333 Sep 02 '16 at 18:16
  • Cores: sysctl -n hw.ncpu=8 – kms333 Sep 02 '16 at 18:17
  • I think it's the bytecode compiler – fukanchik Sep 02 '16 at 18:18
  • I have run this on my local machine (Docker and maven), and in AWS, I always get 13-14 seconds where the whole JVM world stops. – kms333 Sep 02 '16 at 18:19
  • Could it be the GC that kicks in? Sure sounds like it if it "stops" the jvm – Marthin Sep 02 '16 at 18:19
  • I have connected with VisualVM, I don't see GC kicking in, I don't actually see any object creation at all. – kms333 Sep 02 '16 at 18:21
  • Could also be that you hit the roof for the CodeCache perhaps? https://docs.oracle.com/javase/8/embedded/develop-apps-platforms/codecache.htm – Marthin Sep 02 '16 at 18:21
  • using java version "1.8.0_45" Java(TM) SE Runtime Environment (build 1.8.0_45-b14) – kms333 Sep 02 '16 at 18:22
  • @Marthin Not GC. It's JIT. Running with `-XX:+PrintCompilation` I get the following at the time the extended delay ends: *TestBlockingThread::lambda$0 @ 2 (24 bytes) COMPILE SKIPPED: trivial infinite loop (retry at different tier)* – Andreas Sep 02 '16 at 18:24
  • Can anyone reproduce the problem with the code provided ? I can repro all the time, with two different java8 versions, with or without Docker, Mac and Linux. – kms333 Sep 02 '16 at 18:24
  • It reproduces on my system with the only change being I replaced the log call with System.out.println. Seems like a scheduler issue because if you introduce a 1ms sleep inside of the Runnable's while(true) loop the pause in the other thread goes away. – JJF Sep 02 '16 at 18:27
  • You may wish to try submitting your tasks to an ExecutorService. – Jacob Sep 02 '16 at 18:28
  • @kms333 I can reproduce, using a slightly different logger, but then I get a timeElapsed of 16 seconds – Jeutnarg Sep 02 '16 at 18:28
  • @Andreas well spotted! If someone knows about JIT setting tweaks that I can make to get around this issue, please let me know. Reading about it meanwhile ... – kms333 Sep 02 '16 at 18:29
  • Reproduced on Win 7 Enterprise/Java 1.8.0_101 and 64 bit Linux/java 1.8.0_101 – fukanchik Sep 02 '16 at 18:29
  • @Jacob I have tried that, I have the same issue I'm afraid – kms333 Sep 02 '16 at 18:30
  • Interesting data point: at least on my Mac, if you ctrl-C to send a TERM signal to the process, it still doesn't exit until after the delay (which is only about 11 seconds for me, fwiw). – yshavit Sep 02 '16 at 18:31
  • Not that I recommend it, but if you *disable* JIT using `-Djava.compiler=NONE`, it won't happen. – Andreas Sep 02 '16 at 18:32
  • You can supposedly disable JIT for a single method. See [Disable Java JIT for a specific method/class?](http://stackoverflow.com/q/4004340/5221149) – Andreas Sep 02 '16 at 18:34
  • If you use `boolean b = 1.0 / i == 0;` I get a pause of 31.3 seconds. Note: remove the `if (i != 0)` check and I get no pause of note. – Peter Lawrey Sep 02 '16 at 18:38
  • @Andreas One thing I did notice with PrintCompilation is that immediately after the delay, I saw "COMPILE SKIPPED: trivial infinite loop (retry at different tier)". I wonder if the jit basically had a hard time figuring out that it's an infinite loop for some reason, and was trying to stop the world while it did its thing? – yshavit Sep 02 '16 at 18:39
  • Okay, another data point: If you replace the lambda with just an old-style anonymous class, and compile/run with Java 7, then it works as expected without delay. That same anonymous class still produces the delay with a Java 8 JVM. – yshavit Sep 02 '16 at 18:42
  • There is no integer division in this code. Please fix your title and question. – user207421 Sep 02 '16 at 23:33
  • Can you elaborate on `boolean b = 1 % i == 0;` a bit? As @EJP points out, it's not integer division, but the modulo operator, which gets you the *remainder*. E.g., 11 % 4 == 3, since 11 == 2 * 4 + 3. That makes your expression curious, since 1 will never be evenly divisible by i unless i is 1. You could simplify the whole expression to `boolean b = (i == 1);`. – Joshua Taylor Sep 03 '16 at 22:51

4 Answers

95

After all the explanations here (thanks to Peter Lawrey), we found that the main source of this pause is that the safepoint inside the loop is reached rather rarely, so it takes a long time to stop all threads for the JIT-compiled code replacement.

But I decided to go deeper and find out why the safepoint is reached so rarely. I found it a bit confusing that the back jump of the while loop is not "safe" in this case.

So I summoned -XX:+PrintAssembly in all its glory to help:

-XX:+UnlockDiagnosticVMOptions \
-XX:+TraceClassLoading \
-XX:+DebugNonSafepoints \
-XX:+PrintCompilation \
-XX:+PrintGCDetails \
-XX:+PrintStubCode \
-XX:+PrintAssembly \
-XX:PrintAssemblyOptions=-Mintel

After some investigation, I found that after the third recompilation of the lambda the C2 compiler threw away the safepoint polls inside the loop completely.

UPDATE

During the profiling stage the variable i was never seen to be equal to 0. That's why C2 speculatively optimized this branch away, so that the loop was transformed into something like

for (int i = OSR_value; i != 0; i++) {
    if (1 % i == 0) {
        uncommon_trap();
    }
}
uncommon_trap();

Note that the originally infinite loop was reshaped into a regular finite loop with a counter! Because of the JIT optimization that eliminates safepoint polls in finite counted loops, there was no safepoint poll in this loop either.

After some time, i wrapped back to 0 and the uncommon trap was taken. The method was deoptimized and execution continued in the interpreter. During recompilation with this new knowledge, C2 recognized the infinite loop and gave up compilation. The rest of the method proceeded in the interpreter with proper safepoints.

There is a great must-read blog post, "Safepoints: Meaning, Side Effects and Overheads" by Nitsan Wakart, covering safepoints and this particular issue.

Safepoint elimination in very long counted loops is a known problem. The bug JDK-5014723 (thanks to Vladimir Ivanov) addresses it.

Several workarounds are available until the bug is finally fixed.

  1. You can try using -XX:+UseCountedLoopSafepoints (it causes an overall performance penalty and may lead to a JVM crash, JDK-8161147). With it, the C2 compiler keeps safepoints at the back jumps, and the original pause disappears completely.
  2. You can explicitly disable compilation of the problematic method by using
    -XX:CompileCommand='exclude,binary/class/Name,methodName'

  3. Or you can rewrite your code by adding a safepoint manually. For example, a Thread.yield() call at the end of the loop, or even changing int i to long i (thanks, Nitsan Wakart), will also fix the pause; see the sketch below.
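
As an illustration of option 3, here is a minimal sketch (assuming the same task as in the question): the long counter makes the loop "uncounted", so the JIT keeps the safepoint poll at the back jump, and the commented-out Thread.yield() shows the other alternative mentioned above.

Runnable task = () -> {
    // long instead of int makes this an "uncounted" loop, so the JIT keeps
    // the safepoint poll at the back jump and the long pause disappears
    long i = 0;
    while (true) {
        i++;
        if (i != 0) {
            boolean b = 1 % i == 0;
        }
        // Alternative: Thread.yield(); // explicit safepoint at the end of each iteration
    }
};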

vsminkov
  • This is the true answer to the question of *how to fix*. – Andreas Sep 02 '16 at 21:03
  • WARNING: Do not use `-XX:+UseCountedLoopSafepoints` in production, since it may [crash JVM](https://bugs.openjdk.java.net/browse/JDK-8161147). The best workaround so far is to manually split the long loop into shorter ones. – apangin Sep 05 '16 at 22:22
  • @apangin aah, got it! thank you :) so that's why `c2` removes safepoints! But one more thing I didn't get: what happens next? As far as I can see there are no safepoints left after loop unrolling (?), and it looks like there is no way to do the STW. So does some kind of timeout happen and deoptimization take place? – vsminkov Sep 05 '16 at 22:51
  • My previous comment was not accurate. Now it's completely clear what happens. At the profiling stage `i` is never 0, so the loop is speculatively transformed to something like `for (int i = osr_value; i != 0; i++) { if (1 % i == 0) uncommon_trap(); } uncommon_trap();` I.e. a regular finite counted loop. Once `i` wraps back to 0, the uncommon trap is taken, the method is deoptimized and proceeded in interpreter. During recompilation with the new knowledge JIT recognizes the infinite loop and gives up compilation. The rest of the method is executed in interpreter with proper safepoints. – apangin Sep 06 '16 at 11:25
  • @apangin yeah, thank you once again! This completely matches my observations. It would be nice if you could post your own answer or edit this one. Without your explanation the answer looks incomplete :) – vsminkov Sep 06 '16 at 11:47
  • @apangin or I can add to this answer if you don't mind – vsminkov Sep 06 '16 at 11:51
  • You could just make i a long instead of an int, that would make the loop "uncounted" and solve the issue. – Nitsan Wakart Sep 07 '16 at 21:41
  • @NitsanWakart thank you for your great blog :) I've added your comment to answer – vsminkov Sep 08 '16 at 17:37
64

In short, the loop you have has no safe point inside it except when i == 0 is reached. When this method is compiled and triggers the code to be replaced, the JVM needs to bring all threads to a safe point, but this takes a very long time, locking up not just the thread running the code but all threads in the JVM.

I added the following command-line options:

-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -XX:+PrintCompilation

I also modified the code to use floating point, which appears to take longer:

boolean b = 1.0 / i == 0;

And what I see in the output is:

timeElapsed=100
Application time: 0.9560686 seconds
  41423  280 %     4       TestBlockingThread::lambda$main$0 @ -2 (27 bytes)   made not entrant
Total time for which application threads were stopped: 40.3971116 seconds, Stopping threads took: 40.3967755 seconds
Application time: 0.0000219 seconds
Total time for which application threads were stopped: 0.0005840 seconds, Stopping threads took: 0.0000383 seconds
  41424  281 %     3       TestBlockingThread::lambda$main$0 @ 2 (27 bytes)
timeElapsed=40473
  41425  282 %     4       TestBlockingThread::lambda$main$0 @ 2 (27 bytes)
  41426  281 %     3       TestBlockingThread::lambda$main$0 @ -2 (27 bytes)   made not entrant
timeElapsed=100

Note: for the code to be replaced, threads have to be stopped at a safe point. However, it appears here that such a safe point is reached very rarely (possibly only when i == 0). Changing the task to

Runnable task = () -> {
    for (int i = 1; i != 0 ; i++) {
        boolean b = 1.0 / i == 0;
    }
};

I see a similar delay.

timeElapsed=100
Application time: 0.9587419 seconds
  39044  280 %     4       TestBlockingThread::lambda$main$0 @ -2 (28 bytes)   made not entrant
Total time for which application threads were stopped: 38.0227039 seconds, Stopping threads took: 38.0225761 seconds
Application time: 0.0000087 seconds
Total time for which application threads were stopped: 0.0003102 seconds, Stopping threads took: 0.0000105 seconds
timeElapsed=38100
timeElapsed=100

By carefully adding code to the loop, you get an even longer delay:

for (int i = 1; i != 0 ; i++) {
    boolean b = 1.0 / i / i == 0;
}

gets

 Total time for which application threads were stopped: 59.6034546 seconds, Stopping threads took: 59.6030773 seconds

However, changing the code to use a native method, which always has a safe point (if it is not an intrinsic),

for (int i = 1; i != 0 ; i++) {
    boolean b = Math.cos(1.0 / i) == 0;
}

prints

Total time for which application threads were stopped: 0.0001444 seconds, Stopping threads took: 0.0000615 seconds

Note: adding if (Thread.currentThread().isInterrupted()) { ... } to a loop adds a safe point.
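
A minimal sketch of that variant, assuming the same task as in the question; per the note above, the interrupt check gives the loop a safe point and also provides a clean way to stop the thread:

Runnable task = () -> {
    int i = 0;
    while (true) {
        // Per the note above, this check acts as a safe point inside the loop;
        // it also lets the thread be stopped cleanly via Thread.interrupt()
        if (Thread.currentThread().isInterrupted()) {
            return;
        }
        i++;
        if (i != 0) {
            boolean b = 1 % i == 0;
        }
    }
};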

Note: This happened on a 16-core machine, so there is no lack of CPU resources.

Peter Lawrey
  • "possibly only when i == 0" -- Yup, I'm almost positive that's it. The delay is about 11-12 seconds for me. If you remove the "if" (but not the mod operation inside it), then I get a delay of about 12 seconds, followed by an `/ by zero` exception. So putting it all together, I think the problem is that you have code that does not have a safe point, but which the JVM thinks _will_ have one (and so the JVM doesn't insert one for you). – yshavit Sep 02 '16 at 19:53
  • So the code which decides that JIT compilation is necessary runs in a separate thread. Once it decides to compile this particular lambda, it needs to stop the world. Among the other threads there is a thread executing this lambda, and it needs to be stopped too. So the JIT thread and the rest of the stopped Java world are waiting for that one last thread. Is my understanding correct? – fukanchik Sep 02 '16 at 20:00
  • @fukanchik yes, and in this case this thread does not stop quickly. I tried to explain a bit deeper in [my answer](http://stackoverflow.com/a/39298768/3239417) why it takes so long. – vsminkov Sep 02 '16 at 20:46
  • So it's a JVM bug, right? Where "bug" means severe quality of implementation issue and not violation of spec. – usr Sep 02 '16 at 21:35
  • @usr I'm not sure that it is a bug; maybe it's just a drawback of speculative optimization. I'll try to build jdk9 against the fix commit mentioned in JDK-8144935 and will be back with updates. – vsminkov Sep 02 '16 at 21:43
  • @vsminkov being able to stop the world for several minutes due to lack of safepoints sounds like it should be treated as bug. The runtime is responsible to introduce safepoints to avoid long waits. – Voo Sep 02 '16 at 22:24
  • @Voo but on the other hand, keeping safepoints at every back jump may cost a lot of CPU cycles and cause noticeable performance degradation of the whole application. But I agree with you; in this particular case it looks legitimate to keep the safepoint – vsminkov Sep 02 '16 at 22:31
  • @vsminkov absolutely, but I think we all agree that they optimized a bit too much there. Removing some safepoints is perfectly legitimate though. Interesting edge case here – Voo Sep 02 '16 at 22:36
  • @Voo well... I always recall [this](https://twitter.com/shipilev/status/578193813946134529) picture when it comes to performance optimizations :D – vsminkov Sep 02 '16 at 23:16
  • .NET does insert safepoints here (but .NET has slow generated code). A possible solution is to chunk the loop: split it into two loops, where the inner loop runs batches of 1024 iterations without checks and the outer loop drives the batches and the safepoints. That conceptually cuts the overhead by 1024x, less in practice. – usr Sep 03 '16 at 07:56
  • Would removing the `if (i != 0)` and replacing the `1%i` with `1%(i|1)` deadlock the process? – EOF Sep 03 '16 at 19:31
  • @EOF removing the if condition results in a safe point on every iteration and no deadlock. – Peter Lawrey Sep 03 '16 at 20:51
26

Found the answer as to why. They are called safepoints, and are best known as the Stop-The-World pauses that happen because of GC.

See this article: Logging stop-the-world pauses in JVM

Different events can cause the JVM to pause all the application threads. Such pauses are called Stop-The-World (STW) pauses. The most common cause for an STW pause to be triggered is garbage collection (example in github), but different JIT actions (example), biased lock revocation (example), certain JVMTI operations, and many more also require the application to be stopped.

The points at which the application threads may be safely stopped are called, surprise, safepoints. This term is also often used to refer to all the STW pauses.

It is more or less common that GC logs are enabled. However, this does not capture information on all the safepoints. To get it all, use these JVM options:

-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime

If you are wondering about the naming explicitly referring to GC, don't be alarmed – turning on these options logs all of the safepoints, not just garbage collection pauses.

The HotSpot Glossary of Terms defines it like this:

safepoint

A point during program execution at which all GC roots are known and all heap object contents are consistent. From a global point of view, all threads must block at a safepoint before the GC can run. (As a special case, threads running JNI code can continue to run, because they use only handles. During a safepoint they must block instead of loading the contents of the handle.) From a local point of view, a safepoint is a distinguished point in a block of code where the executing thread may block for the GC. Most call sites qualify as safepoints. There are strong invariants which hold true at every safepoint, which may be disregarded at non-safepoints. Both compiled Java code and C/C++ code can be optimized between safepoints, but less so across safepoints. The JIT compiler emits a GC map at each safepoint. C/C++ code in the VM uses stylized macro-based conventions (e.g., TRAPS) to mark potential safepoints.

Running with the above-mentioned flags, I get this output:

Application time: 0.9668750 seconds
Total time for which application threads were stopped: 0.0000747 seconds, Stopping threads took: 0.0000291 seconds
timeElapsed=1015
Application time: 1.0148568 seconds
Total time for which application threads were stopped: 0.0000556 seconds, Stopping threads took: 0.0000168 seconds
timeElapsed=1015
timeElapsed=1014
Application time: 2.0453971 seconds
Total time for which application threads were stopped: 10.7951187 seconds, Stopping threads took: 10.7950774 seconds
timeElapsed=11732
Application time: 1.0149263 seconds
Total time for which application threads were stopped: 0.0000644 seconds, Stopping threads took: 0.0000368 seconds
timeElapsed=1015

Notice the third STW event:
Total time stopped: 10.7951187 seconds
Stopping threads took: 10.7950774 seconds

The JIT compilation itself took virtually no time, but once the JVM had decided to perform it, it entered STW mode; however, since the code to be compiled (the infinite loop) has no call site, no safepoint was ever reached.

The STW pause ends when the JIT eventually gives up waiting and concludes that the code is in an infinite loop.

Andreas
  • *"Safepoint - A point during program execution at which all GC roots are known and all heap object contents are consistent"* - Why would this not be true in a loop that only sets/reads local value-type variables? – BlueRaja - Danny Pflughoeft Sep 02 '16 at 20:34
  • @BlueRaja-DannyPflughoeft I've tried to answer this question in [my answer](http://stackoverflow.com/a/39298768/3239417) – vsminkov Sep 02 '16 at 21:34
5

After following the comment threads and doing some testing on my own, I believe that the pause is caused by the JIT compiler. Why the JIT compiler takes such a long time is beyond my ability to debug.

However, since you only asked for how to prevent this, I have a solution:

Pull your infinite loop into a method where it can be excluded from the JIT compiler:

import java.util.logging.Logger;

public class TestBlockingThread {
    private static final Logger LOGGER = Logger.getLogger(TestBlockingThread.class.getName());

    public static final void main(String[] args) throws InterruptedException {
        Runnable task = () -> infLoop();
        new Thread(new LogTimer()).start(); // LogTimer is the same class as in the question
        Thread.sleep(2000);
        new Thread(task).start();
    }

    private static void infLoop() {
        int i = 0;
        while (true) {
            i++;
            if (i != 0) {
                boolean b = 1 % i == 0;
            }
        }
    }
}

Run your program with this VM argument:

-XX:CompileCommand=exclude,PACKAGE.TestBlockingThread::infLoop (replace PACKAGE with your package information)

You should get a message like this to indicate when the method would have been JIT-compiled:

### Excluding compile: static blocking.TestBlockingThread::infLoop

(You may notice that I put the class into a package called blocking.)

Jeutnarg
  • The compiler isn't taking that long, the problem is the code is not reaching a safe point because there is none inside the loop except when `i == 0` – Peter Lawrey Sep 02 '16 at 19:03
  • @PeterLawrey but why is the end of an iteration of the `while` loop not a safepoint? – vsminkov Sep 02 '16 at 19:07
  • @vsminkov It appears there is a safepoint in `if (i != 0) { ... } else { safepoint(); }` but this is very rare. ie. if you exit/break the loop you get much the same timings. – Peter Lawrey Sep 02 '16 at 19:10
  • @PeterLawrey after a bit of investigation I found that it is common practice to place a safepoint at the back jump of a loop. I'm just curious what the difference is in this particular case. Maybe I'm naive, but I see no reason why the back jump is not "safe" – vsminkov Sep 02 '16 at 19:15
  • @vsminkov I suspect that the JIT sees a safepoint is in the loop so doesn't add one at the end. – Peter Lawrey Sep 02 '16 at 19:16
  • @PeterLawrey yeah... but `if` check may be eliminated by branch predictor so safepoint can be placed at the end of loop – vsminkov Sep 02 '16 at 19:19
  • @PeterLawrey ah... got it. method is *not yet* optimized – vsminkov Sep 02 '16 at 19:20