Context

I am reading a tech document that explains the difference between using int and long as the counter variable of a fori loop.

Scenario A (int fori loop)

import java.util.concurrent.atomic.AtomicInteger;

public class SafePoint {

    public static AtomicInteger num = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable runnable=()->{
            for (int i = 0; i < 1000000000; i++) {
                num.getAndAdd(1);
            }
            System.out.println(Thread.currentThread().getName()+"end!");
        };

        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(runnable);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        System.out.println("num = " + num);
    }

}

Expected Result:

Thread-1end!
Thread-0end!
num = 2000000000

Scenario B (long fori loop)

import java.util.concurrent.atomic.AtomicInteger;

public class SafePoint {

    public static AtomicInteger num = new AtomicInteger(0);

    public static void main(String[] args) throws InterruptedException {
        Runnable runnable=()->{
            for (long i = 0; i < 1000000000; i++) {
                num.getAndAdd(1);
            }
            System.out.println(Thread.currentThread().getName()+"end!");
        };

        Thread t1 = new Thread(runnable);
        Thread t2 = new Thread(runnable);
        t1.start();
        t2.start();
        Thread.sleep(1000);
        System.out.println("num = " + num);
    }

}

Expected Result:
num = 55406251 (or some random number less than 2000000000)
Thread-1end!
Thread-0end!

The most important concept the document introduces is the safepoint: the long fori loop is an uncounted loop and therefore contains a safepoint, while the int fori loop is a counted loop and does not. As a result, in the int case the main thread's sleep has to wait for the two threads to finish before a GC can happen, because there is no safepoint while the threads are still inside the int fori loop.

Issue:

I tried to reproduce this on my local machine and it failed: no matter whether I use int or long, the result is always similar to the second one, with num printed first.

After thinking about it carefully, I concluded it can only be due to the JVM I use: Java 11 Corretto.

Following the tech doc's reasoning, this basically means that in Java 11 safepoints exist in both counted and uncounted loops.

Question:

Can anyone test on java 8 and tell me whether that is the reason?

I actually already tested: in java 8 we can observe the expected results of both A and B

Does Java 11 change how safepoints are placed, and if so, why?
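The print order alone makes the effect easy to misread, so here is a minimal sketch that measures how long the main thread's sleep really takes. The class name `SleepTiming` and the method `measure` are made up for illustration; the loop is the same as in Scenario A. The assumption is that on a JVM that removes the poll from the counted loop, the measured sleep is far longer than requested, while on other JVMs it stays close to the requested time.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SleepTiming {

    /**
     * Runs two counting threads and measures how long the calling thread's
     * sleep takes. Returns {elapsedMillis, counterAtWakeUp, finalCounter}.
     */
    static long[] measure(int iterations, long sleepMillis) {
        AtomicInteger num = new AtomicInteger(0);
        Runnable counter = () -> {
            // int counter, as in Scenario A; change to long to compare
            for (int i = 0; i < iterations; i++) {
                num.getAndAdd(1);
            }
        };
        Thread t1 = new Thread(counter);
        Thread t2 = new Thread(counter);
        try {
            t1.start();
            t2.start();

            long start = System.nanoTime();
            Thread.sleep(sleepMillis);
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long atWakeUp = num.get();

            // Wait for both workers so the final value is well defined.
            t1.join();
            t2.join();
            return new long[] {elapsedMillis, atWakeUp, num.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = measure(1_000_000_000, 1000);
        System.out.println("sleep(1000) took " + r[0] + " ms, num at wake-up = "
                + r[1] + ", final num = " + r[2]);
    }
}
```

Running the same class on different JDKs, or with different command line flags, gives numbers that can be compared directly instead of relying on which line is printed first.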

Related links:

The tech doc that explains this, and how to trigger GC in a fori loop: Why Thread.sleep(0) can prevent gc in rocketmq?

Stan
  • It might actually be a difference between JVM vendors, e.g. Amazon's Corretto might not have safepoints like this. – Thomas Sep 08 '22 at 15:25
  • @Thomas Then is it actually unwise to rely on this kind of GC trick in RocketMQ, since it varies between vendors: https://stackoverflow.com/questions/53284031/why-thread-sleep0-can-prevent-gc-in-rocketmq – Stan Sep 08 '22 at 15:28
  • @Thomas nope. The issue isn't safepoints. It's the GC impl. – rzwitserloot Sep 08 '22 at 15:53
  • The reason is the [Loop Strip Mining](https://bugs.openjdk.org/browse/JDK-8186027) technique implemented in JDK 10+. It allows the JIT compiler to break a large counted loop into two logically nested loops, where the inner loop has no safepoint poll instruction, while the outer one does. [Here](https://bugs.openjdk.org/browse/JDK-8223051) is a good description of the technique. – apangin Sep 08 '22 at 17:28

2 Answers

4

The reasons for the delayed println are explained in this answer. In short, the HotSpot JIT eliminates the safepoint poll inside the counted loop, so the JVM cannot reach a safepoint while the loop is running.

To eliminate a safepoint poll from the loop, two conditions must be met:

  1. the loop is counted, i.e. has a finite number of iterations ensured by an integer counter variable, and
  2. the UseCountedLoopSafepoints option is disabled (-XX:-UseCountedLoopSafepoints).

These conditions depend on the JVM version and the command line flags.
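To see which values a particular JVM actually uses, the flags can be read at run time. The following is a minimal sketch that assumes a HotSpot-based JVM (it relies on `com.sun.management.HotSpotDiagnosticMXBean`, which other JVMs may not provide); the class name `FlagProbe` is made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class FlagProbe {

    /** Describes one HotSpot flag, or reports that this JVM does not know it. */
    static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption option = bean.getVMOption(flagName);
            return flagName + " = " + option.getValue()
                    + " (origin: " + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // Thrown when the running JVM has no option with this name.
            return flagName + ": not available on this JVM";
        }
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(describe("UseCountedLoopSafepoints"));
        // Expected to exist only on JDK 10 and later.
        System.out.println(describe("LoopStripMiningIter"));
    }
}
```

The reported origin helps to tell a built-in default apart from a value chosen by the JVM's ergonomics or set on the command line.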

In JDK 8, a for-loop with an int variable is considered counted, while a loop with a long variable is not. UseCountedLoopSafepoints is always off, unless enabled explicitly with -XX:+UseCountedLoopSafepoints.

Since JDK 10, the JIT compiler has had the Loop Strip Mining feature. This optimization significantly reduces the overhead of safepoint polling in a tight loop, and because of that, -XX:+UseCountedLoopSafepoints became enabled by default [1]. That's why you don't observe a delay in JDK 11.


[1] Unless a user selects Parallel or Serial GC. Since these garbage collectors are not focused on low pauses, the JVM prefers throughput and thus does not enable UseCountedLoopSafepoints. It can still be enabled manually with -XX:+UseCountedLoopSafepoints.
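The Loop Strip Mining idea mentioned above can be pictured in plain Java. This is only a conceptual sketch of the loop shape: the real transformation happens inside the JIT compiler, and writing a loop this way by hand does not control where polls are placed. The class name `StripMiningSketch` and the strip length are illustrative assumptions.

```java
class StripMiningSketch {

    // Illustrative strip length (assumption); HotSpot takes the real value
    // from the LoopStripMiningIter option.
    static final int STRIP = 1000;

    /** Sums 0..n-1 using the two-level loop shape. */
    static long sum(int n) {
        long total = 0;
        int i = 0;
        while (i < n) {                          // outer loop: one poll per strip
            int limit = (n - i > STRIP) ? i + STRIP : n;
            for (; i < limit; i++) {             // inner counted loop: no poll
                total += i;
            }
            // Conceptually, the safepoint poll sits here, once per strip.
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(2500)); // prints 3123750
    }
}
```

With this shape the poll cost is paid once per strip rather than once per iteration, while the time to reach a safepoint is bounded by the length of one strip.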

apangin
  • This looks correct. Can you elaborate a bit more on note 1: how does Parallel or Serial GC matter here? "Unless a user selects Parallel or Serial GC. Since these garbage collectors are not focused on low pauses, the JVM prefers throughput and thus does not enable UseCountedLoopSafepoints. It can be still enabled manually with -XX:+UseCountedLoopSafepoints." – Stan Sep 10 '22 at 04:55
  • @StanPeng GC selection here only affects the default value of the `UseCountedLoopSafepoints` and `LoopStripMiningIter` options. [G1](https://github.com/openjdk/jdk/blob/68da02c7b536799ccca49e889c22f3e9a2691fb7/src/hotspot/share/gc/g1/g1Arguments.cpp#L230-L236), [Shenandoah](https://github.com/openjdk/jdk/blob/68da02c7b536799ccca49e889c22f3e9a2691fb7/src/hotspot/share/gc/shenandoah/shenandoahArguments.cpp#L123-L128), ZGC and Epsilon GC change the default value of `UseCountedLoopSafepoints` to `true`, whereas Parallel and Serial GC do not. – apangin Sep 10 '22 at 07:34
-1

ANSWER: It's because JDK8 defaults to the parallel GC, and JDK9+ to the G1 GC, and this explains all.

Proof

On an ARM-chip Mac, with Eclipse Temurin OpenJDK, both versions 8 and 17, you get the following behaviour:

        JDK 17   JDK 8
int     ~50m     2000m*
long    ~50m     ~50m

*) This result appears half a minute or more later, so the explanation isn't that the 2 threads simply get through the billion iterations within the one-second sleep.

In other words, exactly as you describe. Given that this is Temurin, it's not 'Amazon Corretto' specifically that's at fault.

However, if I then run the int variant on JDK 17, but with:

/Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home/bin/java -XX:+UseParallelGC SafePoint

It prints 2000m just like temurin-8 does (and after ~half a minute of chugging away at it).

This therefore fully explains the difference:

  • If you run this code with int and using the Parallel GC, you get 2000m (eventually; it takes a while, the threads finish before the print code happens).
  • Otherwise (you use the G1 GC, or you use long), you get ~50m; the 'print the number' code runs well before the threads finish.

If you don't explicitly choose a garbage collector implementation, you default to the Parallel GC up to JDK 8; starting with JDK 9, you get the G1 collector (and perhaps, on recent versions, ZGC or whatnot; not the Parallel collector, in any case).
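To confirm which collector a given run actually used, the JVM can be asked directly. The sketch below lists the collector beans through the standard `java.lang.management` API. The class name `GcInfo` is made up for illustration, and the name prefixes checked in `classify` ("PS " for Parallel, "G1 " for G1) are an assumption based on the bean names HotSpot commonly reports; other collectors simply show up as "other".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcInfo {

    /** Names of the garbage collector beans registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Rough classification by bean name prefix (assumed naming, see above). */
    static String classify(List<String> names) {
        for (String name : names) {
            if (name.startsWith("PS ")) {
                return "Parallel";
            }
            if (name.startsWith("G1 ")) {
                return "G1";
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("collectors: " + names + " -> " + classify(names));
    }
}
```

Running it once with no GC flag and once with -XX:+UseParallelGC shows which collector each configuration selects.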

It's therefore clearly not about how the JVM injects safepoints, and you can inspect this by asking the JVM to print you the machine code it is generating for you (many blogposts you can search the web for explain precisely how to do this and what to look for) - you'll find JDK17s do similar things as JDK8 in this regard.

No, instead, the Parallel GC blocks on hitting a safepoint in every thread, whereas the G1 collector does not, a much simpler explanation.

There are many reasons why the default GC impl has been changed away from the parallel collector, and this is presumably one of them.

rzwitserloot
  • Not really. A simple test showing that your "proof" does not prove anything is that JDK 8 with `-XX:+UseG1GC` still hangs for 2s. The actual reason has nothing to do with GC. New JDK versions indeed have improvements in handling of safepoint polls: in particular, [Loop Strip Mining](https://bugs.openjdk.org/browse/JDK-8186027) implemented in JDK 10 and [Long-indexed Counted Loops](https://bugs.openjdk.org/browse/JDK-8223051) in JDK 16. – apangin Sep 08 '22 at 17:46
  • @apangin Run with `+UseParallelGC` on JDK17, which shows you're incorrect here. – rzwitserloot Sep 08 '22 at 18:38
  • When Parallel GC is selected, JVM does not enable Loop Strip Mining by default (ergonomically), but it can be enabled explicitly. GC algorithm has no effect on JIT safepoint instructions per se. Your explanation in the last 3 paragraphs is completely wrong: 1) The way how JVM injects safepoint polls (not "safepoints") has certainly changed with [JDK-8186027](https://bugs.openjdk.org/browse/JDK-8186027) and [JDK-8223051](https://bugs.openjdk.org/browse/JDK-8223051). – apangin Sep 08 '22 at 20:24
  • 2) The phrase about blocking on hitting a safepoint in every thread makes no sense. The reason of the two second delay is opposite: JIT *does not* put a safepoint poll instruction in a large loop, and therefore Java threads *do not block* on time. – apangin Sep 08 '22 at 20:30
  • "No, instead, the Parallel GC blocks on hitting a safepoint in every thread, whereas the G1 collector does not, a much simpler explanation. There are many reasons why the default GC impl has been changed away from the parallel collector, and this is presumably one of them." Do you mean the issue is actually caused by the Parallel GC? Can you explain a bit more how the Parallel GC causes the main thread to wait after the sleep? The answer might not be correct, but I would still like to hear more ideas about this GC impl, thanks – Stan Sep 10 '22 at 04:52
  • Just try it. Run the example code and command line I wrote in this answer. You'll find that using the Parallel GC even in JDK9 or higher still gets you the behaviour that, if `int` counting is used, the code prints 2000m. Using Parallel GC 'makes the issue happen'. That's objective, and you can test this hypothesis. – rzwitserloot Sep 10 '22 at 21:21
  • I think @apangin just misunderstood the answer, particularly their second point, which is effectively: "Your point makes no sense, because {repetition of the very point}". I'm not sure how to take that, other than that I should perhaps have written this answer more clearly. – rzwitserloot Sep 10 '22 at 21:23