21

Is there a cheaper method call in Java 9+ which keeps its safepoint?

The JVM removes safepoints at runtime to improve efficiency; however, this can make profiling and monitoring the code more difficult. For this reason, we deliberately add trivial calls in carefully selected places to ensure a safepoint is present.

public static void safepoint() {
    if (IS_JAVA_9_PLUS)
        Thread.holdsLock(""); // 100 ns on Java 11
    else
        Compiler.enable(); // 5 ns on Java 8
}

public static void optionalSafepoint() {
    if (SAFEPOINT_ENABLED)
        if (IS_JAVA_9_PLUS)
            Thread.holdsLock("");
        else
            Compiler.enable();
}
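
For reference, a minimal sketch of how the IS_JAVA_9_PLUS and SAFEPOINT_ENABLED flags above might be defined (illustrative only, inside the same class; the exact definitions and property name are not important here):

// Illustrative flag definitions, not necessarily how the real code does it.
private static final boolean IS_JAVA_9_PLUS =
        !System.getProperty("java.version").startsWith("1."); // Java 8 reports "1.8.0_x", Java 9+ reports "9", "11", ...
private static final boolean SAFEPOINT_ENABLED =
        Boolean.getBoolean("safepoint.enabled");               // opt in via -Dsafepoint.enabled=true (property name is made up)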

On Java 8 this overhead is fine, but Compiler.enable() gets optimised away in Java 9+, so we have to use a much more expensive method or not enable this feature at all.

EDIT: Apart from profilers, I have used safepoint() to get better detail from Thread.getStackTrace(), so the application can profile itself, e.g. when it takes too long to perform an action.
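
For illustration, a rough sketch of that pattern (the class name SlowActionWatchdog and the threshold handling are made up for this example, not from our actual code): a background thread samples the worker's stack when an action overruns a deadline.

public class SlowActionWatchdog {
    // Hypothetical sketch: samples a worker thread's stack if an action exceeds a threshold.
    private final Thread monitored;
    private final long thresholdNanos;
    private volatile long actionStartNanos = Long.MAX_VALUE;

    public SlowActionWatchdog(Thread monitored, long thresholdNanos) {
        this.monitored = monitored;
        this.thresholdNanos = thresholdNanos;
        Thread t = new Thread(this::run, "slow-action-watchdog");
        t.setDaemon(true);
        t.start();
    }

    public void actionStarted() { actionStartNanos = System.nanoTime(); }

    public void actionFinished() { actionStartNanos = Long.MAX_VALUE; }

    private void run() {
        while (monitored.isAlive()) {
            long start = actionStartNanos;
            if (start != Long.MAX_VALUE && System.nanoTime() - start > thresholdNanos) {
                // The worker calls safepoint() in its hot paths, so this trace has useful detail.
                for (StackTraceElement ste : monitored.getStackTrace())
                    System.err.println("\tat " + ste);
            }
            try {
                Thread.sleep(10); // sampling interval; a slow action may be reported more than once
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}

The worker wraps each action with actionStarted()/actionFinished() and sprinkles safepoint() in its hot paths so the sampled stack traces are accurate.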

Peter Lawrey
  • 10
    `Compiler.enable()` is an empty `static` method without any effect at all. I’m surprised to hear that it does not get optimized away in your Java 8 runtime. Besides that, I’m not sure whether I understand you correctly. Profilers depending on the safepoints distort the result, sure. But what you’re attempting now, is to insert the distortion into your code, to get more precision on the measurement of the already distorted execution. – Holger Jun 24 '20 at 09:01
  • @Holger This is less for sample-based profiling as it is to trap why-is-my-latency-too-high profiles in production using a background thread. – Peter Lawrey Jun 25 '20 at 12:27

1 Answer

22

In the HotSpot JVM, safepoints (points where the JVM can safely stop Java threads) are placed

  • before the return from a non-inlined method;
  • at backward branches (i.e. in loops), unless the loop is counted. A loop is counted if it is known to have a finite number of iterations and its loop variable fits in an int (see the short example after this list);
  • at thread state transitions (native -> Java, native -> VM);
  • in the blocking functions of the JVM runtime.
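
To make the counted-loop rule concrete, here is an illustrative pair of loops (the class name LoopKinds is arbitrary; the exact behaviour depends on the JVM version and JIT compiler):

public class LoopKinds {
    // Counted loop: int induction variable with a simple bound and stride.
    // The JIT may drop the safepoint poll at the backward branch.
    static long countedSum(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++)
            sum += i;
        return sum;
    }

    // Non-counted loop: a long induction variable does not fit the "counted" shape,
    // so the safepoint poll at the backward branch is kept.
    static long nonCountedSum(long n) {
        long sum = 0;
        for (long i = 0; i < n; i++)
            sum += i;
        return sum;
    }
}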

All the above places, except backward branches, imply at least the overhead of a method call. So apparently the cheapest way to insert a safepoint is to write a non-counted loop:

public class Safepoint {
    private static volatile int one = 1;

    public static void force() {
        for (int i = 0; i < one; i++) ;
    }
}

volatile guarantees that the loop will not be eliminated by the optimizer, and it will not be treated as counted.

I verified with -XX:+PrintAssembly that the safepoint poll instruction is inserted wherever I call Safepoint.force(). The call itself takes about 1 ns.
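
For example, a small driver like the following (SafepointDemo is just an illustrative name; it assumes the Safepoint class above is on the classpath) can be used to inspect the compiled code. Note that printing assembly requires the hsdis disassembler plugin to be installed.

public class SafepointDemo {
    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 50_000; i++)   // enough iterations to trigger JIT compilation of work()
            total += work();
        System.out.println(total);
    }

    private static long work() {
        long sum = 0;
        for (int i = 0; i < 1_000; i++) {  // counted loop: no poll expected at its backward branch
            Safepoint.force();             // the inlined non-counted loop should keep its safepoint poll
            sum += i;
        }
        return sum;
    }
}

// Example run (hsdis required for disassembly output):
//   java -XX:+UnlockDiagnosticVMOptions -XX:CompileCommand=print,SafepointDemo::work SafepointDemo
// then look for the poll instruction (typically annotated as {poll}) in the printed assembly.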

However, due to a bug in JDK 8, the presence of safepoint polls does not by itself guarantee the correctness of stack traces obtained from a different thread. A native method call sets the last Java frame anchor, and thus "repairs" the stack trace. I guess this was one reason why you chose a native method. The bug was fixed in JDK 9+ though.

BTW, here are a couple of native methods that have lower overhead than Thread.holdsLock:

Thread.currentThread().isAlive()
Runtime.getRuntime().totalMemory()
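
As a sketch (reusing the IS_JAVA_9_PLUS flag from the question), either of them could be dropped into the original helper in place of Thread.holdsLock(""):

public static void safepoint() {
    if (IS_JAVA_9_PLUS)
        Thread.currentThread().isAlive();   // native call, cheaper than Thread.holdsLock("")
    else
        Compiler.enable();                  // as in the original Java 8 branch
}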

As to profiling, safepoint-based profilers are completely broken in the first place. This is actually a reason why I started the async-profiler project a few years ago. Its goal is to facilitate profiling of Java applications with low overhead and with no safepoint bias.

apangin
  • 4
    What about making `force()` an entirely empty method and run the application with `-XX:CompileCommand=dontinline,Safepoint.force`? Wouldn’t it achieve the same effect, without the negative impact of `volatile` reads? – Holger Jun 24 '20 at 16:17
  • 4
    @Holger On x86, volatile read [of a stable field] does not have performance impact other than preventing certain JIT optimizations around it. But such optimizations [aren't compatible](https://stackoverflow.com/a/59116238/3448419) with a non-inlined call either. A method call needs to store caller saved registers, construct a new frame, check for stack overflow. Together with jump/ret instructions this will certainly have more overhead than a simple load. – apangin Jun 24 '20 at 17:04
  • 3
    But yes, it will achieve the same effect. – apangin Jun 24 '20 at 17:04
  • I use JMC for profiling and I have found it benefits from adding these safepoints all the same. I will have another look at your async-profiler. – Peter Lawrey Jun 25 '20 at 12:15
  • To clarify, I use it to improve the results of `Thread.getStackTrace()`. Is there a method which does the same using async profiling? – Peter Lawrey Jun 25 '20 at 12:25
  • 2
    @PeterLawrey JMC is a big step forward to accurate profiling, but it still fails to traverse many valid Java stacks. This is my [favorite test](https://github.com/apangin/codeone2019-java-profiling/blob/master/src/demo1/StringBuilderTest.java) which almost all profilers fail, including JMC. If you have any feedback about async-profiler, please open an issue on Github or contact me by other means. Thanks. – apangin Jun 25 '20 at 12:42
  • Interesting, I had a test to ensure this wasn't optimised away but had a lower bound of 2 ns, and this averaged 1 ns delay. – Peter Lawrey Jun 25 '20 at 12:45
  • 1
    With async-profiler, one doesn't typically need to get stack traces manually - there are several modes to collect profiles automatically, based on CPU utilization, perf counters or just the wall clock. It has a [Java API](https://pangin.pro/async-profiler/api/) to start and stop profiling, to add or remove monitored threads, etc. Let me know about your use case, and we'll try to find out how async-profiler can help. – apangin Jun 25 '20 at 12:49