12

In Java 8 Update 45, adding these options to the java call:

-XX:+PrintGCApplicationStoppedTime
-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1

shows me statistics like these:

vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
3679.229: no vm operation [ 72 1 2 ] [ 6016 0 6016 0 0 ]  1
2015-05-22T11:25:27.519+0200: Total time for which application threads were stopped: 6.0168551 seconds, Stopping threads took: 6.0164099 seconds

The problem here is the long time for Stopping threads. In this example, it is 6 seconds which is already a problem for our application, but I have seen even larger times, in one instance (without full logging, though) amounting to almost a minute.

The VM operation (here: no vm operation) is varying. I have also seen, e.g. RevokeBias, G1IncCollectionPause, or GCG_Operation. Also, the page_trap_count seems to be irrelevant. I have seen examples where it was 0, and others where it was 2. Consistent, though, is that the time always is reflected in the values of spin and sync.

I am looking for an in-depth explanation of those timing values spin and sync, but mostly I am interested in why this is happening and what I can do against it. I am not aware of anything 'evil' in our configuration. There are plenty of bored cores and unused memory on the machine, we are running pure Java (no JNI), and we are not aware of any excessive synchronization in our code.

Rich
  • 15,602
  • 15
  • 79
  • 126
malamut
  • 456
  • 4
  • 12
  • You could give more detail of your specific use case, so that people could know how to help with. – Eric May 22 '15 at 09:52
  • Actually, I see this not only with one of our applications, but I see it across the board. So there is not really a specific use case. If I knew what to look for, I could go hunting for similarities. I kind of suspect, however, that many people have the same problem but have not yet noticed. The logging 'Stopping thread took:' which pointed me to it is quite new. – malamut May 22 '15 at 10:27

1 Answers1

4

The problem here is that it takes your application a long time to reach a safepoint. The Stopping threads output denotes the time it takes between the JVM issues a safepoint request until all threads have reached a safepoint.

The sync value shows same thing - it is the time it takes for all threads to reach the safeponit.

The spin and block values denote the times it takes for blocked and spinning (executing code) threads to reach the safepoint.

Knowing this we can conclude that the problem for you is that one thread is busy spinning and not able to reach its safepoint in several seconds.

Exactly why this occurs is hard to say. An example, as illustrated in this question and it's answer is that the JIT compiler can compile heavy loops without safepoint checks.

You could try running your JVM with the options -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=500. That will timeout the safepoint syncs after 500 ms and print information about the thread(s) that failed to reach the safepoint.

Community
  • 1
  • 1
K Erlandsson
  • 13,408
  • 6
  • 51
  • 67
  • Thank you! I will try this and report back. – malamut May 22 '15 at 11:11
  • The additional options will indeed give me the thread name (no stack, but it seems there currently is not way to get it). Having a look at one of the applications exhibiting the problem, the thread is the same one most of the time. Looks like some hard work to find the actual problem, but I now know where to start, and the question is certainly answered! – malamut May 22 '15 at 12:57
  • @malamut Now that you know the thread, maybe regular thread dumps (`jstack `) can help. They wait for a safepoint too so it might be hard to tell the problem even with this help. I would love to hear what the problem was once you've identified it. It is an interesting question. – K Erlandsson May 22 '15 at 13:21
  • I will report back interesting findings. Sadly, it won't be easy, because it's probably more than one cause. The one which I'm hunting currently is specific to a single application. It goes away when disabling JIT, so I'll follow the track in your linked question first. – malamut May 22 '15 at 16:25
  • 1
    I identified one of our own methods which, when excluded from compilation, will solve the problem. I'm not really convinced that the method itself is the problem; excluding it perhaps rather hides the actual problem. Anyway, I can afford to exclude it, so for now I just do it and ask no further questions. :) Next on the list is to eliminate strangely long copy times on G1 pauses. This one is not yet ready for question here, though. Need to put some additional research there first. :) – malamut May 22 '15 at 19:01
  • @Zentopia Yes, threads doing regular sleeps shouldn't have an issue. – K Erlandsson Apr 26 '20 at 18:10