
My Java application stopped for almost 190 ms during a whole GC cycle at '2022-10-26 16:08:37.285', including the cost of waiting for threads to reach the safepoint. The GC log shows the STW time was just 30+ ms, so I downloaded the VM (safepoint) log to check whether the long stop was spent in the stage of waiting for threads to reach the safepoint and blocking them. The VM log shows it spent 183 ms in the 'sync' stage. I wonder what could cause such a long stop during 'sync'.
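
For reference, a hedged guess at the JVM options that produce logs in this shape on JDK 8 (the exact options on the running process may differ):

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1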

gc log:

2022-10-26T16:08:36.239+0800: 7081.719: [GC concurrent-root-region-scan-start]
2022-10-26T16:08:36.239+0800: 7081.719: Total time for which application threads were stopped: 0.0424589 seconds, Stopping threads took: 0.0001501 seconds
2022-10-26T16:08:36.279+0800: 7081.759: [GC concurrent-root-region-scan-end, 0.0402316 secs]
2022-10-26T16:08:36.279+0800: 7081.759: [GC concurrent-mark-start]
2022-10-26T16:08:37.246+0800: 7082.726: Application time: 1.0071790 seconds
2022-10-26T16:08:37.248+0800: 7082.727: [GC pause (G1 Evacuation Pause) (young) 7082.727: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 8541, predicted base time: 7.41 ms, remaining time: 27.59 ms, target pause time: 35.00 ms]
 7082.727: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 85 regions, survivors: 17 regions, predicted young region time: 24.66 ms]
 7082.727: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 85 regions, survivors: 17 regions, old: 0 regions, predicted pause time: 32.08 ms, target pause time: 35.00 ms]
, 0.0355926 secs]
   [Parallel Time: 32.8 ms, GC Workers: 10]
      [GC Worker Start (ms): Min: 7082727.6, Avg: 7082727.6, Max: 7082727.6, Diff: 0.1]
      [Ext Root Scanning (ms): Min: 0.3, Avg: 0.9, Max: 2.2, Diff: 1.9, Sum: 9.0]
      [Update RS (ms): Min: 1.9, Avg: 3.1, Max: 3.4, Diff: 1.5, Sum: 30.8]
         [Processed Buffers: Min: 9, Avg: 15.4, Max: 23, Diff: 14, Sum: 154]
      [Scan RS (ms): Min: 3.2, Avg: 3.9, Max: 4.1, Diff: 0.8, Sum: 39.2]
      [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
      [Object Copy (ms): Min: 24.6, Avg: 24.7, Max: 25.2, Diff: 0.6, Sum: 247.2]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Termination Attempts: Min: 1, Avg: 25.9, Max: 40, Diff: 39, Sum: 259]
      [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]
      [GC Worker Total (ms): Min: 32.7, Avg: 32.7, Max: 32.7, Diff: 0.1, Sum: 327.1]
      [GC Worker End (ms): Min: 7082760.3, Avg: 7082760.3, Max: 7082760.3, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.8 ms]
   [Other: 1.9 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 0.3 ms]
      [Ref Enq: 0.0 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.2 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 0.3 ms]
   [Eden: 680.0M(680.0M)->0.0B(3192.0M) Survivors: 136.0M->80.0M Heap: 9831.7M(16.0G)->9154.2M(16.0G)]
 [Times: user=0.32 sys=0.01, real=0.04 secs] 
2022-10-26T16:08:37.283+0800: 7082.763: Total time for which application threads were stopped: 0.0368899 seconds, Stopping threads took: 0.0001211 seconds
2022-10-26T16:08:39.807+0800: 7085.287: [GC concurrent-mark-end, 3.5278637 secs]
2022-10-26T16:08:39.807+0800: 7085.287: Application time: 2.5240566 seconds
2022-10-26T16:08:39.808+0800: 7085.288: [GC remark 2022-10-26T16:08:39.808+0800: 7085.288: [Finalize Marking, 0.0004663 secs] 2022-10-26T16:08:39.809+0800: 7085.289: [GC ref-proc, 0.0004983 secs] 2022-10-26T16:08:39.809+0800: 7085.289: [Unloading, 0.0123397 secs], 0.0222890 secs]
 [Times: user=0.17 sys=0.04, real=0.03 secs] 
2022-10-26T16:08:39.831+0800: 7085.310: Total time for which application threads were stopped: 0.0233879 seconds, Stopping threads took: 0.0001474 seconds
2022-10-26T16:08:39.831+0800: 7085.310: Application time: 0.0000378 seconds
2022-10-26T16:08:39.832+0800: 7085.311: [GC cleanup 10G->10G(16G), 0.0194897 secs]
 [Times: user=0.16 sys=0.03, real=0.02 secs] 
2022-10-26T16:08:39.851+0800: 7085.331: Total time for which application threads were stopped: 0.0204211 seconds, Stopping threads took: 0.0001703 seconds
2022-10-26T16:08:41.088+0800: 7086.568: Application time: 1.2366444 seconds
2022-10-26T16:08:41.089+0800: 7086.569: Total time for which application threads were stopped: 0.0011031 seconds, Stopping threads took: 0.0002013 seconds
2022-10-26T16:08:41.089+0800: 7086.569: Application time: 0.0000264 seconds
2022-10-26T16:08:41.090+0800: 7086.569: Total time for which application threads were stopped: 0.0006572 seconds, Stopping threads took: 0.0000450 seconds
2022-10-26T16:08:42.038+0800: 7087.518: Application time: 0.9482548 seconds

vm log:

7069.947: G1IncCollectionPause             [     290          0              0    ]      [     0     0     0     0    33    ]  0   
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
7074.822: G1IncCollectionPause             [     290          1              1    ]      [     0     0     0     0    33    ]  0   
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
7076.947: G1IncCollectionPause             [     290          1              3    ]      [     0     0     0     0    33    ]  0   
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
7081.676: G1IncCollectionPause             [     290          1              4    ]      [     0     0     0     0    41    ]  0   
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
7082.543: G1IncCollectionPause             [     290          0              0    ]      [     0     0   183     0    35    ]  0   
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
7085.287: CGC_Operation                    [     290          0              2    ]      [     0     0     0     0    22    ]  0   
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
7085.311: CGC_Operation                    [     290          2              2    ]      [     0     0     0     0    19    ]  2   
         vmop                    [threads: total initially_running wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
7086.567: FindDeadlocks                    [     290          2              6    ]      [     0     0     0     0     0    ]  0   

I wonder what may cause such a long stop during the 'sync' stage and how to resolve the issue.

  • Please provide enough code so others can better understand or reproduce the problem. – Community Oct 26 '22 at 18:03
  • [async-profiler](https://github.com/jvm-profiling-tools/async-profiler/) has a special mode for profiling time-to-safepoint phases: `--ttsp`. See also [this answer](https://stackoverflow.com/a/67237599/3448419) for possible reasons. – apangin Oct 26 '22 at 22:10
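
A hedged example of the async-profiler invocation suggested in the comment above, assuming the 2.x `profiler.sh` launcher (check the project README for the exact options of your version; `<pid>` is a placeholder for the target JVM):

./profiler.sh -d 60 --ttsp -f ttsp.html <pid>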

1 Answer


Your log is not enough to find a solution, but it does show the source of the problem. When the VM wants to reach a safepoint, all Java threads have to stop at known points. Some threads reach a safepoint and stop quickly, while others keep running because they have not yet observed the safepoint request. The stopped threads then have to wait for the still-running ones. This waiting time is reported as 'sync': the total time between issuing the safepoint request and blocking all Java threads. So your application is waiting for all Java threads to stop. There can be many reasons, but the main ones I know (a small sketch follows the list) are:

  1. Long loops
  2. Too many threads in the running state
  3. Large data copies, e.g. System.arraycopy()
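
A minimal sketch of reasons 1 and 3 (the class name and array sizes are made up for illustration): on HotSpot 8, a JIT-compiled counted int loop typically has no safepoint poll inside the loop body, and a single large System.arraycopy() cannot be interrupted, so a thread stuck in either keeps every other thread waiting in the 'sync' phase until it finishes.

public class SafepointDelayDemo {
    static volatile long sink;                      // keeps the loop from being optimized away

    public static void main(String[] args) {
        byte[] src = new byte[256 * 1024 * 1024];   // ~256 MB, adjust to your heap
        byte[] dst = new byte[256 * 1024 * 1024];

        // Reason 3: one large, non-interruptible copy. The JVM cannot stop
        // this thread at a safepoint until the whole copy has finished.
        System.arraycopy(src, 0, dst, 0, src.length);

        // Reason 1: a long counted loop (int index, known bound). On JDK 8 the
        // JIT usually emits no safepoint poll inside such a loop, so a safepoint
        // requested while it runs must wait for the loop to complete.
        long sum = 0;
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            sum += i;
        }
        sink = sum + dst[0];
    }
}

On JDK 8, -XX:+UseCountedLoopSafepoints mitigates the counted-loop case (newer JDKs do this by default via loop strip mining), though, as the comment below points out, this may not be the culprit here since no threads were running at sync time.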

This PDF (page 199) explains TTSP (time to safepoint) very well. (It uses Shenandoah GC for its examples, but it is enough to understand the concept.)

Turac
  • 183ms is enough to copy a 1GB array. But more importantly, the logs say there are 290 threads in total, and none of them were running at the time of safepoint sync. So none of the mentioned reasons seem relevant here. Most often, such delays are caused by OS-level freezes due to paging, direct reclaim, etc. – apangin Oct 26 '22 at 22:24
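
A hedged way to check for the OS-level causes mentioned in this comment (counter names and tool availability vary by kernel and distro):

sar -B 1                                           # paging stats; high pgscand/s or pgsteal/s suggests reclaim pressure
vmstat 1                                           # si/so columns show swap-in/swap-out activity
grep -E 'pgscan_direct|allocstall' /proc/vmstat    # direct-reclaim counters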