UPDATE: The problem is at least cursorily solved with a combination of the -XX:+UseCountedLoopSafepoints
and a JVM version equal to or greater than JDK 1.8.0_92 (Tested with _161).
ORIG: JDK 1.8.0 example run included, but it also happens on JDK 9.0.4.
I've been working on a weird bug that seems to manifest itself as the JVM stopping all threads, in a manner very consistent with safepoint pauses. The VM operation is sometimes listed as "no VM operation". The GC is not being engaged at the slowdown, as confirmed by the -XX
options output AND by jvisualVM
.
This is all in a single 100 line class below. I've paired down the code dramatically to show the problem based on a simple boolean
setting. Look for the MAKE_THE_JVM_PAUSE_UNEXPECTEDLY
boolean
.
I'm creating a 5000x5000 array of boolean
s, and accessing the first 10 rows randomly at column 66. The real code caused the same issue.
The algorithm of this example itself is nonsensical. It's just a highly paired down example.
Options used for running this are in the header comments.
package conway;
/**
* Run this with the following java VM options:
*
* -verbosegc
* -XX:+PrintGCApplicationStoppedTime
* -XX:+PrintSafepointStatistics
* -XX:PrintSafepointStatisticsCount=1
* -XX:+PrintGCDetails
* -XX:+PerfDisableSharedMem
*/
public class Conway
{
public static void main(String[] args)
{
//-----------------------[HEARTBEAT]-----------------------
new Thread(() ->
{
long startTime = System.currentTimeMillis();
long lastTime = startTime;
for (int count=0;; count++)
{
// Time stamp....
long now = System.currentTimeMillis();
System.out.printf("%d-->Delta time from Start[%dms], from last[%dms]\n",
count, now-startTime, now-lastTime);
System.out.flush();
lastTime = now;
// Waste time without sleeping...
for (long start = System.currentTimeMillis();
System.currentTimeMillis() - start < 200;);
}
}).start();
//-----------------------[COMPUTATION]-----------------------
one = new boolean[5000][5000];
boolean MAKE_THE_JVM_PAUSE_UNEXPECTEDLY = true;
for (;;)
{
if (MAKE_THE_JVM_PAUSE_UNEXPECTEDLY)
{
/**
* CALLING sweep() will cause all threads to periodically pause...
*/
sweep();
}
else
{
/**
* THIS ONE RUNS FINE. Exact same contents as in sweep() below...
*/
for (int y=0; y<5000; y++)
{
for (int x=0; x<5000; x++)
{
int living=0;
for (int yy=y-1; yy <= y+1; yy++)
for (int xx=x-1; xx <= x+1; xx++)
{
if (one[randomInt(0, 10)][66])
living++;
}
} // for(x...)
} // for(y...)
}
} // for(;;)
}
public static void sweep()
{
for (int y=0; y<5000; y++)
{
for (int x=0; x<5000; x++)
{
int living=0;
for (int yy=y-1; yy <= y+1; yy++)
for (int xx=x-1; xx <= x+1; xx++)
{
if (one[randomInt(0,10)][66])
living++;
}
} // for(x...)
} // for(y...)
}
public static int randomInt(int first, int last)
{
return (int)(((last-first+1)*Math.random())+first);
}
public static boolean[][] one;
}
Now when I run this thing with the MAKE_THE_JVM_PAUSE_UNEXPECTEDLY
boolean
set to true
, it causes the weird slowdowns.
Note that the problem first exhibits itself between heartbeat 49 and 50. The Heartbeats are from the first (anonymously launched) thread in the code and the algorithm itself is using the main class thread and are identified by the lines that look similar to this:
11-->Delta time from Start[2210ms], from last[200ms]
I terminated the run myself.
[GC (Allocation Failure) [PSYoungGen: 512K->384K(1024K)] 512K->384K(523776K), 0.0004807 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.053: ParallelGCFailedAllocation [ 5 0 0 ] [ 0 0 0 0 0 ] 0
Total time for which application threads were stopped: 0.0005755 seconds
[GC (Allocation Failure) [PSYoungGen: 896K->480K(1024K)] 896K->488K(523776K), 0.0003818 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.063: ParallelGCFailedAllocation [ 5 0 0 ] [ 0 0 0 0 0 ] 0
Total time for which application threads were stopped: 0.0004564 seconds
[GC (Allocation Failure) [PSYoungGen: 992K->496K(1024K)] 1000K->536K(523776K), 0.0004566 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.072: ParallelGCFailedAllocation [ 10 1 1 ] [ 0 0 0 0 0 ] 0
Total time for which application threads were stopped: 0.0007704 seconds
[GC (Allocation Failure) [PSYoungGen: 1008K->480K(1536K)] 1048K->660K(524288K), 0.0009025 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.082: ParallelGCFailedAllocation [ 10 0 0 ] [ 0 0 0 0 0 ] 0
Total time for which application threads were stopped: 0.0009905 seconds
[GC (Allocation Failure) [PSYoungGen: 1499K->499K(1536K)] 1679K->1006K(524288K), 0.0006638 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.102: ParallelGCFailedAllocation [ 11 1 1 ] [ 0 0 0 0 0 ] 0
Total time for which application threads were stopped: 0.0009292 seconds
[GC (Allocation Failure) [PSYoungGen: 1518K->992K(2560K)] 2025K->2018K(525312K), 0.0007172 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.103: ParallelGCFailedAllocation [ 11 0 1 ] [ 0 0 0 0 0 ] 0
Total time for which application threads were stopped: 0.0007980 seconds
[GC (Allocation Failure) [PSYoungGen: 2526K->1507K(3072K)] 3552K->3515K(525824K), 0.0009654 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.107: ParallelGCFailedAllocation [ 11 0 1 ] [ 0 0 0 0 1 ] 0
Total time for which application threads were stopped: 0.0010672 seconds
[GC (Allocation Failure) [PSYoungGen: 3041K->2547K(5120K)] 5049K->5128K(527872K), 0.0010321 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.108: ParallelGCFailedAllocation [ 11 0 2 ] [ 0 0 0 0 1 ] 0
Total time for which application threads were stopped: 0.0011153 seconds
[GC (Allocation Failure) [PSYoungGen: 5102K->3072K(5632K)] 7684K->7718K(528384K), 0.0012587 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.109: ParallelGCFailedAllocation [ 11 0 1 ] [ 0 0 0 0 1 ] 0
Total time for which application threads were stopped: 0.0013348 seconds
[GC (Allocation Failure) [PSYoungGen: 5630K->4608K(7680K)] 10276K->10382K(530432K), 0.0016113 secs] [Times: user=0.03 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.111: ParallelGCFailedAllocation [ 11 0 1 ] [ 0 0 0 0 1 ] 0
Total time for which application threads were stopped: 0.0016921 seconds
0-->Delta time from Start[0ms], from last[0ms]
[GC (Allocation Failure) [PSYoungGen: 7678K->3328K(8704K)] 13452K->13341K(531456K), 0.0013899 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.113: ParallelGCFailedAllocation [ 11 0 1 ] [ 0 0 1 0 1 ] 0
Total time for which application threads were stopped: 0.0024590 seconds
[GC (Allocation Failure) [PSYoungGen: 6399K->3360K(11264K)] 16412K->16419K(534016K), 0.0011903 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.115: ParallelGCFailedAllocation [ 11 1 2 ] [ 0 0 0 0 1 ] 0
Total time for which application threads were stopped: 0.0013046 seconds
[GC (Allocation Failure) [PSYoungGen: 8477K->5536K(11264K)] 21537K->21642K(534016K), 0.0020710 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
0.117: ParallelGCFailedAllocation [ 11 0 1 ] [ 0 0 0 0 2 ] 0
Total time for which application threads were stopped: 0.0021520 seconds
1-->Delta time from Start[210ms], from last[210ms]
2-->Delta time from Start[410ms], from last[200ms]
3-->Delta time from Start[610ms], from last[200ms]
4-->Delta time from Start[810ms], from last[200ms]
5-->Delta time from Start[1010ms], from last[200ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
1.119: no vm operation [ 11 2 2 ] [ 0 0 0 0 0 ] 2
Total time for which application threads were stopped: 0.0001547 seconds
6-->Delta time from Start[1210ms], from last[200ms]
7-->Delta time from Start[1410ms], from last[200ms]
8-->Delta time from Start[1610ms], from last[200ms]
9-->Delta time from Start[1810ms], from last[200ms]
10-->Delta time from Start[2010ms], from last[200ms]
11-->Delta time from Start[2210ms], from last[200ms]
12-->Delta time from Start[2410ms], from last[200ms]
13-->Delta time from Start[2610ms], from last[200ms]
14-->Delta time from Start[2810ms], from last[200ms]
15-->Delta time from Start[3010ms], from last[200ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
3.120: no vm operation [ 11 2 2 ] [ 0 0 0 0 0 ] 2
Total time for which application threads were stopped: 0.0000670 seconds
16-->Delta time from Start[3210ms], from last[200ms]
17-->Delta time from Start[3410ms], from last[200ms]
18-->Delta time from Start[3610ms], from last[200ms]
19-->Delta time from Start[3810ms], from last[200ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
4.084: EnableBiasedLocking [ 11 2 2 ] [ 0 0 0 0 0 ] 2
Total time for which application threads were stopped: 0.0000737 seconds
20-->Delta time from Start[4010ms], from last[200ms]
21-->Delta time from Start[4210ms], from last[200ms]
22-->Delta time from Start[4410ms], from last[200ms]
23-->Delta time from Start[4610ms], from last[200ms]
24-->Delta time from Start[4810ms], from last[200ms]
25-->Delta time from Start[5010ms], from last[200ms]
26-->Delta time from Start[5210ms], from last[200ms]
27-->Delta time from Start[5410ms], from last[200ms]
28-->Delta time from Start[5610ms], from last[200ms]
29-->Delta time from Start[5810ms], from last[200ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
6.084: no vm operation [ 11 2 2 ] [ 0 0 0 0 0 ] 2
Total time for which application threads were stopped: 0.0000621 seconds
30-->Delta time from Start[6010ms], from last[200ms]
31-->Delta time from Start[6210ms], from last[200ms]
32-->Delta time from Start[6410ms], from last[200ms]
33-->Delta time from Start[6610ms], from last[200ms]
34-->Delta time from Start[6810ms], from last[200ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
7.085: no vm operation [ 11 2 2 ] [ 0 0 0 0 0 ] 2
Total time for which application threads were stopped: 0.0000605 seconds
35-->Delta time from Start[7010ms], from last[200ms]
36-->Delta time from Start[7210ms], from last[200ms]
37-->Delta time from Start[7410ms], from last[200ms]
38-->Delta time from Start[7610ms], from last[200ms]
39-->Delta time from Start[7810ms], from last[200ms]
40-->Delta time from Start[8010ms], from last[200ms]
41-->Delta time from Start[8210ms], from last[200ms]
42-->Delta time from Start[8410ms], from last[200ms]
43-->Delta time from Start[8610ms], from last[200ms]
44-->Delta time from Start[8810ms], from last[200ms]
45-->Delta time from Start[9010ms], from last[200ms]
46-->Delta time from Start[9210ms], from last[200ms]
47-->Delta time from Start[9410ms], from last[200ms]
48-->Delta time from Start[9610ms], from last[200ms]
49-->Delta time from Start[9810ms], from last[200ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
10.087: no vm operation [ 11 2 2 ] [ 3009 0 3009 0 0 ] 2
Total time for which application threads were stopped: 3.0091028 seconds
50-->Delta time from Start[12994ms], from last[3184ms]
51-->Delta time from Start[13194ms], from last[200ms]
52-->Delta time from Start[13394ms], from last[200ms]
53-->Delta time from Start[13594ms], from last[200ms]
54-->Delta time from Start[13794ms], from last[200ms]
55-->Delta time from Start[13994ms], from last[200ms]
56-->Delta time from Start[14194ms], from last[200ms]
57-->Delta time from Start[14394ms], from last[200ms]
58-->Delta time from Start[14594ms], from last[200ms]
59-->Delta time from Start[14794ms], from last[200ms]
60-->Delta time from Start[14994ms], from last[200ms]
61-->Delta time from Start[15194ms], from last[200ms]
62-->Delta time from Start[15394ms], from last[200ms]
63-->Delta time from Start[15594ms], from last[200ms]
64-->Delta time from Start[15794ms], from last[200ms]
65-->Delta time from Start[15994ms], from last[200ms]
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
16.098: no vm operation [ 11 2 2 ] [ 1221 0 1221 0 0 ] 2
Total time for which application threads were stopped: 1.2213130 seconds
66-->Delta time from Start[17217ms], from last[1223ms]
67-->Delta time from Start[17417ms], from last[200ms]
68-->Delta time from Start[17617ms], from last[200ms]
69-->Delta time from Start[17817ms], from last[200ms]
70-->Delta time from Start[18017ms], from last[200ms]
71-->Delta time from Start[18217ms], from last[200ms]
72-->Delta time from Start[18417ms], from last[200ms]
73-->Delta time from Start[18617ms], from last[200ms]
74-->Delta time from Start[18817ms], from last[200ms]
75-->Delta time from Start[19017ms], from last[200ms]
76-->Delta time from Start[19217ms], from last[200ms]
77-->Delta time from Start[19417ms], from last[200ms]
78-->Delta time from Start[19617ms], from last[200ms]
79-->Delta time from Start[19817ms], from last[200ms]
80-->Delta time from Start[20017ms], from last[200ms]
Process finished with exit code 1
Now when I run this thing with the MAKE_THE_JVM_PAUSE_UNEXPECTEDLY
boolean
set to false
, it causes no slowdowns at any of the safepoints.
Why does placing this code within a method cause this?