29

We have a client application (with 10+ years of development). Its JDK was upgraded from OpenJDK 11 to OpenJDK 14 recently. On single-CPU (hyper-threading disabled) Windows 10 setups (and inside VirtualBox machines with only one available CPU) the application starts quite slowly compared to Java 11. Furthermore, it uses 100% CPU most of the time. We could also reproduce the issue with setting the processor affinity to only one CPU (c:\windows\system32\cmd.exe /C start /affinity 1 ...).

Some measurement with starting the application and doing a query with minimal manual interaction in my VirtualBox machine:

  • OpenJDK 11.0.2: 36 seconds
  • OpenJDK 13.0.2: ~1.5 minutes
  • OpenJDK 13.0.2 with -XX:-UseBiasedLocking: 46 seconds
  • OpenJDK 13.0.2 with -XX:-ThreadLocalHandshakes: 40 seconds
  • OpenJDK 14: 5-6 minutes
  • OpenJDK 14 with -XX:-UseBiasedLocking: 3-3,5 minutes
  • OpenJDK 15 EA Build 20: ~4,5 minutes

Only the used JDK (and the mentioned options) has been changed. (-XX:-ThreadLocalHandshakes is not available in Java 14.)

We have tried logging what JDK 14 does with -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50.

Counting the log lines for every second seems quite smooth with OpenJDK 11.0.2:

$ cat jdk11-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n
  30710 0
  44012 1
  55461 2
  55974 3
  27182 4
  41292 5
  43796 6
  51889 7
  54170 8
  58850 9
  51422 10
  44378 11
  41405 12
  53589 13
  41696 14
  29526 15
   2350 16
  50228 17
  62623 18
  42684 19
  45045 20

On the other hand, OpenJDK 14 seems to have interesting quiet periods:

$ cat jdk14-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n
   7726 0
   1715 5
  10744 6
   4341 11
  42792 12
  45979 13
  38783 14
  17253 21
  34747 22
   1025 28
   2079 33
   2398 39
   3016 44

So, what's happening between seconds 1-4, 7-10 and 14-20?

...
[0.350s][7248][debug][class,resolve        ] jdk.internal.ref.CleanerFactory$1 java.lang.Thread CleanerFactory.java:45
[0.350s][7248][debug][class,resolve        ] jdk.internal.ref.CleanerImpl java.lang.Thread CleanerImpl.java:117
[0.350s][7248][info ][biasedlocking        ] Aligned thread 0x000000001727e010 to 0x000000001727e800
[0.350s][7248][info ][os,thread            ] Thread started (tid: 2944, attributes: stacksize: default, flags: CREATE_SUSPENDED STACK_SIZE_PARAM_IS)
[0.350s][6884][info ][os,thread            ] Thread is alive (tid: 6884).
[0.350s][6884][debug][os,thread            ] Thread 6884 stack dimensions: 0x00000000175b0000-0x00000000176b0000 (1024k).
[0.350s][6884][debug][os,thread            ] Thread 6884 stack guard pages activated: 0x00000000175b0000-0x00000000175b4000.
[0.350s][7248][debug][thread,smr           ] tid=7248: Threads::add: new ThreadsList=0x0000000017254500
[0.350s][7248][debug][thread,smr           ] tid=7248: ThreadsSMRSupport::free_list: threads=0x0000000017253d50 is freed.
[0.350s][2944][info ][os,thread            ] Thread is alive (tid: 2944).
[0.350s][2944][debug][os,thread            ] Thread 2944 stack dimensions: 0x00000000177b0000-0x00000000178b0000 (1024k).
[0.350s][2944][debug][os,thread            ] Thread 2944 stack guard pages activated: 0x00000000177b0000-0x00000000177b4000.
[0.351s][2944][debug][class,resolve        ] java.lang.Thread java.lang.Runnable Thread.java:832
[0.351s][2944][debug][class,resolve        ] jdk.internal.ref.CleanerImpl jdk.internal.misc.InnocuousThread CleanerImpl.java:135
[0.351s][2944][debug][class,resolve        ] jdk.internal.ref.CleanerImpl jdk.internal.ref.PhantomCleanable CleanerImpl.java:138
[0.351s][2944][info ][biasedlocking,handshake] JavaThread 0x000000001727e800 handshaking JavaThread 0x000000000286d800 to revoke object 0x00000000c0087f78
[0.351s][2944][debug][vmthread               ] Adding VM operation: HandshakeOneThread
[0.351s][6708][debug][vmthread               ] Evaluating non-safepoint VM operation: HandshakeOneThread
[0.351s][6708][debug][vmoperation            ] begin VM_Operation (0x00000000178af250): HandshakeOneThread, mode: no safepoint, requested by thread 0x000000001727e800

# no log until 5.723s

[5.723s][7248][info ][biasedlocking          ]   Revoked bias of currently-unlocked object
[5.723s][7248][debug][handshake,task         ] Operation: RevokeOneBias for thread 0x000000000286d800, is_vm_thread: false, completed in 94800 ns
[5.723s][7248][debug][class,resolve          ] java.util.zip.ZipFile$CleanableResource java.lang.ref.Cleaner ZipFile.java:715
[5.723s][7248][debug][class,resolve          ] java.lang.ref.Cleaner jdk.internal.ref.CleanerImpl$PhantomCleanableRef Cleaner.java:220
[5.723s][7248][debug][class,resolve          ] java.util.zip.ZipFile$CleanableResource java.util.WeakHashMap ZipFile.java:716
...

The second pause a little bit later:

...
[6.246s][7248][info ][class,load              ] java.awt.Graphics source: jrt:/java.desktop
[6.246s][7248][debug][class,load              ]  klass: 0x0000000100081a00 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 5625 checksum: 0025818f
[6.246s][7248][debug][class,resolve           ] java.awt.Graphics java.lang.Object (super)
[6.246s][7248][info ][class,loader,constraints] updating constraint for name java/awt/Graphics, loader 'bootstrap', by setting class object
[6.246s][7248][debug][jit,compilation         ]   19       4       java.lang.Object::<init> (1 bytes)   made not entrant
[6.246s][7248][debug][vmthread                ] Adding VM operation: HandshakeAllThreads
[6.246s][6708][debug][vmthread                ] Evaluating non-safepoint VM operation: HandshakeAllThreads
[6.246s][6708][debug][vmoperation             ] begin VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1400 ns
[6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 700 ns
[6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 800 ns

# no log until 11.783s

[11.783s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 6300 ns
[11.783s][6708][info ][handshake               ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns
[11.783s][6708][debug][vmoperation             ] end VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[11.783s][7248][debug][protectiondomain        ] Checking package access
[11.783s][7248][debug][protectiondomain        ] class loader: a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000000c0058628} protection domain: a 'java/security/ProtectionDomain'{0x00000000c058b948} loading: 'java/awt/Graphics'
[11.783s][7248][debug][protectiondomain        ] granted
[11.783s][7248][debug][class,resolve           ] sun.launcher.LauncherHelper java.awt.Graphics LauncherHelper.java:816 (reflection)
[11.783s][7248][debug][class,resolve           ] jdk.internal.reflect.Reflection [Ljava.lang.reflect.Method; Reflection.java:300
[11.783s][7248][debug][class,preorder          ] java.lang.PublicMethods$MethodList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
...

Then the third one:

...
[14.578s][7248][debug][class,preorder          ] java.lang.InheritableThreadLocal source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
[14.578s][7248][info ][class,load              ] java.lang.InheritableThreadLocal source: jrt:/java.base
[14.578s][7248][debug][class,load              ]  klass: 0x0000000100124740 super: 0x0000000100021a18 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 1338 checksum: 8013ed55
[14.578s][7248][debug][class,resolve           ] java.lang.InheritableThreadLocal java.lang.ThreadLocal (super)
[14.578s][7248][debug][jit,compilation         ]  699       3       java.lang.ThreadLocal::get (38 bytes)   made not entrant
[14.578s][7248][debug][vmthread                ] Adding VM operation: HandshakeAllThreads
[14.578s][6708][debug][vmthread                ] Evaluating non-safepoint VM operation: HandshakeAllThreads
[14.578s][6708][debug][vmoperation             ] begin VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1600 ns
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 900 ns
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 0 ns
[14.579s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 900 ns

# no log until 21.455s

[21.455s][6708][debug][handshake,task          ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 12100 ns
[21.455s][6708][info ][handshake               ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns
[21.455s][6708][debug][vmoperation             ] end VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[21.455s][7248][debug][class,resolve           ] sun.security.jca.Providers java.lang.InheritableThreadLocal Providers.java:39
[21.455s][7248][info ][class,init              ] 1251 Initializing 'java/lang/InheritableThreadLocal'(no method) (0x0000000100124740)
[21.455s][7248][debug][class,resolve           ] java.lang.InheritableThreadLocal java.lang.ThreadLocal InheritableThreadLocal.java:57
[21.456s][7248][debug][class,preorder          ] sun.security.jca.ProviderList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
[21.456s][7248][info ][class,load              ] sun.security.jca.ProviderList source: jrt:/java.base
[21.456s][7248][debug][class,load              ]  klass: 0x00000001001249a8 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 11522 checksum: bdc239d2
[21.456s][7248][debug][class,resolve           ] sun.security.jca.ProviderList java.lang.Object (super)
...

The following two lines seems interesting:

[11.783s][6708][info ][handshake               ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns
[21.455s][6708][info ][handshake               ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns

Is that normal that these handshakes took 5.5 and 6.8 seconds?

I have experienced the same slowdown (and similar logs) with the update4j demo app (which is completely unrelated to our application) running with this command:

Z:\swing>\jdk-14\bin\java -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50 \
    -jar update4j-1.4.5.jar --remote http://docs.update4j.org/demo/setup.xml

What should I look for to make our app faster again on single-CPU Windows 10 setups? Can I fix this by changing something in our application or by adding JVM arguments?

Is that a JDK bug, should I report it?

update 2020-04-25:

As far as I see the log files also contains GC logs. These are the first GC logs:

$ cat app.txt.00 | grep "\[gc"
[0.016s][7248][debug][gc,heap          ] Minimum heap 8388608  Initial heap 60817408  Maximum heap 1073741824
[0.017s][7248][info ][gc,heap,coops    ] Heap address: 0x00000000c0000000, size: 1024 MB, Compressed Oops mode: 32-bit
[0.018s][7248][info ][gc               ] Using Serial
[22.863s][6708][info ][gc,start                ] GC(0) Pause Young (Allocation Failure)
[22.863s][6708][debug][gc,heap                 ] GC(0) Heap before GC invocations=0 (full 0): def new generation   total 17856K, used 15936K [0x00000000c0000000, 0x00000000c1350000, 0x00000000d5550000)
...

Unfortunately it does not seem related since it starts after the third pause.

update 2020-04-26:

With OpenJDK 14 the application uses 100% CPU in my (single-CPU) VirtualBox machine (running on a i7-6600U CPU). The virtual machine has 3,5 GB RAM. According to Task Manager 40%+ is free and disk activity is 0% (I guess this means no swapping). Adding another CPU to the virtual machine (and enabling hyper-threading for physical machines) make the application fast enough again. I just wondering, was it an intentional trade-off during JDK development to loss performance on (rare) single-CPU machines to make the JVM faster on multicore/hyper-threading CPUs?

palacsint
  • 28,416
  • 10
  • 82
  • 109
  • 3
    Does `-Xlog:all=debug` turn on GC logging? That would my first guess for any pauses. – kichik Apr 24 '20 at 22:08
  • Did you try running with a profiler and compare results? I think that would be the natural thing to do. – Axel Apr 25 '20 at 07:07
  • @kichik: Good question, thanks! Unfortunately it turns on GC logs, please check the update. – palacsint Apr 25 '20 at 07:34
  • Typical Java problem, why you want to use the new version in the first place? you can't count on these GC enabled languages to keep with a steady performance. – Yan.F Apr 26 '20 at 07:30
  • Such a black box. Is the particular to your environment or your application. Can you try your application on a non-single threaded system or outside of the VM? What about the OpenJ9 jdk? Did you try building your application with the new jdk? – matt Apr 26 '20 at 08:02
  • Could this be that the latter version loads a larger dataset into memory and saturates the heap leading to a pause requiring a full gc and frequently? – Khanna111 Apr 26 '20 at 08:35
  • Pay for the Java License and get commercial support from e.g.: Oracle, Azul, etc. – JohannesB Apr 26 '20 at 08:53
  • to isolate it, try changing the gc algo, allocate more than one core, allocate more heap. Allocate more mem to the Win 10 VB. Hopefully that will assist in isolating it and give something more to go on. – Khanna111 Apr 26 '20 at 19:52
  • 1
    also check out the windows system messages, try a different build for jdk 14. If all else fails, escalate as an issue? – Khanna111 Apr 26 '20 at 19:54
  • attempt a thread dump every few units of time and analyze that to see what is going on. On linux, you could also check the light weight processes (threads) associated with a process and their states. Something similar on windows? – Khanna111 Apr 26 '20 at 20:02
  • @Axel: Could you suggest a profiler which could handle this situation? The VirtualBox machine is painfully slow when the JVM runs, it would take a lot of time to try them with no previous knowledge. I would be grateful if someone has something at hand and they could check the mentioned update4j with a single-CPU Windows 10 OpenJDK 14 setup. – palacsint Apr 26 '20 at 20:14
  • 1
    @Yan.F: OpenJDK 11 will not be supported forever, it's time to prepare the new releases and bugs. Furthermore, it seems a JDK bug - which might be fixed or not but could help others too. Anyway, for me it's mostly curiosity. On the other hand I'd like to now what to tell our customers as minimal system requirements of our app. – palacsint Apr 26 '20 at 20:19
  • @matt: It works well with 2-CPU VirtualBox machines. Switching to another JDK vendor seems a lots of work for our organization so I prefer to avoid that. The mentioned update4j demo app also has the issue and if I'm right it was built with JDK 13.0.1. – palacsint Apr 26 '20 at 20:23
  • @palacsint part of my reason for asking about the openj9 jdk is because it is different. It could help you narrow down the issue. I would suggest filing an update4j issue/bug. – matt Apr 27 '20 at 16:08
  • @palacsint: were you successful in figuring this out? – Khanna111 May 02 '20 at 18:53
  • 1
    @Khanna111: Yes, I've just written it as an answer. – palacsint May 02 '20 at 23:09

5 Answers5

6

TL;DR: It's an OpenJDK regression filed as JDK-8244340 and has been fixed in JDK 15 Build 24 (2020/5/20).

I did not except that but I could reproduce the issue with a simple hello world:

public class Main {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}

I have used these two batch files:

main-1cpu.bat, which limits the java process to only one CPU:

c:\windows\system32\cmd.exe /C start /affinity 1 \
    \jdk-14\bin\java \
    -Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 \
    Main

main-full.bat, the java process can use both CPUs:

c:\windows\system32\cmd.exe /C start /affinity FF \
    \jdk-14\bin\java \
    -Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 \
    Main

(The differences are the affinity value and name of the log file. I've wrapped it for easier reading but wrapping with \ probably doesn't work on Windows.)

A few measurements on Windows 10 x64 in VirtualBox (with two CPUs):

PS Z:\main> Measure-Command { .\main-1cpu.bat }

...    
TotalSeconds      : 7.0203455
...


PS Z:\main> Measure-Command { .\main-full.bat }

...
TotalSeconds      : 1.5751352
...


PS Z:\main> Measure-Command { .\main-full.bat }

...
TotalSeconds      : 1.5585384
...


PS Z:\main> Measure-Command { .\main-1cpu.bat }

...
TotalSeconds      : 23.6482685
...

The produced tracelogs contain similar pauses that you can see in the question.

Running Main without tracelogs is faster but the difference still can be seen between the single-CPU and two-CPU version: ~4-7 seconds vs. ~400 ms.

I've sent this findings to the hotspot-dev@openjdk mail list and devs there confirmed that this is something that the JDK could handle better. You can find supposed fixes in the thread too. Another thread about the regression on hotspot-runtime-dev@. JIRA issue for the fix: JDK-8244340

palacsint
  • 28,416
  • 10
  • 82
  • 109
5

From my experience performance problems with JDKs are related mostly to one of the following:

  • JIT Compilation
  • VM configuration (heap sizes)
  • GC algorithm
  • Changes in the JVM/JDK which break known good running applications
  • (Oh, and I forgot to mention class loading...)

If you just use the default JVM configuration since OpenJDK11, maybe you should set some of the more prominent options to fixed values, like GC, Heap size, etc.

Maybe some graphical analyse tool could help track your issue down. Like Retrace, AppDynamics or FlightRecorder and the like. These give more overview on the overall state of heap, GC cycles, RAM, threads, CPU load and so on at a given time than log files could provide.

Do I understand correctly that your application writes about 30710 lines to the log within the first second of running (under OpenJDK11)? Why is it "only" writing about 7k lines under OpenJDK14 in the first second? This seems like a huge difference for an application that is just started on different JVMs to me... Are you sure there are not for example high amounts of Exception stacktraces dumped into the log?

The other numbers are even higher sometimes, so maybe the slowdowns are related to exception logging? Or even swapping, if RAM gets low?

Actually I am thinking, if an application does not write anything into the log, this is a sign of smooth running without problems (unless it is frozen entirely in this time). What is happening from seconds 12-22 (in the OpenJDK14 case here) is what would concern me more... the logged lines go through the roof... why?

And afterwards the logging goes down to all time low values of about 1-2k lines... what is the reason for that?? (Well, maybe it is the GC kicking in at second 22 and does a tabula rasa which resolves some things...?)

Another thing may be your statement about "single CPU" machines. Does this imply "single core" also (Idk, maybe your software is tailored on legacy hardware or something)?

And the "single CPU" VMs are running on those machines? But I assume, I am wrong about these assumptions, since almost all CPUs are multicore nowadays... but I would investigate on a multithreading issue (deadlock) problem maybe.

Lii
  • 11,553
  • 8
  • 64
  • 88
Antares
  • 605
  • 3
  • 14
  • 2
    Please don't use signatures or taglines in your posts, the repetitive "GL and HF" is considered noise and a distraction from the content of your post here. See https://meta.stackexchange.com/help/behavior for more information. – user229044 Apr 26 '20 at 16:43
  • Alright, I wondered something like that myself. I'll stop the noise :D Thanks for the cleanup! – Antares Apr 26 '20 at 17:47
  • 1
    "Do I understand correctly that your application writes about 30710 lines to the log within the first second of running (under OpenJDK11)?" - Yes, you're right. – palacsint Apr 26 '20 at 19:37
  • 1
    "Are you sure there are not for example high amounts of Exception stacktraces dumped into the log?" - The log is clean, I have not found anything strange there, the application works correctly (except that it runs very-very slowly). – palacsint Apr 26 '20 at 19:39
  • 1
    GC is kicking in the 22nd second and the app remains slow after that. I've also updated the question. Please note that the update4j demo app also has the same issue. Thank you for the answer! – palacsint Apr 26 '20 at 19:58
  • 1
    30k+ log lines in one second is quite huge... don`t you agree? I really wonder what could be useful to be logged to accept this high amount of log lines in such a short time... Did you try turning logging completely off and profile the application in this mode? (I am curious, but maybe logging really has no impact as you imply with the update4j behavior) – Antares Apr 26 '20 at 20:20
  • 1
    Currently I agree with Matt Timmermans. Seems to be an interlock or resource bottleneck while handling multiple threads. The OS seems to schedule the wrong way or the threads can`t finish on a single CPU (which is single core, I suspect) because they are waiting on each other. You should try a visual JVM monitoring tool. – Antares Apr 26 '20 at 20:37
  • Sorry, I misunderstood the question. 30k lines are OpenJDK debug logs. The app logs only 100 lines or so. Measurements at the beginning of the question were made with disabled OpenJDK logs. – palacsint May 02 '20 at 23:03
4

Since it's using 100% CPU "most of the time", and it takes 10 times longer (!) with Java 14, it means that you're wasting 90% of your CPU in Java 14.

Running out of heap space can do that, as you spend a whole lot of time in GC, but you seem to have ruled that out.

I notice that you're tweaking the biased locking option, and that it makes a significant difference. That tells me that maybe your program does a lot of concurrent work in multiple threads. It's possible that your program has a concurrency bug that shows up in Java 14, but not in Java 10. That could also explain why adding another CPU makes it more than twice as fast.

Concurrency bugs often only show up when you're unlucky, and the trigger could really have been anything, like a change to hashmap organization, etc.

First, if it's feasible, check for any loops that might be busy-waiting instead of sleeping.

Then, run a profiler in sampling mode (jvisualvm will do) and look for methods that are taking a much larger % of total time than they should. Since your performance is off by a factor of 10, any problems in there should really jump out.

Matt Timmermans
  • 53,709
  • 3
  • 46
  • 87
  • Biased Locking was necessary in the past but nowadays not so much, and it is proposed to be disabled by default and later removed: https://openjdk.java.net/jeps/374 – JohannesB May 02 '20 at 10:27
2

This is an interesting issue and it would require indeterminate amount of effort to narrow it down since there are many permutations and combinations that need to be tried out and data collected and collated.

Seems as of there has been no resolution to this for some time. Perhaps this might need to be escalated.

EDIT 2: Since "ThreadLocalHandshakes" is deprecated and we can assume that locking is contended, suggest trying without "UseBiasedLocking" to hopefully speed up this scenario.

However there are some suggestions to collect more data and attempt to isolate the issue.

  1. Allocate more than one core [I see that you have tried it and the issue goes away. Seems to be an issue with a thread/s execution precluding others. See no 7 below)
  2. Allocate more heap (perhaps the demands of v14 is more than that of earlier jdks)
  3. Allocate more memory to the Win 10 VB.
  4. Check the OS system messages (Win 10 in your case)
  5. Run it in an non-virtualized Win 10.
  6. Try a different build of jdk 14
  7. Do a thread dump every (or profile)few intervals of time. Analyze what thread is running exclusively. Perhaps there is a setting for equitable time sharing. Perhaps there is a higher priority thread running. What is that thread and what is it doing? In linux you could stat the lightweight processes (threads) associated with a process and its state in realtime. Something similar on Win 10?
  8. CPU usage? 100% or less? Constrained by CPU or mem? 100% CPU in service threads? Which service thread?
  9. Have you explicitly set a GC algo?

I have personally witnessed issues within versions that have to do with GC, heap resizing, issues with virtualized containers and so on.

There is no easy answer to that, I think, especially since this question has been around for some time. But we can try, all the best and let us know what is the result of some of these isolation steps.

EDIT 1: from the updated question, it seems to be related to a GC or another service thread taking over the single core non-equitably (Thread-Local Handshakes)?

Khanna111
  • 3,627
  • 1
  • 23
  • 25
  • Adding an extra CPU core used to trigger the switch from Java ergonomics on 32 bit systems from a client to a server class vm with different GC and Tiered Compilation if that is still the case it may explain sudden differences in performance and memory use, yes JVM performance is complicated – JohannesB Apr 26 '20 at 21:48
  • 3
    The Java ergonomics (default settings) are still different for 1 CPU (e.g.: -XX:+UseSerialGC) or 2 CPU's (e.g.: G1GC, LoopStripMiningIter=1000, ...ShortLoop=100) But after making sure with -XX:+PrintFlagsFinal that I tweaked all parameters to the same or similar running update4j was still extremely slow to start with just one instead of 2 CPU's with cmd.exe /C start /affinity 0x1 (but extremely fast with 0x3 - thus using 2 cpus (1 + 10 binary)). I confirmed we cannot blame any garbage collector by using Epsilon GC which is designed to avoid any GC overhead. TieredCompilation however is enabled – JohannesB Apr 27 '20 at 20:45
  • I see. With Epsilon GC, it seems as if it was as slow. In this case, thread statuses and dumps to evaluate where it gets stuck could be a way. Both in Java world and OS world (linux if I remember was gcore) – Khanna111 Apr 27 '20 at 21:56
1

Be careful with logging to slow disks, it will slow down your application:

https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic

But it doesn't seem likely to be the cause of the issue as the CPU is still busy and you don't have to wait for all threads to come to a safe point thanks to thread-local handshake: https://openjdk.java.net/jeps/312

Also not directly related to the problem you have but more generally if you want to try to squeeze more performance out of your hardware for startup time, take a look at AppCDS (class data sharing):

https://blog.codefx.org/java/application-class-data-sharing/

JohannesB
  • 2,214
  • 1
  • 11
  • 18