49

I am supporting a Java messaging application that requires low latency (< 300 microseconds processing each message). However, our profiling shows that the Sun Java Virtual Machine runs slowly at first, and speeds up after the first 5,000 messages or so. The first 5,000 messages have latency of 1-4 milliseconds. After about the first 5,000, subsequent messages have ~250 microseconds latency, with occasional outliers.

It's generally understood that this is typical behavior for a Java application. However, from a business standpoint it's not acceptable to tell the customer that they have to wait for the JVM to "warm up" before they see the performance they demand. The application needs to be "warmed up" before the first customer message is processed.

The JVM is Sun 1.6.0 update 4.

Ideas for overcoming this issue:

  1. JVM settings, such as -XX:CompileThreshold=
  2. Add a component to "warm-up" the application on startup, for example by sending "fake messages" through the application.
  3. Statically load application and JDK classes upon application startup, so that classes aren't loaded from JARs while processing customer messages.
  4. Some utility or Java agent that accomplishes either or both of the above two ideas, so that I don't have to re-invent the wheel.

NOTE: Obviously for this solution I'm looking at all factors, including chip arch, disk type and configuration and OS settings. However, for this question I want to focus on what can be done to optimize the Java application and minimize "warm up" time.

noahlz
  • 10,202
  • 7
  • 56
  • 75
  • I think you'd better look at the underlying cause of this initial delay. Profiling tools might help. – Thomas Sep 26 '09 at 18:34
  • 2
    Send 5000 fake messages to the server as part of the installation and start-up procedure. – Zed Sep 26 '09 at 18:39
  • 1
    5000 fake messages (even if it were a good idea) sounds like it would add 5 to 20 seconds to the app's startup time. – Robert Harvey Sep 26 '09 at 18:43
  • I like idea 3, if that's what is really causing the latency. – Robert Harvey Sep 26 '09 at 18:45
  • Can you explain the normal characteristics of this application? - i.e. is it expected that the app will process more than ~5000 messages each time it starts up? Or does the app need to start and stop often? – Gary Sep 26 '09 at 18:49
  • 1
    Have you tried using an alternative JVM to the Sun JRE? I've seen 20-30% speed-ups with a quicker warm-up time using BEA JRockit on Windows machines. – Trevor Tippins Sep 26 '09 at 18:52
  • Also, can you list all of the JVM options you are using today? I like the idea of using a Profiler, but I suspect the results it gives might help you tune the performance characteristics of the code you write, but not so much for optimizing the effects of hotspot or potential classloading warm-up. – Gary Sep 26 '09 at 18:54
  • 1
    For those finding this post via search engines, application warm-up is listed as a technique in the following very good white paper on high performance Java: http://www.cinnober.com/news/benefits-using-java-highperformance-language – noahlz Oct 17 '12 at 17:42
  • Previous URL 404. New URL : https://www.cinnober.com/white-papers/benefits-using-java-highperformancelanguage – BenC Apr 07 '17 at 14:29
  • both urls link to nasdaq – experiment unit 1998X May 22 '23 at 02:45
  • Sorry, that can happen after 10+ years. The white papers are likely very very out of date anyway! – noahlz May 30 '23 at 17:51

7 Answers

37

"Warm-up" in Java is generally about two things:

(1): Lazy class loading: This can be worked around by forcing the classes to load up front.

The easy way to do that is to send a fake message. Be sure that the fake message triggers access to all the relevant classes. For example, if you send an empty message but your program checks whether the message is empty and skips certain work in that case, then this will not help.

Another way to do it is to force class initialization by accessing each class when your program starts.
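A minimal sketch of that second approach (the class names listed here are only examples; you would substitute your application's actual hot-path classes):

```java
// Force-load (and initialize) classes at startup so the cost is not
// paid while the first customer message is being processed.
public class ClassPreloader {
    public static void main(String[] args) throws ClassNotFoundException {
        // Example class names only; list your own application classes here.
        String[] classesToPreload = {
            "java.util.concurrent.ConcurrentHashMap",
            "java.text.SimpleDateFormat"
        };
        for (String name : classesToPreload) {
            // initialize=true also runs the class's static initializers
            Class<?> c = Class.forName(name, true,
                    ClassPreloader.class.getClassLoader());
            System.out.println("Preloaded: " + c.getName());
        }
    }
}
```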

(2): Runtime optimization: At run time, the Java VM optimizes parts of the code. This is the major reason why there is a warm-up time at all.

To ease this, you can send a bunch of fake (but realistic-looking) messages so that the optimization can finish before your users arrive.

Another way you can help is to make methods easy to inline, for example by using private and final as much as you can. The reason is that the VM then does not need to look up the inheritance table to see which method is actually being called.
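An illustrative sketch of that idea (the class and method names are hypothetical): private and final methods are non-virtual call sites, which makes them straightforward candidates for JIT inlining.

```java
// Illustrative only: private and final methods cannot be overridden,
// so the JIT can inline them without a virtual-dispatch lookup.
public final class MessageParser {           // final: no subclasses to consider
    private int checksum(byte[] payload) {   // private: never dispatched virtually
        int sum = 0;
        for (byte b : payload) {
            sum += b & 0xFF;
        }
        return sum;
    }

    public final int parse(byte[] payload) { // final method: easy inlining candidate
        return checksum(payload);
    }

    public static void main(String[] args) {
        MessageParser parser = new MessageParser();
        System.out.println(parser.parse(new byte[] {1, 2, 3}));
    }
}
```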

Hope this helps.

NawaMan
  • 25,129
  • 10
  • 51
  • 77
14

Your problem is not class loading but "just in time" compilation.

Try -XX:CompileThreshold=1

That will force Java to compile everything the first time it runs it. It will slow down the startup of your code somewhat, but not VM code (since that gets compiled when Java is installed). There is a bug open to allow Java to compile custom JARs in a similar way and save the result for later executions, which would greatly reduce this overhead, but there is no pressure to fix this bug any time soon.

A second option would be to send 5'000 fake messages to the app to "warm it up". Sell this as "making sure everything is set up correctly".

[EDIT] Some background info in precompiling classes: Class Data Sharing

You may want to try IBM's version of Java since here, you can add more classes to the shared pool: Overview of class data sharing

[EDIT2] To answer concerns raised by kittylyst: It's true that this will quickly fill up your code cache with methods that are used only once. And it might even make your whole app slower.

If you set it to a low value, the startup time of your application can become horribly slow. This is because the JIT optimization + running the compiled code is more expensive than running the code once in interpreted mode.

The main problem here is that the code is still compiled "just in time". Unless you can run every method that you need at least once, the app will "hiccup" for a few milliseconds every time it encounters something that hasn't been compiled before.

But if you have the RAM, your app is small or you can increase the size of the code cache, and you don't mind the slow startup time, you can give this a try. Generally, though, the default setting is pretty good.
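If you want to see what the JIT is actually doing before committing to any of this, HotSpot can log its compilation decisions; a possible invocation might look like the following (the main class name is a placeholder):

```shell
# Log which methods HotSpot compiles, and when
java -XX:+PrintCompilation -XX:CompileThreshold=1 com.example.MessagingApp

# If forcing early compilation fills up the code cache, enlarge it
java -XX:ReservedCodeCacheSize=256m -XX:CompileThreshold=1 com.example.MessagingApp
```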

Community
  • 1
  • 1
Aaron Digulla
  • 321,842
  • 108
  • 597
  • 820
  • Suggestion: Edit to add link to actual bugreport. – Thorbjørn Ravn Andersen Sep 27 '09 at 08:08
  • Where is that bug? I had this discussion with a Sun compiler guy and he convinced me this would be "very difficult" and much more complicated that first blush would seem to indicate. – Mainguy Sep 27 '09 at 14:01
  • I was pretty sure I've opened a bug but maybe I just discussed this with a Java dev. Anyway, I've added a couple of links which might help to understand the issue and solve it. – Aaron Digulla Sep 28 '09 at 07:37
  • 11
    Checking back on this question months later... I'm sorry to report that CompileThreshold does NOT improve performance and actually can HURT performance. The reason is because the HotSpot compiler does not have a chance to do runtime analysis and optimize the code effectively. – noahlz Feb 05 '10 at 05:08
  • I think the reason is more that compiling does take time (especially the optimizing), which is why it is faster to interpret the code first and only compile it when it's used repeatedly. If the Sun VM would allow to save the compilation result, it would speed up the process a lot. – Aaron Digulla Feb 08 '10 at 07:53
  • 7
    By the way, this is terrible advice, and I'm afraid warrants a downvote. The CompileThreshold should be left alone under almost all circumstances. What you've suggested here would seriously harm the performance of the application, because it would consume the entire code cache very quickly and fill it with random methods that may or may not be hot. Do NOT do this. – kittylyst Apr 29 '12 at 21:11
  • @kittylyst: If I have enough RAM and I increase the code cache size, what is the harm? Besides wasting some or even a lot of RAM. – Aaron Digulla Apr 30 '12 at 07:24
  • 2
    @Aaron The problem is that the OP clearly doesn't understand even the basics of JIT compilation, otherwise s/he wouldn't be asking questions about "why does it speed up after 5000 iterations". I'm sure *you* know about the code cache and can be trusted to tune it properly, and to make decisions about sacrificing profiling information and increasing pressure on the compile thread. However, I maintain that this is "loaded gun" advice to hand to a novice. – kittylyst May 01 '12 at 08:54
  • @kittylyst: People only improve when they are allowed to make mistakes. – Aaron Digulla May 02 '12 at 07:13
  • +1 - the flag helped a lot, I didn't know it existed. The reason why it's useful is because OSR JIT compilation and direct JIT compilation do not always yield the same results - OSR can produce slower code. – axel22 Oct 31 '12 at 16:53
  • 2
    This option is ignored when tiered compilation is enabled. Ref: http://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html – petertc May 29 '16 at 13:46
10

Just run a bunch of no-op messages through the system before it's opened for genuine customer traffic. 10k messages is the usual number.

For financial apps (e.g. FIX), this is often done by sending orders (at a price far away from last night's close, just in case) to the market before it opens. They'll all get rejected, but that doesn't matter.

If the protocol you're using is homebrew, then the next time you upgrade the libraries for it, add explicit support for a "WARMUP" or "TEST" or "SANITYCHECK" message type.

Of course, this likely won't compile your application-logic-specific pathways, but in a decent messaging app, the part dealing with network traffic will almost certainly be the dominant part of the stack, so that doesn't matter.
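A warm-up driver along these lines (all names here are hypothetical stand-ins for your real handler) would push synthetic messages through the real processing path before live traffic is admitted:

```java
// Hypothetical warm-up harness: run enough realistic-looking messages
// through the real processing path that the JIT compiles the hot
// methods before the first live message arrives.
public class Warmup {
    interface MessageHandler {
        int handle(String message);
    }

    static final int WARMUP_MESSAGES = 10_000;

    static void warmUp(MessageHandler handler) {
        for (int i = 0; i < WARMUP_MESSAGES; i++) {
            // Vary the payload so validation/branching paths are also exercised
            handler.handle("WARMUP|seq=" + i + "|px=" + (100.0 + i % 7));
        }
    }

    public static void main(String[] args) {
        MessageHandler handler = msg -> msg.length(); // stand-in for real logic
        long start = System.nanoTime();
        warmUp(handler);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("Warm-up of " + WARMUP_MESSAGES
                + " messages took " + micros + " us");
    }
}
```

In a real deployment the handler would be the same object that later processes customer messages, so the compiled code is reused.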

kittylyst
  • 5,640
  • 2
  • 23
  • 36
  • 2
    Yes, that is the approach we took. But, we didn't spam the Market. We created fake "loopback" connections. – noahlz May 01 '12 at 12:37
  • It’s important that your code touches the actual application hot paths or it won’t warm them up. Someone can correct me if I’m wrong :) But your are right about warming up the network stack - that’s a huge part of it. – jocull Sep 02 '18 at 13:52
  • 1
    Reply largely for @jocull - this is a really old thread and IMO shows the declining utility of SO as a developer resource over time. The world has moved on a huge amount in the last 9 years, and questions like this really should be archived / deleted. – kittylyst Feb 03 '19 at 11:56
  • @kittylyst I'd love more details about how things have changed if you have them! Even just starter links will do :) – jocull Feb 05 '19 at 15:55
  • 1
    @jocull This might make the basis of a good article - thanks for the idea. There is a certain amount of information in my book "Optimizing Java" (O'Reilly) and in my Oracle Java magazine articles, but it's not specifically organized around how things have changed. – kittylyst Feb 06 '19 at 12:30
4

If you are running in HotSpot's server mode on contemporary hardware (with 2 or more cores per CPU) and with a recent version of the JDK, then the following option can be used to speed up warm-up:

-server -XX:+TieredCompilation
Andriy Plokhotnyuk
  • 7,883
  • 2
  • 44
  • 68
  • 1
    By default, TieredCompilation option is enabled. Ref:http://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html – petertc May 29 '16 at 13:45
3

Are you using the client or the server JVM? Try starting your program with:

java -server com.mycompany.MyProgram

When running Sun's JVM in this mode, the JIT will compile the bytecode to native code earlier; because of this, the program will take longer to start, but it will run faster after that.

Reference: Frequently Asked Questions About the Java HotSpot VM

Quote:

What's the difference between the -client and -server systems?

These two systems are different binaries. They are essentially two different compilers (JITs) interfacing to the same runtime system. The client system is optimal for applications which need fast startup times or small footprints; the server system is optimal for applications where the overall performance is most important. In general the client system is better suited for interactive applications such as GUIs. Some of the other differences include the compilation policy, heap defaults, and inlining policy.

Community
  • 1
  • 1
Jesper
  • 202,709
  • 46
  • 318
  • 350
  • It is running in server mode by default because it's running on a "server-class" machine (16 GB memory, 8 cores) – noahlz Sep 26 '09 at 19:10
  • server and client mode both does JIT compilation. The difference is in the number of an actual method's calls (3000 for client and 10000 for server mode). Also, with a 64bit JVM, the server mode is used as default. – gyorgyabraham Jul 26 '13 at 13:22
  • There is an additional difference I believe - client VM will use the C1 compiler which compiles faster, but optimizes less. This can lead to poor performance in server applications compared to server VMs which will use the C2 compiler. In tiered compilation mode we use both C1 and C2, so the lines become blurred. But you might notice that tiered mode is unavailable in client VMs. – jocull Dec 15 '19 at 04:02
2

Old thread, I know, but I found this on the internet:

A very interesting thing is the influence of the option -XX:CompileThreshold=1500 on the Sun HotSpot JVM. The default is 10000 for the Server VM and 1500 for the Client VM. However, setting it to 1500 for the Server VM makes it faster than the Client VM. Setting it to 100 actually lowers the performance. And using the option -Xcomp (which means that all code is compiled before usage) gives even lower performance, which is surprising.

Now you know what to do.

John Conde
  • 217,595
  • 99
  • 455
  • 496
mark
  • 84
  • 1
  • 2
  • I believe the performance is lower with earlier compilation because appropriate statistics about he code haven’t or can’t be gathered. So they are far less optimized. – jocull Sep 02 '18 at 13:54
-4

Seems like your project would benefit from real-time guarantees:

See: Real Time Java

  • 1
    No, absolutely not. "Real Time" is about consistency of response, not absolute value. Real Time Java typically runs significantly slower than regular Java, but with reduced variance in the performance profile. This is not typically suitable for low-latency applications. – kittylyst Apr 29 '12 at 21:13