
I was very interested in upgrading to Java 7 (for my own selfish coding reasons). However, we have users who are very latency sensitive (everything needs to be sub-millisecond). I ran a simple performance comparison between 3 different JVMs and found Java 7 to be much slower. The test pushed some simple messages through our application. It is a low-load, low-volume test, which pushes a single message through every few seconds. The results were (in microseconds):

 - Hotspot 6 (build 24): msgs= 23 avg= 902 
 - JRockit 6 (R28 b 29): msgs= 23 avg= 481 
 - Hotspot 7 (build 04): msgs= 34 avg=1130

Oracle's strategy is to merge JRockit and Hotspot starting with Java 7 (so JRockit 6 is the last release available). Does anyone have any ideas why the performance is so much worse? (One point to note is that the code was compiled under Java 1.6; I'm not sure whether that would explain it.)

UPDATE: I voted to close my own question because I can see from the comments that I am not really able to communicate enough info to make this a constructive question. Thanks to all who commented.

UPDATE: After more feedback, I thought I would provide more info. Each test is run after a fresh start. All factors are equal for each test; the only thing that changes is the JVM. Repeating the test multiple times gives consistent results. No GCs occurred in any test iteration.
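For what it's worth, one way to confirm the no-GC claim programmatically (a minimal sketch using the standard management API, not necessarily how it was checked in the original test) is to compare collection counts before and after a run:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    // Total number of collections reported by all collectors so far.
    // Sample this before and after a test run; an unchanged value
    // means no GC occurred during the run.
    public static long totalGcCount() {
        long count = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                count += c; // -1 means the count is undefined for this collector
            }
        }
        return count;
    }
}
```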

Below are the graphed values from one of the test runs. For both JRockit and Hotspot 7, the very first latency value was thrown out. JRockit has a huge first value, but then very quickly optimizes and settles toward the mean. Hotspot 7 takes longer to optimize, and never drops to a mean as low as JRockit's. Each data point represents the microseconds taken to read a message from a TCP/IP socket, run it through the business logic, and write the message to another socket. Every message is identical, and no new code paths are entered for any message.

(Graph: JRockit 6 vs. Hotspot 7)
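Conceptually, each data point is produced by timing the full read-process-write path with System.nanoTime, roughly like the sketch below (hypothetical names; the real application code reads from and writes to TCP sockets and runs proprietary business logic in between):

```java
import java.util.ArrayList;
import java.util.List;

public class LatencyProbe {
    private final List<Long> latenciesMicros = new ArrayList<Long>();

    // Called once per message: time the full read -> process -> write path.
    public void handleMessage(Runnable processMessage) {
        long start = System.nanoTime();
        processMessage.run(); // stand-in for socket read, business logic, socket write
        latenciesMicros.add((System.nanoTime() - start) / 1000); // nanoseconds -> microseconds
    }

    public List<Long> latenciesMicros() {
        return latenciesMicros;
    }
}
```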

Sam Goldberg
  • Were there any JVM flags like "-server", etc.? Java 7 is unlikely to be _that_ much slower. Are you also sure you stuck to all [the rules of JVM microbenchmarking](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java)? – Petr Janeček Jun 05 '12 at 19:55
  • Have there been any GCs during your tests? I believe Java 7 uses a different GC algorithm by default. Does JIT compilation kick in at the same time? Were the JVMs warmed up in the same way? etc. – assylias Jun 05 '12 at 19:56
  • The mean is not a good measure for this kind of work; you need a range of values (mean, median, 90/95/99 percentile, max) so you can really see what is going on. ~20-30 values is not much to work with either. So I'd say it looks suspicious and definitely needs investigation (if you have a sub-millisecond user community), but your approach needs some work first. – Matt Jun 05 '12 at 20:01
  • I've also seen significant variation in latency across different collectors on the same VM. From memory, one (application) benchmark showed typical latency (with no significant tail latency) varying between ~300us (parnew), ~350us (throughput with usenuma), ~450us (throughput without usenuma) and ~500us (g1). – Matt Jun 05 '12 at 20:06
  • @Slanec: We used the same JVM args for all 3 tests: `-Xms40768M -Xmx40768M`. We have some other args, but they are pretty obscure: `-Dsun.rmi.dgc.server.gcInterval=54000000 -Dsun.rmi.dgc.client.gcinterval=54000000` – Sam Goldberg Jun 05 '12 at 20:09
  • @assylias: warm-up phase was excluded. We do this kind of benchmarking *all the time*, so unless there is something *very* different about Hotspot 7, it should perform the same as the other 2. (Clearly there is something different.) – Sam Goldberg Jun 05 '12 at 20:14
  • @Matt: Do you suppose that more data points would move Java 7 closer? I'm pretty sure not, at least for this test. I watched the latency log for all 3 tests (and also repeated the test a few times). Java 6 and JRockit were initially slow as classes were loaded, but optimized quickly (as we always see in production). Java 7 never optimized. If it would take more messages to move closer to the other 2 JVMs, that already makes it a loser for our application. – Sam Goldberg Jun 05 '12 at 20:18
  • @Sam The thing about means and Java is simple: suppose an old-gen GC falls inside your benchmark; that can give you something like `[10, 9, 11, 2000, 12]`, which will completely ruin the average. While it need not be the case, it's generally a good idea to avoid such problems by at least computing the standard deviation as well. That said, it's certainly imaginable that JRockit is better than Hotspot in some things, couldn't say. – Voo Jun 05 '12 at 20:21
  • "The test pushed some simple messages through our application." So JR + J6 pushes 23 msgs and J7 34 (+50%) ? – PeterMmm Jun 05 '12 at 20:24
  • @SamGoldberg JRockit is not merged into Java 7. Java 7 is basically Java 6 with a few new features and a few improvements. Some of them might be good for you, others not. As far as I have heard from the two teams, Oracle is primarily trying to bring the instrumentation features of JRockit into Hotspot (going forward), so I wouldn't expect the Oracle JVM to be "as fast as JRockit" just because they merge them. As others have pointed out, you need to provide more info to get good guesses. You tell us nothing about your code, warmup or even how you get your timestamps. – Fredrik Jun 05 '12 at 20:27
  • @Fredrik: "You tell us nothing about your code, warmup or even how you get your timestamps". If it's the same code, different JVM, are you hinting that the code needs to be tuned to the Java 7 JVM? Time stamps are done using System.nanoTime difference from start to end of process. (Same as for all 3 tests). – Sam Goldberg Jun 05 '12 at 20:31
  • @SamGoldberg Warm-up works differently; different JVMs optimize different code in different ways, AND there is a rather big risk that it is actually not the JVM itself that makes the difference, but something you do. In order to get good advice you need to provide more info. Right now you are doing the equivalent of reporting a bug with "it doesn't work" and nothing more. – Fredrik Jun 05 '12 at 20:55
  • @Fredrik: I agree - and voted to close this question. – Sam Goldberg Jun 05 '12 at 20:58
  • @SamGoldberg Instead of having the question closed, why don't you post a sample benchmark which gives you the figures? It would be rather interesting for most of us to find out where the problem is. – Fredrik Jun 05 '12 at 20:58
  • @Fredrik: Thanks, I am trying to think of a more constructive way to provide information. Unfortunately, there is no code snippet I can post, because it is an end-to-end test of processing a message through the application. I was really just trying to compare the same test sequence against 3 different JVMs, and to see if anyone else had observed similar differences. – Sam Goldberg Jun 05 '12 at 21:09
  • @SamGoldberg Did you do a proper warm-up and make sure that what you were comparing was compiled code and not interpreted (or even interpreted vs. compiled)? Further discussions should probably be made in a chat or something, but I don't have more time for this tonight. Good luck. – Fredrik Jun 05 '12 at 21:14
  • @Sam Goldberg Try discarding values that are too high, like Voo said. – SHiRKiT Jun 05 '12 at 21:26
  • @SHiRKiT: I did that. Same methodology followed on all 3 VMs. I'm not worried that some testing artifact is causing a difference. Repeated test iterations produce the same results for all 3 (with minimal variation). – Sam Goldberg Jun 05 '12 at 21:30
  • @assylias: I confirmed there were no GCs. There is some other reason for the difference. – Sam Goldberg Jun 05 '12 at 21:32
  • If you analyze some call graphs, it may be possible to find changes (bug fixes and optimizations) in the JDK that penalize your particular usage pattern. I've seen differences between implementations and platforms (Sun/IBM/OpenJDK, Windows/Solaris/Linux) in 1.6, regarding java.util collections and concurrency, that affected performance (e.g. ArrayList is very "optimized" by IBM to speed up inserts at the start of the list, unfortunately introducing some bugs, and ThreadLocal is dead slow on Linux). – KarlP Jun 05 '12 at 22:51
  • Did you try dropping the compilation threshold (`-XX:CompileThreshold=1000`) and running with `-XX:+PrintCompilation` to see what was being compiled? – TMN Sep 12 '12 at 20:38
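Picking up on Matt's and Voo's comments above, a latency report that shows percentiles and the maximum alongside the mean makes GC or JIT outliers easy to spot. A minimal sketch (illustrative names, not code from the actual test harness):

```java
import java.util.Arrays;

public class LatencySummary {
    // Print mean, median, 90/95/99th percentile and max for latency samples (in microseconds).
    public static void summarize(long[] micros) {
        long[] sorted = micros.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (long v : sorted) {
            sum += v;
        }
        double mean = (double) sum / sorted.length;
        System.out.printf("n=%d mean=%.0f p50=%d p90=%d p95=%d p99=%d max=%d%n",
                sorted.length, mean,
                percentile(sorted, 50), percentile(sorted, 90),
                percentile(sorted, 95), percentile(sorted, 99),
                sorted[sorted.length - 1]);
    }

    // Nearest-rank percentile on an already-sorted array.
    private static long percentile(long[] sorted, int pct) {
        int idx = (int) Math.ceil(pct / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }
}
```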

2 Answers


JRockit is its own (pure C) code base, different from the OpenJDK. It contains different garbage collectors and a totally different JIT compiler. One big money-maker when it was owned by BEA was the low-latency GC, which is quite advanced even in the non-commercial variants. A significant amount of time has been spent on JRockit as a clean-room VM implementation. As has been said in the comments, it's not so much a matter of merging as of reimplementing things in the HotSpot code base. That is far from a fast process, and some of the things will not get there at all, at least not in their JRockit form. Puzzle pieces do not readily fit without some filing at the edges, so to speak. JDK 7 HotSpot will be good at other things, or at different versions of similar systems, and that might make up for some of your lost performance. Other applications may well run faster than they did with JRockit 6.

If you are interested in learning more about JRockit (or any JVM) internals, the book "Oracle JRockit: The Definitive Guide" is a highly recommended read. Full disclosure: I probably get ~$2 before taxes in royalties for each copy, and will use it to buy espresso. :)

Marcus
  • I was taking my information on Hotspot 7 from an Oracle blog post from 2010 which said: ["The overwhelming majority of all JVM work we do will go into OpenJDK (this includes all performance features from JRockit)"](https://blogs.oracle.com/henrik/entry/oracles_jvm_strategy). Reading further makes it sound like integrating JRockit into OpenJDK is *the* strategy. – Sam Goldberg Jun 13 '12 at 15:16

The main thrust of this question was: all other things being equal (including the JVM args), why does the same JAR of Java code run so much more slowly on the Hotspot 7 JVM than on JRockit 6 and Hotspot 6?

This gave rise to a few responses questioning whether the timing was done correctly (apparently due to skepticism that there could really be such a different result between the JVMs). Based on numerous tests, there is no question in my mind that the measurements are correct.

Potential answers I thought possible were:

  • The Java 7 JVM does not run code compiled under Java 6 as fast as the same code compiled under Java 7.
  • New JVM args are required for Java 7 to run in the most optimized mode possible.
  • Other people have benchmarked Java 7 against JRockit 6 and seen the same result as I did.

So the fact is, the new Java 7 JVM behaves very differently with our app, all other things being equal. The only resolution is to profile the code on the Java 7 VM to discover where the slow points in the code are. (And perhaps at that point it will be clear what the actual difference between the Java 6 and Java 7 JVMs is.)
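As a first profiling step (a rough sketch using the standard management API, not something from the original test), it is cheap to compare how much JIT compilation work each JVM reports after the warm-up and test phases, alongside flags like `-XX:+PrintCompilation` suggested in the comments:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitStats {
    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        System.out.println("JIT compiler: " + jit.getName());
        if (jit.isCompilationTimeMonitoringSupported()) {
            // Cumulative time (ms) this JVM has spent in JIT compilation so far;
            // sampling it after warm-up and after the test shows how much compiling
            // each JVM is still doing while messages are being measured.
            System.out.println("Total compilation time: "
                    + jit.getTotalCompilationTime() + " ms");
        }
    }
}
```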

I appreciate everyone's comments, and apologize that I couldn't provide enough detail for a clear analysis/resolution.

Sam Goldberg
  • As I recall, the main performance release for Java 7 was r41 onwards. Most of the changes in the Java 7 releases were about performance in one way or another, so it would be interesting to see if things have reversed with the later versions. – Chaffers Apr 27 '15 at 11:20