
We have a simple unit test as part of our performance test suite that we use to verify that the base system is sane and performs adequately before we even start testing our own code. This way we verify that a machine is suitable for running the actual performance tests.

When we compare Java 6 and Java 7 using this test, Java 7 takes considerably longer to execute: we see an average of 22 seconds for Java 6 and 24 seconds for Java 7. The test only computes Fibonacci numbers, so only bytecode execution in a single thread should matter here, not I/O or anything else.

Currently we run it with default settings on Windows, with and without "-server", and with both the 32-bit and 64-bit JVM; all runs show a similar degradation on Java 7.

Which tuning options might be suitable here to bring Java 7 back in line with Java 6?

import org.junit.Before;
import org.junit.Test;

public class BaseLinePerformance {

    @Before
    public void setup() throws Exception{
        fib(46);
    }

    @Test
    public void testBaseLine() throws Exception {
        long start = System.currentTimeMillis();
        fib(46);
        fib(46);
        System.out.println("Time: " + (System.currentTimeMillis() - start));
    }

    public static void fib(final int n) throws Exception {
        for (int i = 0; i < n; i++) {
            System.out.println("fib(" + i + ") = " + fib2(i));
        }
    }

    public static int fib2(final int n) {
        if (n == 0)
            return 0;
        else if (n == 1)
            return 1;
        else
            return fib2(n - 2) + fib2(n - 1);
    }
}

Update: I have reduced the test so it does not sleep at all and followed the other suggestions from How do I write a correct micro-benchmark in Java?. I still see the same difference between Java 7 and Java 6. Additional JVM options to print compilation and GC activity do not show any output during the actual test; compilation information is only printed initially.
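For reference, a minimal warm-up-then-measure harness along the lines of that advice might look like the following (a sketch, not the actual test; the class name, iteration counts, and fib arguments are illustrative):

```java
public class WarmupHarness {

    static int fib2(final int n) {
        if (n < 2) return n;
        return fib2(n - 2) + fib2(n - 1);
    }

    public static void main(String[] args) {
        // Warm up: give the JIT a chance to compile and optimize fib2 before timing.
        long sink = 0;
        for (int i = 0; i < 10; i++) {
            sink += fib2(30);
        }
        // Measure with System.nanoTime(), which is monotonic, unlike currentTimeMillis().
        long start = System.nanoTime();
        sink += fib2(35);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        // Print the accumulated result so the JIT cannot dead-code-eliminate the calls.
        System.out.println("result=" + sink + " elapsedMs=" + elapsedMs);
    }
}
```

Printing the accumulated result keeps the compiler from eliminating the work as dead code, which is one of the common microbenchmark pitfalls discussed in the linked question.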

  • Do you really need those 2 seconds on a meaningless test? That has no bearing on the performance of your real system. Further, either you have a performance problem you need to address or you don't. All you know now is that Java 7 runs Fibonacci marginally slower on your test machine. Who cares about that? – Eric Stein Aug 23 '13 at 19:37
  • Measure _after_ warmup. – Thorbjørn Ravn Andersen Aug 23 '13 at 19:40
  • What happens if you take all those sleeps out? This "considerably longer" runtime might just be thread scheduling differences. – dkatzel Aug 23 '13 at 19:40
  • Your performance test is not robust, so it is not very surprising that you see such differences across JVMs. For example, you don't account for factors such as class loading and JIT compilation. Read for example: http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – assylias Aug 23 '13 at 19:46
  • Ad Eric Stein: if I switch our product from Java 6 to Java 7 and my performance-critical application runs more than 10% slower, some of our customers will care about this, so I have to care about it as well. – centic Aug 24 '13 at 18:47
  • Ad Thorbjørn Ravn Andersen: We measure after the warmup in setUp(), so some warmup is already done, but thanks for the tip; we will test with a lot more warmup to rule out differences in how long it takes for full optimization to kick in. – centic Aug 24 '13 at 18:48
  • Ad assylias: The test ran stably for over two years on Java 6 and showed a noticeable, constant increase of more than 10% at exactly the time we switched to Java 7, so I have to conclude that Java 7 is slower or at least does something fundamentally different. Thanks for the link to the benchmark question; we will take the advice from there and re-run to verify whether it makes a difference. – centic Aug 24 '13 at 18:52
  • @centic: Yes, if your real app runs 10% slower you have a problem. But you don't know that your fib test is representative of your real app. Presuming you do consider it representative, run it through a profiler: where is the time going? My guess is that sleep is taking longer and Java 7 is handling thread scheduling differently. – Jeanne Boyarsky Aug 25 '13 at 23:36
  • @Jeanne Boyarsky: I have updated the test to not sleep at all and verified that GC/JIT compilation is not causing this. Some of our actual performance tests also run slower, but this fib test is the narrowed-down test that verifies there is a difference in the JVM itself. I cannot explain why the simple test is so much slower, so it is not useful for me to look at a more complicated test with many moving parts unless the simple test behaves as I expect, or at least I can explain why it takes longer to run. – centic Aug 26 '13 at 08:35
  • @centic The fib2 method is so small that even a single changed instruction can easily result in a 10% difference. If you really want to know where the 10% comes from, you'll have to look at the JIT-generated machine code. (And probably read Agner Fog's manuals, ...) – Chris Aug 26 '13 at 11:09
  • @centic Since your method is fairly short, you could have a look at [the generated assembly](http://stackoverflow.com/a/15146962/829571) and compare it line by line (most lines should be identical anyway). – assylias Aug 26 '13 at 21:29

2 Answers


One of my colleagues found the reason for this after a bit more digging:

There is a JVM flag, -XX:MaxRecursiveInlineLevel, which has a default value of 1. The handling of this setting was slightly incorrect in previous versions, so Sun/Oracle "fixed" it in Java 7. The side effect is that inlining is now sometimes done less aggressively, so the pure runtime/CPU time of recursive code can be longer than before.

We are testing setting it to 2 to get the same behavior as in Java 6, at least for the test in question.


This is not an easy question to answer; plenty of things can account for those 2 seconds.

I am assuming from your comments that you are already familiar with micro-benchmarking: your benchmark is run after warming up the JVM, your code has reached an optimized JIT state, no GCs happen during the measurement, and your hardware setup has not changed.

I would recommend CPU profiling your benchmark; that will help you identify where those two seconds are being spent so you can act accordingly.

If you are curious about the bytecode, you can take a peek at it: compile your class and run javap -c ClassName on both machines. This disassembles the class file's bytecode, and any differences between the two compiled classes will show up there.

In conclusion: profile your application, look at where the time goes, and tune accordingly to get back to your 22 seconds; there is nothing you can do about the bytecode itself anyway.

  • Thanks for summing up. In my case javap does not help because I run the same bytecode on two different JVMs and already see the difference, so the difference lies in the optimization done by the JIT compiler. I actually looked at the resulting assembly code that is generated, and it showed considerable differences: the JVM's optimization changed and has a negative effect in this specific case. – centic Sep 04 '13 at 11:34