
I'm playing with a piece of code that measures how long certain Java operations take, to get a feeling for the efficiency or inefficiency of some of Java's functionality. Doing so, I've run into a really strange effect that I just can't explain. Maybe someone here can help me understand it.

import java.util.LinkedList;
import java.util.List;

public class PerformanceCheck {

    public static void main(String[] args) {
        List<PerformanceCheck> removeList = new LinkedList<PerformanceCheck>();

        int maxTimes = 1000000000;

        for (int i = 0; i < 10; i++) {
            long time = System.currentTimeMillis();

            for (int times = 0; times < maxTimes; times++) {
                // PERFORMANCE CHECK BLOCK START

                if (removeList.size() > 0) {
                    testFunc(3);
                }

                // PERFORMANCE CHECK BLOCK END
            }

            long timeNow = System.currentTimeMillis();
            System.out.println("time: " + (timeNow - time));
        }
    }

    private static boolean testFunc(int test) {
        return 5 > test;
    }
}

Running this results in a relatively long computation time (remember, removeList is empty, so testFunc is never even called):

time: 2328
time: 2223
...

Meanwhile, replacing any part of the combination of removeList.size() > 0 and testFunc(3) with something else gives better results. For example:

...
if (removeList.size() == 0) {
    testFunc(3);
}
...

Results in (testFunc is called every single time):

time: 8
time: 7
time: 0
time: 0

Even calling both functions independently of each other results in the lower computation time:

...
if (removeList.size() == 0);
    testFunc(3);
...

Result:

time: 6
time: 5
time: 0
time: 0
...

Only this particular combination in my initial example takes so long. This is irritating me and I'd really like to understand it. What's so special about it?

Thanks.

Addition:

Changing testFunc() in the first example

if (removeList.size() > 0) {
    testFunc(times);
}

to something else, like

private static int testFunc2(int test) {
    return 5*test;
}

will make it fast again.

x4-
  • And you ran this test more than once, and in different orders right? The order shouldn't matter, but just to make sure. Also, are you getting (about) the same results with nanoTime as suggested below? – prelic Jan 19 '12 at 19:09
  • This kind of micro-benchmark is a pretty bad idea. Also realize that the JIT can make various decisions at arbitrary times and optimize your code away completely, if it realizes nothing is actually happening. – Dave Newton Jan 19 '12 at 19:09
  • You should use [System.nanoTime()](http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#nanoTime%28%29) for measuring code execution in Java. It is more precise. More discussion [in this question](http://stackoverflow.com/questions/351565/system-currenttimemillis-vs-system-nanotime). – calebds Jan 19 '12 at 19:14
  • Yes, I ran it multiple times and got the same results with nanoTime. For the first code: time: 2256414917 time: 2256004749. Thanks for the tip with nanoTime. Nevertheless, the first code takes longer than 2 seconds, while the other examples don't even take 10 milliseconds. – x4- Jan 19 '12 at 19:16
  • You should also post a little more information. For a decent test, use nanoTime() and measure in microseconds. Additionally, invoking Java from IDEs is generally a bad idea (at least on Windows). Use a recent version, say Java 7, and invoke your JAR file from the console with something like: java -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -server -jar FILE – thesaint Jan 19 '12 at 19:16
  • Guys, if you execute a program and it takes 30 seconds to complete, and then you start it a little differently and it completes in < 0.1 second... WHERE EXACTLY do you need more *precision*? Who cares if the factor is 10,000 or just 500; it's a lot of a difference. – yankee Jan 19 '12 at 19:17
  • @yankee: it's just on principle to use nanoTime() ^^. Maybe you could download the SDK source code and look into what size() really does... – thesaint Jan 19 '12 at 19:22
  • What `size()` does doesn't matter. Both code snippets have the same call. Precision doesn't matter. These JVM flags don't matter. Running through an IDE doesn't matter. There is no GC issue here. – Sean Owen Jan 19 '12 at 19:26
  • @Owen: sounds like you are a pure Java programmer?! – thesaint Jan 19 '12 at 19:30
  • The extreme difference in times can only (?) be explained by the compiler optimizing away the call, as I suggested in my answer below. Have you tried using the result of the call, so that it cannot be optimized away, and comparing? – user949300 Jan 19 '12 at 20:02
  • All this talk about nanoTime() or milliTime() feels like arguing about whether you should use band-aids or cloth to help a person who has been hit by an anti-tank round. – lessthanoptimal Jan 19 '12 at 20:36
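
Following up on the nanoTime() suggestion in the comments above, a minimal sketch of the same benchmark timed with System.nanoTime() could look like this (the class name and the microsecond conversion are illustrative choices, not from the original post):

import java.util.LinkedList;
import java.util.List;

public class PerformanceCheckNano {

    public static void main(String[] args) {
        List<PerformanceCheckNano> removeList = new LinkedList<PerformanceCheckNano>();
        int maxTimes = 1000000000;

        for (int i = 0; i < 10; i++) {
            // nanoTime() gives a higher-resolution, monotonic timestamp than currentTimeMillis()
            long start = System.nanoTime();

            for (int times = 0; times < maxTimes; times++) {
                if (removeList.size() > 0) {
                    testFunc(3);
                }
            }

            long elapsedMicros = (System.nanoTime() - start) / 1000;
            System.out.println("time: " + elapsedMicros + " us");
        }
    }

    private static boolean testFunc(int test) {
        return 5 > test;
    }
}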

6 Answers


That is really surprising. The generated bytecode is identical except for the conditional, which is ifle vs ifne.

The results are much more sensible if you turn off the JIT with -Xint. The second version is 2x slower. So it has to do with JIT optimization.

I assume that it can optimize out the check in the second case but not the first (for whatever reason). Even though it means it does the work of the function, missing that conditional makes things much faster. It avoids pipeline stalls and all that.

Sean Owen
  • The thing is, it also depends on testFunc(). Exchange the testFunc() of the first version with anything else, for example: private static int testFunc2(int test) { return 5*test; } This will result in the lower computation time. – x4- Jan 19 '12 at 19:48

While not directly related to this question, this is how you would correctly micro-benchmark the code using Caliper. Below is a modified version of your code so that it will run with Caliper. The inner loops had to be modified somewhat so that the VM would not optimize them out. It is surprisingly smart at realizing that nothing is actually happening.

There are also a lot of nuances when benchmarking Java code. I wrote about some of the issues I ran into at Java Matrix Benchmark, such as how past history can affect current results. You will avoid many of those issues by using Caliper.

  1. http://code.google.com/p/caliper/
  2. Benchmarking issues with Java Matrix Benchmark

import java.util.LinkedList;
import java.util.List;

import com.google.caliper.Runner;
import com.google.caliper.SimpleBenchmark;

public class PerformanceCheck extends SimpleBenchmark {

    public int timeFirstCase(int reps) {
        List<PerformanceCheck> removeList = new LinkedList<PerformanceCheck>();
        removeList.add(new PerformanceCheck());
        int ret = 0;

        for (int i = 0; i < reps; i++) {
            if (removeList.size() > 0) {
                if (testFunc(i))
                    ret++;
            }
        }

        return ret;
    }

    public int timeSecondCase(int reps) {
        List<PerformanceCheck> removeList = new LinkedList<PerformanceCheck>();
        removeList.add(new PerformanceCheck());
        int ret = 0;

        for (int i = 0; i < reps; i++) {
            if (removeList.size() == 0) {
                if (testFunc(i))
                    ret++;
            }
        }

        return ret;
    }

    private static boolean testFunc(int test) {
        return 5 > test;
    }

    public static void main(String[] args) {
        Runner.main(PerformanceCheck.class, args);
    }
}

OUTPUT:

 0% Scenario{vm=java, trial=0, benchmark=FirstCase} 0.60 ns; σ=0.00 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=SecondCase} 1.92 ns; σ=0.22 ns @ 10 trials

 benchmark    ns linear runtime
 FirstCase 0.598 =========
SecondCase 1.925 ==============================

vm: java
trial: 0
lessthanoptimal
  • I don't think the question is how to micro-benchmark, but how to explain the observed behavior. But you hit it along the way; it is the JIT optimizing something in unexpected ways. – Sean Owen Jan 19 '12 at 21:55
  • Yeah, I agree. Thought I might point out the correct way since, even after that code gets explained, there would still be strange fluctuations because of how the benchmark is set up. – lessthanoptimal Jan 19 '12 at 22:05
  • Thanks for the tip with Caliper. I'll use this in the future. Still the question remains why Java acts like this. In your example you switched the behaviour by adding an Object to the list. Now the if-clause of `timeSecondCase` returns false. And it's still the one version in which testFunc() isn't called which is slower. – x4- Jan 19 '12 at 22:24

Well, I am glad I don't have to deal with Java performance optimization. I tried it myself with the Java JDK 7 64-bit. The results are arbitrary ;). It makes no difference which lists I use or whether I cache the result of size() before entering the loop. Also, entirely wiping out the test function makes almost no difference (so it can't be a branch prediction hit either). Optimization flags improve performance, but the results are just as arbitrary.

The only logical conclusion here is that the JIT compiler is sometimes able to optimize away the statement (which is not hard to believe), but it seems rather arbitrary. One of the many reasons why I prefer languages like C++, where the behaviour is at least deterministic, even if it is sometimes arbitrary.

BTW, in the latest Eclipse, as it always was on Windows, running this code via the IDE "Run" (no debug) is 10 times slower than running it from the console. So much for that...

thesaint

When the runtime compiler can figure out testFunc evaluates to a constant, I believe it does not evaluate the loop, which explains the speedup.

When the condition is removeList.size() == 0, the function testFunc(3) gets evaluated to a constant. When the condition is removeList.size() != 0, the inner code never gets evaluated, so it can't be sped up. You can modify your code as follows:

for (int times = 0; times < maxTimes; times++) {
    testFunc();  // Removing this call makes the code slow again!
    if (removeList.size() != 0) {
        testFunc();
    }
}

private static boolean testFunc() {
    return testFunc(3);
}

When testFunc() is not initially called, the runtime compiler does not realize that testFunc() evaluates to a constant, so it cannot optimize the loop.

Certain functions like

private static int testFunc2(int test) {
    return 5*test;
}

the compiler likely tries to pre-optimize (before execution), but apparently not in the case where a parameter is passed in as an integer and evaluated in a conditional.

Your benchmark returns times like

time: 107
time: 106
time: 0
time: 0
...

suggesting that it takes 2 iterations of the outer loop for the runtime compiler to finish optimizing. Running with the -server flag would probably return all 0's in the benchmark.
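
Putting the fragments above together, a complete, runnable version of the modified benchmark might look like this (a minimal sketch; the class name is illustrative, while the unconditional warm-up call and the != 0 condition are exactly the changes described above):

import java.util.LinkedList;
import java.util.List;

public class PerformanceCheckModified {

    public static void main(String[] args) {
        List<PerformanceCheckModified> removeList = new LinkedList<PerformanceCheckModified>();
        int maxTimes = 1000000000;

        for (int i = 0; i < 10; i++) {
            long time = System.currentTimeMillis();

            for (int times = 0; times < maxTimes; times++) {
                testFunc();  // unconditional call: lets the runtime compiler see that testFunc() is trivial
                if (removeList.size() != 0) {
                    testFunc();
                }
            }

            System.out.println("time: " + (System.currentTimeMillis() - time));
        }
    }

    private static boolean testFunc() {
        return testFunc(3);
    }

    private static boolean testFunc(int test) {
        return 5 > test;
    }
}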

Garrett Hall

The times are unrealistically fast per iteration. This means the JIT has detected that your code doesn't do anything and has eliminated it. Subtle changes can confuse the JIT so that it can't determine that the code does nothing, and then it takes some time.

If you change the test to do something marginally useful, the difference will disappear.
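
As a rough illustration of "marginally useful" (this sketch is not from the answer; the accumulated sum and the final print are hypothetical additions that force the JIT to keep the work):

import java.util.LinkedList;
import java.util.List;

public class PerformanceCheckUseful {

    public static void main(String[] args) {
        List<PerformanceCheckUseful> removeList = new LinkedList<PerformanceCheckUseful>();
        int maxTimes = 1000000000;
        long sum = 0;  // consume results so the loop body cannot be eliminated as dead code

        for (int i = 0; i < 10; i++) {
            long time = System.currentTimeMillis();

            for (int times = 0; times < maxTimes; times++) {
                if (removeList.size() > 0) {
                    if (testFunc(times)) {
                        sum++;
                    }
                }
                sum += removeList.size();  // marginally useful work on every iteration
            }

            System.out.println("time: " + (System.currentTimeMillis() - time));
        }

        System.out.println("sum: " + sum);  // printing sum keeps the JIT from dropping it
    }

    private static boolean testFunc(int test) {
        return 5 > test;
    }
}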

Peter Lawrey

These benchmarks are tough since compilers are so darned smart. One guess: Since the result of testFunc() is ignored, the compiler might be completely optimizing it out. Add a counter, something like

   if (testFunc(3))
     counter++;

And, just for thoroughness, do a System.out.println(counter) at the end.

user949300
  • But that would make the first version faster, but it's not. – Sean Owen Jan 19 '12 at 19:13
  • Could be the ordering, as suggested by @prelic. For some reason, the JIT didn't optimize away the call the first time, but figured it out the 2nd time. – user949300 Jan 19 '12 at 19:17
  • I did run it both ways locally and it doesn't change things. – Sean Owen Jan 19 '12 at 19:19
  • I tried including the counter and changing testFunc(3) to testFunc(times). The first version (not calling testFunc()) is still 2 times slower. – x4- Jan 19 '12 at 20:10