9

In the following code snippet, Foo1 is a class that increments a counter every time the method bar() is called. Foo2 does the same thing but with one additional level of indirection.

I would expect Foo1 to be faster than Foo2, however in practice, Foo2 is consistently 40% faster than Foo1. How is the JVM optimizing the code such that Foo2 runs faster than Foo1?

Some Details

  • The test was executed with java -server CompositionTest.
  • Running the test with java -client CompositionTest produces the expected results, that Foo2 is slower than Foo1.
  • Switching the order of the loops does not make a difference.
  • The results were verified with java6 on both sun and openjdk's JVMs.

The Code

public class CompositionTest {

    private static interface DoesBar {
        public void bar();
        public int count();
        public void count(int c);
    }

    private static final class Foo1 implements DoesBar {
        private int count = 0;
        public final void bar() { ++count; }
        public int count() { return count; }
        public void count(int c) { count = c; }
    }

    private static final class Foo2 implements DoesBar {
        private DoesBar bar;
        public Foo2(DoesBar bar) { this.bar = bar; }
        public final void bar() { bar.bar(); }
        public int count() { return bar.count(); }
        public void count(int c) { bar.count(c); }
    }

    public static void main(String[] args) {
        long time = 0;
        DoesBar bar = null;
        int reps = 100000000;

        for (int loop = 0; loop < 10; loop++) {
            bar = new Foo1();
            bar.count(0);

            int i = reps;
            time = System.nanoTime();
            while (i-- > 0) bar.bar();
            time = System.nanoTime() - time;

            if (reps != bar.count())
                throw new Error("reps != bar.count()");
        }
        System.out.println("Foo1 time: " + time);

        for (int loop = 0; loop < 10; loop++) {
            bar = new Foo2(new Foo1());
            bar.count(0);

            int i = reps;
            time = System.nanoTime();
            while (i-- > 0) bar.bar();
            time = System.nanoTime() - time;

            if (reps != bar.count())
                throw new Error("reps != bar.count()");
        }
        System.out.println("Foo2 time: " + time);
    }
}
skaffman
  • 398,947
  • 96
  • 818
  • 769
TianyuZhu
  • 752
  • 6
  • 8
  • 7
    Did you try switching the loops to see if it made a difference? – AHungerArtist May 17 '12 at 23:41
  • I tried it (with JRE7 -server) and it did not make a difference. Whaaat?! I know it should, that the JRE need to warm up and compile the code, but ... it doesn't make a difference. At all. – Petr Janeček May 18 '12 at 00:19
  • 1
    Wow. Great question. I found that if you add up the time taken overall (rather than just measuring the count of the final loop), that the two methods are much closer (although strangely with foo2 being around 2% faster). As to the reason, ? – Chris May 18 '12 at 01:13
  • 8
    It might be a good idea to read a good SO post about [Java microbenchmarking](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) and then review your code to make sure your results are actually meaningful. – QuantumMechanic May 18 '12 at 00:05
  • @AHungerArtist: Switching the order of the loops does not make a difference. – TianyuZhu May 18 '12 at 14:45
  • @Chris: Just looking at the last measurement allows the JVM to "warm-up" the code. The first few runs of the code are interpreted whereas the later runs of the code will be compiled. – TianyuZhu May 18 '12 at 14:57

2 Answers2

2

Your microbench mark is meaningless. On my computer the code runs in about 8ms for each loop... To have any meaningful number a benchmark should probably run for at least a second.

When run both for around a second (hint, you need more than Integer.MAX_VALUE repetitions) I find that the run times of both are identical.

The likely explanation for this is that the JIT compiler has noticed that your indirection is meaningless and optimised it out (or at least inlined the method calls) such that the code executed in both loops is identical.

It can do this because it knows bar in Foo2 is effectively final, it also know that the argument to the Foo2 constructor is always going to be a Foo1 (at least in our little test). That way it knows the exact code path when Foo2.bar is called. It also knows that this loop is going to run a lot of times (in fact it knows exactly how many times the loop will execute) -- so it seems like a good idea to inline the code.

I have no idea if that is precisely what it does, but these are all logical observations that the JIT could me making about the code. Perhaps in the future some JIT compilers might even optimise the entire while loop and simply set count to reps, but that seems somewhat unlikely.

Dunes
  • 37,291
  • 7
  • 81
  • 97
  • You're computer must be pretty fast. I suggest you change `reps` from `int` to `long` and retry with `Long.MAX_VALUE` reps. – TianyuZhu May 18 '12 at 18:09
  • I have an i3... I converted your code to C++ where it takes a **lot** longer (13 seconds for `Foo1` and 22 seconds for `Foo2`) when using `Integer.MAX_VALUE`. For the java version, if I do something as simple as changing the type of `i` to long (not the value, just the type), the execution time is now about 3 seconds. As someone else said, micro benchmarking is never a good idea on Java. – Dunes May 18 '12 at 21:22
1

Trying to predict performance on modern languages is not very productive.

The JVM is constantly modified to increase performance of common, readable structures which, in contrast, makes uncommon, awkward code slower.

Just write your code as clearly as you can--then if you really identify a point where your code is actually identified as too slow to pass written specifications, you may have to hand-tweak some areas--but this will probably involve large, simple ideas like object caches, tweaking JVM options and eliminating truly stupid/wrong code (Wrong data structures can be HUGE, I once changed an ArrayList to a LinkedList and reduced an operation from 10 minutes to 5 seconds, multi-threading a ping operation that discovered a class-B network took an operation from 8+ hours to minutes).

Bill K
  • 62,186
  • 18
  • 105
  • 157
  • 1
    I'm simply trying to understand the behaviour of the JVM. In this case, I'm just curious about how the JVM handles such levels of indirection. This isn't _premature optimization_. I simply want to understand how code that I write is interpreted by the jit compiler. – TianyuZhu May 18 '12 at 18:13
  • @TianyuZhu I totally understand, I guess I'd say that the behavior of the JVM is to change and adapt--so ever trying to USE your current knowledge of the JVM to increase speed is a bad thing--but learning about it is always fun. – Bill K May 22 '12 at 15:20