The code below calls two simple functions 10 billion times each.

import java.util.Arrays;
import java.util.List;

public class PerfTest {
    private static long l = 0;

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b");
        long time1 = System.currentTimeMillis();
        for (long i = 0; i < 1E10; i++) {
            func1("a", "b");
        }
        long time2 = System.currentTimeMillis();
        for (long i = 0; i < 1E10; i++) {
            func2(list);
        }
        System.out.println((time2 - time1) + "/" + (System.currentTimeMillis() - time2));
    }

    private static void func1(String s1, String s2) { l++; }
    private static void func2(List<String> sl) { l++; }
}

My assumption was that the performance of these two calls would be close to identical. If anything I would have guessed that passing two arguments would be slightly slower than passing one. Given all arguments are object references I wasn't expecting the fact that one was a list to make any difference.

I have run the test many times and a typical result is "12781/30536". In other words, the call using two strings takes about 13 seconds and the call using a list takes about 31 seconds.

What is the explanation for this difference in performance? Or is this an unfair test? I have tried switching the two calls (in case it was due to startup effects) but the results are the same.

Update

This is not a fair test for many reasons. However it does demonstrate real behaviour of the Java compiler. Note the following two additions to demonstrate this:

  • Adding the expressions s1.getClass() and sl.getClass() to the functions makes the two function calls perform the same
  • Running the test with -XX:-TieredCompilation also makes the two function calls perform the same

The explanation for this behaviour is in the accepted answer below. The very brief summary of @apangin's answer is that func2 is not inlined by the HotSpot compiler because the class of its argument (i.e. List) is not resolved. Forcing resolution of the class (e.g. using getClass) causes it to be inlined, which significantly improves its performance. As pointed out in the answer, unresolved classes are unlikely to occur in real code, which makes this code an unrealistic edge case.
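A minimal sketch of the first variation from the update, in which both functions call getClass() on an argument. The class name ResolvedPerfTest, the run and counter helpers, and the reduced iteration count are illustrative assumptions and not part of the original test; the timings it prints will depend on the JVM version and machine.

```java
import java.util.Arrays;
import java.util.List;

// Variant of PerfTest in which each function calls getClass() on an argument,
// as described in the update. Names other than func1/func2/l are hypothetical.
class ResolvedPerfTest {
    private static long l = 0;

    // Runs both loops and returns { millis spent in func1 loop, millis spent in func2 loop }.
    static long[] run(long iterations) {
        List<String> list = Arrays.asList("a", "b");
        long t1 = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            func1("a", "b");
        }
        long t2 = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            func2(list);
        }
        long t3 = System.nanoTime();
        return new long[] { (t2 - t1) / 1_000_000, (t3 - t2) / 1_000_000 };
    }

    private static void func1(String s1, String s2) { s1.getClass(); l++; }
    private static void func2(List<String> sl) { sl.getClass(); l++; }

    // Exposes the shared counter so the number of calls can be checked.
    static long counter() { return l; }

    public static void main(String[] args) {
        // Assumption: 1E9 iterations to keep the run short; the question used 1E10.
        long[] ms = run(1_000_000_000L);
        System.out.println(ms[0] + "/" + ms[1]);
    }
}
```

This is still a stopwatch test, so it only serves to compare the two loops against each other on the same JVM.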

sprinter
  • Can you add what you expected and why? – ChiefTwoPencils Oct 20 '16 at 23:08
  • @ChiefTwoPencils have added a para on that. – sprinter Oct 20 '16 at 23:12
  • I didn't vote to close, but unless someone is willing to hack apart the runtime to look at specific compilation optimizations, most performance questions aren't really very useful (although they can be amusing/interesting)--and the answers can change from release to release. In this case I'd just assume that the JVM found it easier to compile or memorize the two parameter call than the array call, but seriously--just write whatever is the most readable! Also note, the most readable version is often the one that the JVM optimizes best. – Bill K Oct 20 '16 at 23:13
  • @BillK I completely agree that clarity of code is more important than performance. But I am intrigued by why there's such a significant difference between the two and I'm posting because I'm hoping someone will have investigated this in the past and have an explanation. – sprinter Oct 20 '16 at 23:16
  • ps. For this call: func1("a", "b"); java probably doesn't pass anything, in fact it probably inlines the entire thing and just returns a constant. Java's abilities to optimize at runtime makes it amazingly hard to write a good performance test for (which in itself is a good indication of why java's performance is so great) – Bill K Oct 20 '16 at 23:18
  • @BillK I changed the test to avoid that and reposted. Same result. – sprinter Oct 20 '16 at 23:21
  • There's no significant difference when I run it on my machine. It's probably optimizations carried out by the JVM, as Bill suggests, which are different on different environments. – Klitos Kyriacou Oct 20 '16 at 23:25
  • @KlitosKyriacou ok thanks that's useful info – sprinter Oct 20 '16 at 23:36
  • I don't know what changes you made but passing a pointer to a string is guaranteed to be a constant. If the pointer is the same, the string will be the same. Passing a pointer to an array is not, the contents of the array can change. The compiler could be coded to rely on that behavior. That's really the only thing I can think of. (Other than they just spent more time optimizing the more common case) – Bill K Oct 21 '16 at 02:13
  • It is *so* easy to hit the memory allocator. Making a list is likely to do that. Unless it's smart enough to make the list on the stack, it will hit the memory allocator, which will be costly. – Mike Dunlavey Oct 22 '16 at 00:32
  • I don't know what's going on but very random things seem to make it optimized. For example, calling a random method (e.g. `list.get(0)`, for some reason `list.toString()` does nothing) or declaring the list as an `Object` and explicitly casting to a `List` at call-site, make it optimized. Also copying the source code of `Arrays.asList` into a local class and using it instead seemed to work. – Bubletan Oct 30 '16 at 03:08
  • @Bubletan Good observation. The reason is calling `List.get()` or explicit casting causes `List` class to be resolved. `list.toString()` actually calls `Object.toString()`, which does not cause resolution of `List`. `func2` is not inlined by JIT when there is an unresolved class in its signature. – apangin Oct 30 '16 at 12:29

1 Answer

The benchmark is unfair; however, it has revealed an interesting effect.

As Sotirios Delimanolis has noticed, the performance difference arises because func1 is inlined by the HotSpot compiler while func2 is not. The reason is that func2 has an argument of type List, a class that is never resolved during execution of the benchmark.

Note that the List class is not actually used: no List methods are called, no fields of type List are declared, and no class casts or other actions that typically cause class resolution are performed. If you add a usage of the List class anywhere in the code, func2 will be inlined.

The other circumstance that affected the compilation strategy is the simplicity of the method. It is so simple that the JVM decided to compile it at Tier 1 (C1 with no further optimization). If it were compiled with C2, the List class would be resolved. Try running with -XX:-TieredCompilation, and you'll see that func2 is successfully inlined and performs as fast as func1.
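When repeating the experiment with -XX:-TieredCompilation, it can help to confirm that the flag actually reached the JVM. The sketch below assumes a hypothetical helper class, TieredFlagCheck, that inspects the arguments the JVM was started with through the standard RuntimeMXBean. It only detects the flag when it is passed explicitly and does not report what the JVM defaults to.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether tiered compilation was explicitly
// switched off on the command line of the running JVM.
class TieredFlagCheck {
    // True if the given JVM arguments contain the flag that disables tiered compilation.
    static boolean tieredDisabled(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:-TieredCompilation");
    }

    public static void main(String[] args) {
        // Launch as: java -XX:-TieredCompilation TieredFlagCheck
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Tiered compilation explicitly disabled: " + tieredDisabled(jvmArgs));
    }
}
```

To see which methods get compiled and at which tier, HotSpot also accepts -XX:+PrintCompilation; its output format varies between JVM versions.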

Writing realistic microbenchmarks by hand is a really difficult job. There are so many aspects that may lead to confusing results, e.g. inlining, dead code elimination, on-stack replacement, profile pollution, recompilation etc. That's why it is highly recommended to use proper benchmarking tools like JMH. Hand-written benchmarks can easily fool the JVM. In particular, real applications are very unlikely to have methods with classes that are never used.
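To make two of the pitfalls named above concrete (dead code elimination and warm-up effects), here is a rough stdlib-only sketch. It is not a substitute for JMH and does not address on-stack replacement or profile pollution. The class TinyHarness, its measure method and the sink field are hypothetical names chosen for illustration.

```java
import java.util.function.LongSupplier;

// Hypothetical hand-rolled harness, for illustration only.
class TinyHarness {
    // Each round's result is published here so the computed value is observably used.
    static volatile long sink;

    // Runs `rounds` timed rounds of `iterations` calls each; returns nanoseconds per round.
    static long[] measure(LongSupplier work, int rounds, long iterations) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long acc = 0;
            long start = System.nanoTime();
            for (long i = 0; i < iterations; i++) {
                acc += work.getAsLong();
            }
            nanos[r] = System.nanoTime() - start;
            sink = acc; // consume the result of this round
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Early rounds typically include interpretation and JIT compilation time,
        // so printing every round makes warm-up visible instead of hiding it in a total.
        long[] nanos = measure(() -> 1L, 10, 100_000_000L);
        for (int r = 0; r < nanos.length; r++) {
            System.out.println("round " + r + ": " + nanos[r] / 1_000_000 + " ms");
        }
    }
}
```

Even with these precautions the JIT may still simplify the loop, which is one reason the numbers need an explanation before any conclusion is drawn from them.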

apangin
  • I tried adding code to both functions to ensure `List` is resolved and also tried with the `TieredCompilation` option. As you predicted both make `func2` perform as fast as `func1`. Thanks for the clear explanation. I'll update the question to make that information clear to future readers. – sprinter Oct 30 '16 at 23:18
  • Since asking the question I've done quite a bit of research on microbenchmarks and have tried JMH as well as handcoding following suggestions at http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java. My conclusion after several attempts is that as long as you don't want precision and the test is over a sufficiently long period the stopwatch based tests reach exactly the same conclusion. In your opinion is all the angst about fair tests really justified or is it pedantry? – sprinter Nov 03 '16 at 03:24
  • @sprinter I have **many** examples when primitive stopwatch-based benchmarks produced misleading results. Here are just some examples from SO: [1](http://stackoverflow.com/a/24889503/3448419), [2](http://stackoverflow.com/a/30745078/3448419), [3](http://stackoverflow.com/a/38578777/3448419), [4](http://stackoverflow.com/a/33858960/3448419), [5](http://stackoverflow.com/a/24344114/3448419), [6](http://stackoverflow.com/a/33193452/3448419), [7](http://stackoverflow.com/a/25242358/3448419), [8](http://stackoverflow.com/a/39907607/3448419), [9](http://stackoverflow.com/a/35671374/3448419) – apangin Nov 05 '16 at 00:02
  • Another good reading about [common benchmarking pitfalls](http://www.oracle.com/technetwork/articles/java/architect-benchmarking-2266277.html) – apangin Nov 05 '16 at 00:04
  • Benchmark scores themselves are *meaningless*, even when measured by a good tool. A conclusion can be made only when these scores are *explained*. JMH also helps to *explain* the scores, e.g. by showing hot spots, assembly listings, GC stats etc. For example, in this question it helped me to find that `func1` was compiled by C2 and inlined, while `func2` was compiled by C1. – apangin Nov 05 '16 at 00:19
  • Thanks that's useful info - good examples and agree on value of explanation. – sprinter Nov 05 '16 at 04:34