
I have a microbenchmark that shows very strange results:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000)
@Measurement(iterations = 40, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000)
public class Chaining {

    private String a1 = "111111111111111111111111";
    private String a2 = "222222222222222222222222";
    private String a3 = "333333333333333333333333";

    @Benchmark
    public String typicalChaining() {
        return new StringBuilder().append(a1).append(a2).append(a3).toString();
    }

    @Benchmark
    public String noChaining() {
        StringBuilder sb = new StringBuilder();
        sb.append(a1);
        sb.append(a2);
        sb.append(a3);
        return sb.toString();
    }
}

I'm expecting the results of both tests to be the same or at least very close. However, the difference is almost 5x:

# Run complete. Total time: 00:01:41

Benchmark                  Mode  Cnt      Score     Error  Units
Chaining.noChaining       thrpt   40   8538.236 ± 209.924  ops/s
Chaining.typicalChaining  thrpt   40  36729.523 ± 988.936  ops/s

Does anybody know how that is possible?

Eugene
Dmitriy Dumanskiy
  • Perhaps it has an optimization rule for chained .append calls where it constructs the entire string at once? That would make a lot of sense. – Jun 02 '17 at 17:17
  • @Mints97 shouldn't the compiler do the same for regular sb.append() constructions? – Dmitriy Dumanskiy Jun 02 '17 at 17:18
  • @JarrodRoberson my expectations are totally correct. Wouldn't you expect the same? It is almost identical code. Even chaining generates less bytecode. A 5x difference? No way. – Dmitriy Dumanskiy Jun 02 '17 at 17:27
  • It's interesting to note that if you run this with the JIT off, they work as you would expect. (`-Djava.compiler=NONE`, by the way). Results are close to identical for each benchmark (and much slower). – Todd Jun 02 '17 at 17:56
  • "my expectations are totally correct" and yet the results were not what you expected. You and I have different definitions of "totally correct". – Lew Bloch Jun 02 '17 at 20:32
  • @JarrodRoberson What are you talking about? The bytecode differs, so the speed may differ as well. That's about all we can say when relating bytecode to speed (and even that isn't sure). The additional loads/stores are much cheaper than the method invocations, so there should be hardly any difference before the optimizer kicks in. Thereafter, it should be the same. And it would be the same, if there wasn't a sort of intrinsic. – maaartinus Jun 05 '17 at 01:29
  • I don't know if it matters that your 3 strings are all literals, but that makes me wonder if this enables the compiler to do some optimization thanks to knowing the string values at compile time (I know they are not `final`, but I don't know if the compiler can still spot that they are never reassigned). If it was me I would test with strings that are dynamically built at runtime in a way such that the compiler is not able to infer their values at compile time, just to be safer. – SantiBailors Nov 16 '17 at 11:13
  • @SantiBailors these strings are not literals, but variables. Please read the documentation for JMH. – Dmitriy Dumanskiy Nov 16 '17 at 11:22
  • @DmitriyDumanskiy They are literals assigned to variables which are never reassigned, which is the reason for my _doubt_. Sorry for my imprecision and thanks for the invitation to read the doc. Explaining me why you are sure that the compiler can't spot that and do an optimization would have been more useful though, as my comment was mainly meant to understand something I didn't know about how code like the one in your example is treated. – SantiBailors Nov 16 '17 at 11:45
  • @SantiBailors want me to google for you? Ok, no problem - http://hg.openjdk.java.net/code-tools/jmh/file/1ddf31f810a3/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_10_ConstantFold.java – Dmitriy Dumanskiy Nov 16 '17 at 11:49
  • @DmitriyDumanskiy Look, sorry for my comment, never mind. – SantiBailors Nov 16 '17 at 11:51

1 Answer


String concatenation of the form a + b + c is a very common pattern in Java programs, so the HotSpot JVM has a special optimization for it: -XX:+OptimizeStringConcat, which is ON by default.
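For context, javac (prior to JDK 9's JEP 280, which switched to invokedynamic-based concatenation) compiled a + b + c into exactly the chained StringBuilder sequence, which is why HotSpot looks for that shape. A minimal sketch showing the equivalence (class and variable names are mine):

```java
public class ConcatEquivalence {
    public static void main(String[] args) {
        String a = "1", b = "2", c = "3";
        // What the programmer writes:
        String viaPlus = a + b + c;
        // What older javac emits for it (the shape the JVM intrinsic matches):
        String viaChain = new StringBuilder().append(a).append(b).append(c).toString();
        System.out.println(viaPlus.equals(viaChain)); // prints "true"
    }
}
```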

The HotSpot JVM recognizes the new StringBuilder().append()...append().toString() pattern in the bytecode and translates it into optimized machine code without calling the actual Java methods and without allocating intermediate objects. In other words, this is a kind of compound JVM intrinsic.
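The optimization itself is C++ inside HotSpot, but its effect can be sketched at the Java level roughly like this — a hypothetical illustration of what the generated code does, not the actual intrinsic source: compute the final length up front, copy each argument once into a single buffer, and make one final allocation.

```java
public class IntrinsicSketch {
    // Rough Java-level sketch (hypothetical) of what the intrinsic's
    // generated code does for a three-way concatenation:
    static String concat3(String a, String b, String c) {
        char[] buf = new char[a.length() + b.length() + c.length()];
        a.getChars(0, a.length(), buf, 0);                        // copy a
        b.getChars(0, b.length(), buf, a.length());               // copy b
        c.getChars(0, c.length(), buf, a.length() + b.length());  // copy c
        return new String(buf);  // single final allocation
    }

    public static void main(String[] args) {
        System.out.println(concat3("foo", "bar", "baz")); // prints "foobarbaz"
    }
}
```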

Here is the source code for this optimization.

On the other hand, sb.append(); sb.append(); ... is not handled specially. This sequence is compiled just like regular Java method calls.

If you rerun the benchmark with -XX:-OptimizeStringConcat, the performance will be the same for both variants.
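One way to check this yourself — the grep assumes a HotSpot JDK on the PATH, and benchmarks.jar is just the conventional JMH uberjar name, so adjust to your build:

```shell
# Confirm the flag exists and is ON by default
java -XX:+PrintFlagsFinal -version | grep OptimizeStringConcat

# Rerun the benchmark with the intrinsic disabled;
# JMH's forked JVM inherits the parent's VM options by default
java -XX:-OptimizeStringConcat -jar benchmarks.jar Chaining
```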

apangin