Java 8: Class.getName() slows down String concatenation chain

Question

Recently I've run into an issue regarding String concatenation. This benchmark summarizes it:

@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class BrokenConcatenationBenchmark {

  @Benchmark
  public String slow(Data data) {
    final Class<? extends Data> clazz = data.clazz;
    return "class " + clazz.getName();
  }

  @Benchmark
  public String fast(Data data) {
    final Class<? extends Data> clazz = data.clazz;
    final String clazzName = clazz.getName();
    return "class " + clazzName;
  }

  @State(Scope.Thread)
  public static class Data {
    final Class<? extends Data> clazz = getClass();

    @Setup
    public void setup() {
      //explicitly load name via native method Class.getName0()
      clazz.getName();
    }
  }
}

On JDK 1.8.0_222 (OpenJDK 64-Bit Server VM, 25.222-b10) I've got the following results:

Benchmark                                                            Mode  Cnt     Score     Error   Units
BrokenConcatenationBenchmark.fast                                    avgt   25    22,253 ±   0,962   ns/op
BrokenConcatenationBenchmark.fast:·gc.alloc.rate                     avgt   25  9824,603 ± 400,088  MB/sec
BrokenConcatenationBenchmark.fast:·gc.alloc.rate.norm                avgt   25   240,000 ±   0,001    B/op
BrokenConcatenationBenchmark.fast:·gc.churn.PS_Eden_Space            avgt   25  9824,162 ± 397,745  MB/sec
BrokenConcatenationBenchmark.fast:·gc.churn.PS_Eden_Space.norm       avgt   25   239,994 ±   0,522    B/op
BrokenConcatenationBenchmark.fast:·gc.churn.PS_Survivor_Space        avgt   25     0,040 ±   0,011  MB/sec
BrokenConcatenationBenchmark.fast:·gc.churn.PS_Survivor_Space.norm   avgt   25     0,001 ±   0,001    B/op
BrokenConcatenationBenchmark.fast:·gc.count                          avgt   25  3798,000            counts
BrokenConcatenationBenchmark.fast:·gc.time                           avgt   25  2241,000                ms

BrokenConcatenationBenchmark.slow                                    avgt   25    54,316 ±   1,340   ns/op
BrokenConcatenationBenchmark.slow:·gc.alloc.rate                     avgt   25  8435,703 ± 198,587  MB/sec
BrokenConcatenationBenchmark.slow:·gc.alloc.rate.norm                avgt   25   504,000 ±   0,001    B/op
BrokenConcatenationBenchmark.slow:·gc.churn.PS_Eden_Space            avgt   25  8434,983 ± 198,966  MB/sec
BrokenConcatenationBenchmark.slow:·gc.churn.PS_Eden_Space.norm       avgt   25   503,958 ±   1,000    B/op
BrokenConcatenationBenchmark.slow:·gc.churn.PS_Survivor_Space        avgt   25     0,127 ±   0,011  MB/sec
BrokenConcatenationBenchmark.slow:·gc.churn.PS_Survivor_Space.norm   avgt   25     0,008 ±   0,001    B/op
BrokenConcatenationBenchmark.slow:·gc.count                          avgt   25  3789,000            counts
BrokenConcatenationBenchmark.slow:·gc.time                           avgt   25  2245,000                ms

This looks like an issue similar to JDK-8043677, where an expression having side effect breaks optimization of new StringBuilder.append().append().toString() chain. But the code of Class.getName() itself does not seem to have any side effects:

private transient String name;

public String getName() {
  String name = this.name;
  if (name == null) {
    this.name = name = this.getName0();
  }

  return name;
}

private native String getName0();

The only suspicious thing here is a call to native method which happens in fact only once and its result is cached in the field of the class. In my benchmark I've explicitly cached it in setup method.

I've expected branch predictor to figure out that at each benchmark invocation the actual value of this.name is never null and optimize the whole expression.

However, while for the BrokenConcatenationBenchmark.fast() I have this:

@ 19   tsypanov.strings.benchmark.concatenation.BrokenConcatenationBenchmark::fast (30 bytes)   force inline by CompileCommand
  @ 6   java.lang.Class::getName (18 bytes)   inline (hot)
    @ 14   java.lang.Class::initClassName (0 bytes)   native method
  @ 14   java.lang.StringBuilder::<init> (7 bytes)   inline (hot)
  @ 19   java.lang.StringBuilder::append (8 bytes)   inline (hot)
  @ 23   java.lang.StringBuilder::append (8 bytes)   inline (hot)
  @ 26   java.lang.StringBuilder::toString (35 bytes)   inline (hot)

i.e. compiler is able to inline everything, for BrokenConcatenationBenchmark.slow() it is different:

@ 19   tsypanov.strings.benchmark.concatenation.BrokenConcatenationBenchmark::slow (28 bytes)   force inline by CompilerOracle
  @ 9   java.lang.StringBuilder::<init> (7 bytes)   inline (hot)
    @ 3   java.lang.AbstractStringBuilder::<init> (12 bytes)   inline (hot)
      @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
  @ 14   java.lang.StringBuilder::append (8 bytes)   inline (hot)
    @ 2   java.lang.AbstractStringBuilder::append (50 bytes)   inline (hot)
      @ 10   java.lang.String::length (6 bytes)   inline (hot)
      @ 21   java.lang.AbstractStringBuilder::ensureCapacityInternal (27 bytes)   inline (hot)
        @ 17   java.lang.AbstractStringBuilder::newCapacity (39 bytes)   inline (hot)
        @ 20   java.util.Arrays::copyOf (19 bytes)   inline (hot)
          @ 11   java.lang.Math::min (11 bytes)   (intrinsic)
          @ 14   java.lang.System::arraycopy (0 bytes)   (intrinsic)
      @ 35   java.lang.String::getChars (62 bytes)   inline (hot)
        @ 58   java.lang.System::arraycopy (0 bytes)   (intrinsic)
  @ 18   java.lang.Class::getName (21 bytes)   inline (hot)
    @ 11   java.lang.Class::getName0 (0 bytes)   native method
  @ 21   java.lang.StringBuilder::append (8 bytes)   inline (hot)
    @ 2   java.lang.AbstractStringBuilder::append (50 bytes)   inline (hot)
      @ 10   java.lang.String::length (6 bytes)   inline (hot)
      @ 21   java.lang.AbstractStringBuilder::ensureCapacityInternal (27 bytes)   inline (hot)
        @ 17   java.lang.AbstractStringBuilder::newCapacity (39 bytes)   inline (hot)
        @ 20   java.util.Arrays::copyOf (19 bytes)   inline (hot)
          @ 11   java.lang.Math::min (11 bytes)   (intrinsic)
          @ 14   java.lang.System::arraycopy (0 bytes)   (intrinsic)
      @ 35   java.lang.String::getChars (62 bytes)   inline (hot)
        @ 58   java.lang.System::arraycopy (0 bytes)   (intrinsic)
  @ 24   java.lang.StringBuilder::toString (17 bytes)   inline (hot)

So the question is whether this is appropriate behaviour of the JVM or compiler bug?

I'm asking the question because some of the projects are still using Java 8 and if it won't be fixed on any of release updates then to me it's reasonable to hoist calls to Class.getName() manually from hot spots.

P.S. On the latest JDKs (11, 13, 14-eap) the issue is not reproduced.

You do have a side effect there - the assignment to `this.name`. — RealSkeptic, Dec 03 '19 at 14:08
@RealSkeptic the assignment happens only once at the very first invocation of `Class.getName()` and in `setUp()` method, not in the body of benchmarked one. — Sergey Tsypanov, Dec 03 '19 at 16:41

apangin · Accepted Answer · 2019-12-03T23:48:55.840

HotSpot JVM collects execution statistics per bytecode. If the same code is run in different contexts, the result profile will aggregate statistics from all contexts. This effect is known as profile pollution.

Class.getName() is obviously called not only from your benchmark code. Before JIT starts compiling the benchmark, it already knows that the following condition in Class.getName() was met multiple times:

    if (name == null)
        this.name = name = getName0();

At least, enough times to treat this branch statistically important. So, JIT did not exclude this branch from compilation, and thus could not optimize string concat due to possible side effect.

This does not even need to be a native method call. Just a regular field assignment is also considered a side effect.

Here is an example how profile pollution can harm further optimizations.

@State(Scope.Benchmark)
public class StringConcat {
    private final MyClass clazz = new MyClass();

    static class MyClass {
        private String name;

        public String getName() {
            if (name == null) name = "ZZZ";
            return name;
        }
    }

    @Param({"1", "100", "400", "1000"})
    private int pollutionCalls;

    @Setup
    public void setup() {
        for (int i = 0; i < pollutionCalls; i++) {
            new MyClass().getName();
        }
    }

    @Benchmark
    public String fast() {
        String clazzName = clazz.getName();
        return "str " + clazzName;
    }

    @Benchmark
    public String slow() {
        return "str " + clazz.getName();
    }
}

This is basically the modified version of your benchmark that simulates the pollution of getName() profile. Depending on the number of preliminary getName() calls on a fresh object, the further performance of string concatenation may dramatically differ:

Benchmark          (pollutionCalls)  Mode  Cnt   Score   Error  Units
StringConcat.fast                 1  avgt   15  11,458 ± 0,076  ns/op
StringConcat.fast               100  avgt   15  11,690 ± 0,222  ns/op
StringConcat.fast               400  avgt   15  12,131 ± 0,105  ns/op
StringConcat.fast              1000  avgt   15  12,194 ± 0,069  ns/op
StringConcat.slow                 1  avgt   15  11,771 ± 0,105  ns/op
StringConcat.slow               100  avgt   15  11,963 ± 0,212  ns/op
StringConcat.slow               400  avgt   15  26,104 ± 0,202  ns/op  << !
StringConcat.slow              1000  avgt   15  26,108 ± 0,436  ns/op  << !

More examples of profile pollution »

I can't call it either a bug or an "appropriate behaviour". This is just how dynamic adaptive compilation is implemented in HotSpot.

who else if not Pangin... do you happen to know if Graal C2 has the same illness? — Eugene, Dec 04 '19 at 05:01

score 2 · Answer 2 · answered Dec 03 '19 at 12:43

2

Slightly unrelated but since Java 9 and JEP 280: Indify String Concatenation the string concatenation is now done with invokedynamic and not StringBuilder. This article shows the differences in the bytecode between Java 8 and Java 9.

If the benchmark re-run on newer Java version doesn't show the problem there is most likley no bug in javac because the compiler now uses new mechanism. Not sure if diving into Java 8 behavior is beneficial if there is such a substantial change in the newer versions.

answered Dec 03 '19 at 12:43

Karol Dowbecki

43,645
9
78
111

1

I agree this is likely to be a compiler issue, not a one related to `javac` though. `javac` generates bytecode and doesn't do any sophisticated optimizations. I've run the same benchmark with `-XX:TieredStopAtLevel=1` and received this output: `Benchmark Mode Cnt Score Error Units` `BrokenConcatenationBenchmark.fast avgt 25 74,677 ? 2,961 ns/op` `BrokenConcatenationBenchmark.slow avgt 25 69,316 ? 1,239 ns/op` So when we don't optimize much both methods yield same results, the issue reveals itself only when the code gets C2-compiled. – Sergey Tsypanov Dec 03 '19 at 16:35
2

_is now done with invokedynamic and not StringBuilder_ is simply _wrong_. `invokedynamic` only tells the _runtime_ to choose how to do the concatenation, and 5 out of 6 strategies (including the default) still use `StringBuilder`. – Eugene Dec 04 '19 at 04:51
@Eugene thanks for pointing this out. When you say strategies do you mean `StringConcatFactory.Strategy` enum? – Karol Dowbecki Dec 04 '19 at 10:26
@KarolDowbecki exactly. – Eugene Feb 10 '20 at 16:51

Java 8: Class.getName() slows down String concatenation chain

2 Answers2