3

Today I was reading Antonio's Blog about toString() performance and there is a paragraph:

What used to be considered evil yesterday (“do not concatenate Strings with + !!!“), has become cool and efficient! Today the JVM compiles the + symbol into a string builder (in most cases). So, do not hesitate, use it.

Now I am confused, because he says that today the JVM compiles the + symbol into a string builder (in most cases), but I have never heard of or seen (in code) anything like this before.

Could someone please give an example where the JVM does this, and under what conditions it happens?

Mehraj Malik
  • possible duplicate: https://stackoverflow.com/questions/47605/string-concatenation-concat-vs-operator – Eric May 24 '17 at 05:02
  • @Eric I am afraid this is not related to the question mentioned above, because that question clearly states that **the concat() method only accepts String values while the + operator will silently convert the argument to a String (using the toString() method for objects)**. However, I am talking about the conversion into **StringBuilder**. Please fill me in if I am missing something. – Mehraj Malik May 24 '17 at 05:04
  • 2
    @MehrajMalik did you even look at the accepted answer? That's pretty much a spot-on dupe – SomeJavaGuy May 24 '17 at 05:06
  • 3
    @SomeJavaGuy Yes, I did. He mentioned that a StringBuilder conversion happens behind the **+** operator. However, my question is whether this happens all the time or needs some specific condition. As stated in the blog, it happens "in most cases". So what are the cases in which it is not converted into a StringBuilder? – Mehraj Malik May 24 '17 at 05:10
  • DOWNVOTERS, could you please care to tell what is wrong with this question? – Mehraj Malik May 24 '17 at 06:02
  • Possible duplicate of [String concatenation: concat() vs "+" operator](https://stackoverflow.com/questions/47605/string-concatenation-concat-vs-operator) – Ole V.V. May 24 '17 at 06:59
  • @OleV.V. NO, CRYSTAL CLEAR, it's not duplicate. – Mehraj Malik May 24 '17 at 07:16
  • 1
    You may like to see this: http://www.pellegrino.link/2015/08/22/string-concatenation-with-java-8.html . Adding strings with "+" in a loop gives you O(n^2) complexity, while StringBuilder's append(String) method gives you O(n) complexity – Kangkan May 25 '17 at 07:32

4 Answers

14

The rule

“do not concatenate Strings with + !!!“

is wrong, because it is incomplete and therefore misleading.

The rule is

do not concatenate Strings with + in a loop

and that rule still holds. The original rule was never meant to be applied outside of loops!

A simple loop

String s = "";
for (int i = 0; i < 10000; i++) { s += i; }
System.out.println(s);

is still much slower than

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) { sb.append(i); }
System.out.println(sb.toString());

because the Java compiler has to translate the first loop into

String s = "";
for (int i = 0; i < 10000; i++) { s = new StringBuilder(s).append(i).toString(); }
System.out.println(s);
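
If you want to see the difference yourself, here is a rough sketch. The class and method names are made up for this illustration, and the single-shot timing has no JIT warm-up, so treat the numbers as an indication only, not as a benchmark.

```java
// Illustrative sketch only: ConcatLoopDemo and its methods are hypothetical names.
final class ConcatLoopDemo {

    // Builds "0123..." with +=; each iteration copies the whole string built so far.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Builds the same string by appending into one growing buffer.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10000;
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("same result:   " + a.equals(b));
        System.out.println("+= loop:       " + (t1 - t0) / 1000 + " us");
        System.out.println("StringBuilder: " + (t2 - t1) / 1000 + " us");
    }
}
```

For anything more serious than a quick look, a proper harness such as JMH would be the better tool.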

Also the claim

Today the JVM compiles the + symbol into a string builder (in most cases).

is misleading at best, because this translation was already done in Java 1.0 (admittedly with StringBuffer rather than StringBuilder, because StringBuilder was only added in Java 5).


One could also argue that the claim

Today the JVM compiles the + symbol into a string builder (in most cases).

is simply wrong, because the compilation is not done by the JVM. It is done by the Java Compiler.


For the question: when does the Java compiler use StringBuilder.append() and when does it use some other mechanism?

The source code of the Java compiler (version 1.8) contains two places where String concatenation through the + operator is handled.

The conclusion is that for the Java compiler from the OpenJDK (which means the compiler distributed by Oracle) the phrase in most cases means always. (Though this could change with Java 9, or it could be that another Java compiler like the one that is included within Eclipse uses some other mechanism).

Thomas Kläger
  • 1
    "*because the compilation is not done by the JVM*", haha, nice catch. However, [JLS 15.18.1](https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.18.1) says that "*a Java compiler may use the StringBuffer class*". It fails to say in which cases it doesn't use it, though. – dumbPotato21 May 24 '17 at 05:27
  • @ChandlerBing and Thomas (**impressive catch about JVM compilation**), yes, exactly. No one has mentioned under which conditions it does not use a StringBuilder for concatenation. – Mehraj Malik May 24 '17 at 05:30
  • According to https://docs.oracle.com/javase/8/docs/api/java/lang/String.html "The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuilder(or StringBuffer) class and its append method. String conversions are implemented through the method toString, defined by Object and inherited by all classes in Java. For additional information on string concatenation and conversion, see Gosling, Joy, and Steele, The Java Language Specification." – Eric May 24 '17 at 05:39
  • 2
    @MehrajMalik I currently know of only two cases for string concatenation: either both operands are constant strings (and then the compiler replaces the expression with a string literal) or at least one operand is not a constant string (and then the compiler uses StringBuilder). But I will try to look into the Java compiler for other cases. – Thomas Kläger May 24 '17 at 05:44
  • Since the concatenation code for non-constant values is up to the specific compiler, it is impossible to say that all compilers do this, as that would imply that we claim to know all compilers. We might say that all *relevant* compilers, i.e. `javac` and ecj, have always used this optimization strategy, though. By the way, Java 9’s `javac` will not use `StringBuilder`; however, that’s because the new strategy is considered to be even better… – Holger May 24 '17 at 11:56
  • @MehrajMalik I've updated my answer with the information that I've found in the source code of the Java compiler – Thomas Kläger May 24 '17 at 21:32
5

Holger is right in his comment that in java-9 + for String concatenation is going to change from a StringBuilder to a strategy chosen by the JRE via invokedynamic. There are 6 strategies that are possible for String concatenation in jdk-9:

  private enum Strategy {
    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder}.
     */
    BC_SB,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but trying to estimate the required storage.
     */
    BC_SB_SIZED,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but computing the required storage exactly.
     */
    BC_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also tries to estimate the required storage.
     */
    MH_SB_SIZED,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also estimate the required storage exactly.
     */
    MH_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that constructs its own byte[] array from
     * the arguments. It computes the required storage exactly.
     */
    MH_INLINE_SIZED_EXACT
}

And the default one is not using a StringBuilder, it is MH_INLINE_SIZED_EXACT. It is actually pretty crazy how the implementation works, and it is trying to be highly optimized.

So no, as far as I can tell the advice there is bad. This, by the way, is the main effort that was put into the JDK by Aleksey Shipilev. He also made a big change to String internals in jdk-9: Strings are now backed by a byte[] instead of a char[]. This was done because ISO_LATIN_1 Strings can be encoded with a single byte per character, which takes a lot less space.

Eugene
4

The statement, in this exact form, is just wrong, and it fits the picture that the linked blog goes on to write nonsense, like that you had to wrap references with Objects.toString(…) to handle null, e.g. "att1='" + Objects.toString(att1) + '\'' instead of just "att1='" + att1 + '\''. There is no need to do that and apparently, the author never re-checked these claims.
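
A quick way to check the null-handling point is a small sketch like the following (the class name and the att1 parameter are only for illustration). String conversion in a + expression already turns a null reference into the text "null", so both forms should give the same string.

```java
import java.util.Objects;

// Illustrative sketch only: NullConcatDemo is a hypothetical name.
final class NullConcatDemo {

    // Plain concatenation: a null reference is converted to "null".
    static String plain(Object att1) {
        return "att1='" + att1 + '\'';
    }

    // The wrapped variant from the blog; the wrapper adds nothing here.
    static String wrapped(Object att1) {
        return "att1='" + Objects.toString(att1) + '\'';
    }

    public static void main(String[] args) {
        System.out.println(plain(null));
        System.out.println(wrapped(null));
        System.out.println(plain("x").equals(wrapped("x")));
    }
}
```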

The JVM is not responsible for compiling the + operator, as this operator is merely a source code artifact. It’s the compiler, e.g. javac, which is responsible, and while there is no guarantee about the compiled form, compilers are encouraged to use a builder by the Java Language Specification:

An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

Note that even if a compiler does not perform this optimization, there still is no such thing as a + operator on the byte code level, so the compiler has to pick an operation the JVM understands, e.g. String.concat, which might be even faster than using a StringBuilder in the case where you’re concatenating exactly two strings.
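
As a small sketch of what such an alternative looks like at the source level (the class and method names are invented for this example), String.concat gives the same result as + for two non-null strings. One difference worth knowing: concat throws a NullPointerException for a null argument, while + converts it to "null".

```java
// Illustrative sketch only: ConcatMethodDemo is a hypothetical name.
final class ConcatMethodDemo {

    // The + operator; how it is compiled is up to the compiler.
    static String viaPlus(String a, String b) {
        return a + b;
    }

    // Explicit call to String.concat; b must not be null.
    static String viaConcat(String a, String b) {
        return a.concat(b);
    }

    public static void main(String[] args) {
        System.out.println(viaPlus("Java", "Tutorial"));
        System.out.println(viaConcat("Java", "Tutorial"));
        System.out.println(viaPlus("Java", null)); // null becomes the text "null"
    }
}
```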

Even assuming the worst compilation strategy for string concatenation (still being within the specification), it would be wrong to say to never concatenate strings with +, as when you are defining compile time constants, using + is the only choice, and, of course, a compile-time constant is usually more efficient than using a StringBuilder at runtime.
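
The compile-time constant case can be observed with a sketch like this one (all names are made up for the illustration). It relies on the specification's rule that constant expressions of type String are interned, whereas a string computed at run time is a newly created object. The == comparisons are there only to make that visible; they are not a recommendation for comparing strings.

```java
// Illustrative sketch only: ConstantFoldDemo is a hypothetical name.
final class ConstantFoldDemo {

    static final String PREFIX = "Java";
    // Constant expression: the compiler folds this into the literal "JavaTutorial".
    static final String NAME = PREFIX + "Tutorial";

    // Compares a folded constant with the equivalent literal by identity.
    static boolean foldedIsSameObject() {
        return NAME == "JavaTutorial";
    }

    // The local variable is not declared final, so this is not a constant
    // expression and the concatenation happens at run time.
    static boolean runtimeIsSameObject() {
        String prefix = "Java";
        String name = prefix + "Tutorial";
        return name == "JavaTutorial";
    }

    public static void main(String[] args) {
        System.out.println("constant folded: " + foldedIsSameObject());
        System.out.println("run time:        " + runtimeIsSameObject());
    }
}
```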

In practice, the + operator applied to non-constant strings was compiled to a StringBuffer usage before Java 5 and to a StringBuilder usage in Java 5 to Java 8. When the compiled code is identical to the manual usage of StringBuffer or StringBuilder, respectively, there can’t be a performance difference.

The transition to Java 5, more than a decade ago, was the first time that string concatenation via + had a clear win over manual StringBuffer use, as simply recompiling the concatenation code made it use the potentially faster StringBuilder internally, while code manually dealing with StringBuffer needed to be rewritten to use StringBuilder, which had been introduced in that version.

Likewise, Java 9 is going to compile the string concatenation using an invokedynamic instruction allowing the JRE to bind it to actual code doing the operation, including optimizations not possible in ordinary Java code. So only recompiling the string concatenation code is needed to get this feature, while there is no equivalent manual usage for it.
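
To get a feeling for what the invokedynamic call site is bound to, the bootstrap method can also be called by hand. The following is only a sketch, assuming Java 9 or newer; the class name, the method name and the recipe text are invented for this example, and ordinary code would simply write name + " has " + count + " items" and let javac emit the call site. In the recipe, the character \u0001 marks where an argument is inserted.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Illustrative sketch only (Java 9+): IndyConcatDemo is a hypothetical name.
final class IndyConcatDemo {

    // Builds name + " has " + count + " items" through StringConcatFactory.
    static String describe(String name, int count) {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "describe", // arbitrary name for the call site
                    MethodType.methodType(String.class, String.class, int.class),
                    "\u0001 has \u0001 items"); // \u0001 = argument placeholder
            MethodHandle concat = site.dynamicInvoker();
            return (String) concat.invokeExact(name, count);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("cart", 3));
    }
}
```

Creating a call site on every call, as this sketch does, is wasteful; the real invokedynamic instruction is linked once and then reused.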

That said, while the premise is wrong, i.e. string concatenation never was considered evil, the advice is correct, do not hesitate to use it.

There are only a few cases where you really might improve performance by dealing with a buffer manually, i.e. when you need a large initial capacity or concatenate a lot within loops, and that code has been identified as an actual performance bottleneck by a profiling tool.
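
A minimal sketch of such a manual case, assuming the total length can be computed up front (the class and method names are hypothetical): the builder is created with its final capacity, so it never has to grow its internal array while appending.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: PresizedBuilderDemo is a hypothetical name.
final class PresizedBuilderDemo {

    // Joins the lines, each followed by '\n', using a builder sized exactly once.
    static String joinLines(List<String> lines) {
        int capacity = 0;
        for (String line : lines) {
            capacity += line.length() + 1; // +1 for the newline
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(Arrays.asList("first", "second", "third")));
    }
}
```

Whether this pays off is still a question for the profiler, as said above.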

Holger
0

When you concatenate strings using the + operator, the compiler translates the concatenation code to use a StringBuffer for better performance. To improve performance, StringBuffer is the better choice.

The quickest way to concatenate two strings is with the + operator.

String str = "Java";
str = str + "Tutorial";

The compiler translates this code as:

String str = "Java";
StringBuffer sb = new StringBuffer(str);
sb.append("Tutorial");
str = sb.toString();

So it is better to use StringBuffer or String.format for concatenation.

Using String.format

String s = String.format("%s %s", "Java", "Tutorial");
Ole V.V.
PEHLAJ