1

What does actually happens when we concatenate a string S2 of size Y to a string S1 of size X ( already present on the heap) using the + operator?

This is what I think:

If I execute the following function:

class StringConcatenation{

    String S1;

    String concat(String S2){
       this.S1 = this.S1 + S2;
    }
}

If S1 was present in the string pool (which is stored in the heap) and we execute the concat method, then the method gets executed on the stack.

So, the CPU will need to copy the S1 in the stack => READ S1

As strings are immutable, Java must make a new object (let's name its reference as S3).

Now, contents of S1 and S2 are copied at new object => COPY S1 + COPY S2

Then the reference S1 points to the new object.

Therefore, the total time complexity is O(READ S1 + COPY S1 + COPY S2 ) = O(X + X + Y) = O(2*X + Y).

Is my thought process correct?

Deepak Tatyaji Ahire
  • 4,883
  • 2
  • 13
  • 35
  • Looks OK to me. Except `O(2*X + Y)` would usually be written as `O(X + Y)`, as big-O discards multiplicative constants (that's kind of the point of it). –  Mar 02 '21 at 10:29
  • Seems correct, yes – Mike B Mar 02 '21 at 10:30
  • 2
    https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/String.html money quote: "For example, the javac compiler may implement the [+] operator with StringBuffer, StringBuilder, or java.lang.invoke.StringConcatFactory depending on the JDK version. " so it may *not* be the case. – Federico klez Culloca Mar 02 '21 at 10:30
  • 2
    "will need to copy the S1 in the stack" does not make any sense - Strings are never copied to the stack - only their references. – Gyro Gearless Mar 02 '21 at 10:32
  • @GyroGearless, can you please elaborate your answer. – Deepak Tatyaji Ahire Mar 02 '21 at 10:57
  • 1
    Why you READ S1 and then COPY S1 afterwards?? Both Strings will live on the heap, and remain there, the new object instance will also be created on the heap. In Java, only references (and primitives) are handled on the stack. – tquadrat Mar 02 '21 at 11:01

2 Answers2

2

in the doc you can read:

The Java language provides special support for the string concatenation operator ( + ), and for conversion of other objects to strings. String concatenation is implemented through the StringBuilder(or StringBuffer) class and its append method.

a = a + b is the equivalent of a += b or:

a = new StringBuilder()
    .append(a)
    .append(b)
    .toString();
ΦXocę 웃 Пepeúpa ツ
  • 47,427
  • 17
  • 69
  • 97
  • 3
    That reference is rather old... Since Java 9, [the concat operator (`+`) doesn't always mean that a `StringBuilder` is used under the hood](https://stackoverflow.com/a/46513041/180719). – Olivier Grégoire Mar 02 '21 at 10:45
2

Disclaimer: The String class has undergone multiple changes to improve performance and space utilization. What happens when JIT compiles code is then entirely undefined. The following is a simplification, and ignores any optimizations that may or may not be applied.

String is a class that encapsulates a char[]. The array length is always exactly the length() of the string. The class, and the underlying array, is immutable.

class String {
    private final char[] arr;
}

StringBuilder (and StringBuffer) is another class that encapsulates a char[], but the array is almost always larger than the number of characters in the array. The class, and the array, is mutable.

class StringBuilder {
    private char[] arr;
    private int len;
}

When you do string concatenation with the + operator, the compiler generates that as:

// Java code
s = s1 + s2 + s3;

// Generated code
s = new StringBuilder().append(s1).append(s2).append(s3).toString();

StringBuilder will initially create the array with length 16, and will re-allocate the array when needed. Worst case is that s1, s2, and s3 are all too large for the current array, so each append() call needs to re-size the array.

This means that the would progress as follows:

  1. new StringBuilder() - Creates char[16].

  2. append(s1) - Resizes arr, then copies chars from s1.arr to the array.

  3. append(s2) - Resizes arr, copies existing content (chars from s1) to new array, then copies chars from s2.arr to the array.

  4. append(s3) - Resizes arr, copies existing content (chars from s1 and s2) to new array, then copies chars from s3.arr to the array.

  5. toString() - Create new String with char[] sized to exactly fit the characters in the StringBuilder, then copies the content (chars from s1, s2, and s3) to the new String.

All-in-all the chars from s1 ends up being copied 4 times.

If the string concatenation is S1 + S2, like in the question, then the characters from S1 are copied 2 or 3 times, and the characters from S2 are copied 2 times.

Since time complexity is generally worst case, that means O(3m + 2n), not the O(2m + n) suggested in the question. Of course, Big-O eliminates constant factors, so it is actually O(m + n).

Andreas
  • 154,647
  • 11
  • 152
  • 247