Disclaimer: The String
class has undergone multiple changes to improve performance and space utilization. What happens when JIT compiles code is then entirely undefined. The following is a simplification, and ignores any optimizations that may or may not be applied.
String
is a class that encapsulates a char[]
. The array length is always exactly the length()
of the string. The class, and the underlying array, is immutable.
class String {
private final char[] arr;
}
StringBuilder
(and StringBuffer
) is another class that encapsulates a char[]
, but the array is almost always larger than the number of characters in the array. The class, and the array, is mutable.
class StringBuilder {
private char[] arr;
private int len;
}
When you do string concatenation with the +
operator, the compiler generates that as:
// Java code
s = s1 + s2 + s3;
// Generated code
s = new StringBuilder().append(s1).append(s2).append(s3).toString();
StringBuilder
will initially create the array with length 16, and will re-allocate the array when needed. Worst case is that s1
, s2
, and s3
are all too large for the current array, so each append()
call needs to re-size the array.
This means that the would progress as follows:
new StringBuilder()
- Creates char[16]
.
append(s1)
- Resizes arr
, then copies chars from s1.arr
to the array.
append(s2)
- Resizes arr
, copies existing content (chars from s1
) to new array, then copies chars from s2.arr
to the array.
append(s3)
- Resizes arr
, copies existing content (chars from s1
and s2
) to new array, then copies chars from s3.arr
to the array.
toString()
- Create new String
with char[]
sized to exactly fit the characters in the StringBuilder
, then copies the content (chars from s1
, s2
, and s3
) to the new String
.
All-in-all the chars from s1
ends up being copied 4 times.
If the string concatenation is S1 + S2
, like in the question, then the characters from S1
are copied 2 or 3 times, and the characters from S2
are copied 2 times.
Since time complexity is generally worst case, that means O(3m + 2n), not the O(2m + n) suggested in the question. Of course, Big-O eliminates constant factors, so it is actually O(m + n).