1

I read on several manuals and online sources that the running time of "simple string concatenation" is O(n^2)?

The algorithm is this: we take the first 2 strings, create a new string, copy the characters of the 2 original strings in the new string, and repeat this process over and over again until all strings are concatenated. We are not using StringBuilder or similar implementations: just a simple string concatenation.

I think the running time should be something like O(kn) where k = number of strings, n = total number of characters. You don't copy the same characters n times, but k times, so it should not be O(n^2). For example, if you have 2 strings, it's just O(n). Basically it's n + (n-x) + (n-y) + (n-z)... but k times, not n times.

Where am I wrong?

iClaude
  • 707
  • 8
  • 14
  • 4
    Please cite your source – Sean Bright Oct 09 '19 at 18:05
  • 1
    One source is the book "Cracking the coding interview" by Gayle Laakmann. An online source is https://dzone.com/articles/string-concatenation-performacne-improvement-in-ja but there are many many more. – iClaude Oct 09 '19 at 18:08
  • [This answer agrees with you.](https://stackoverflow.com/a/24111546/21926) – Sean Bright Oct 09 '19 at 18:13
  • and, lastly, O(n^2) is a *bounded asymptotic complexity*, or, more formally, a *complexity group* (see https://en.wikipedia.org/wiki/Big_O_notation#Infinite_asymptotics) - if it's O(nm), and m is usually smaller than n, we can safely express that as O(n^2) - it's an abuse of notation anyway :} –  Oct 09 '19 at 18:37
  • @Unimportant I agree with that answer. Thanks. – iClaude Oct 09 '19 at 18:51
  • 2
    @iClaude You continue to say in your edited question that you want to know how Java does this prior to `StringBuilder`. Well, prior to StringBuilder it used `StringBuffer` which works the same way except for synchronization so you can use it in concurrent applications. That has been there since Java 1.0. So there is not other way to evaluate concatenation for Java unless you enquire about the efficiency of a particular algorithm. – WJS Oct 09 '19 at 19:19
  • I tought about it and I think that the answer linked by @Unimportant is correct except for a little math error. In the first line the first n1 should be deleted, because we are starting concatenating n1 with n2. But the answer is correct. – iClaude Oct 09 '19 at 20:36
  • it's still a dupe (of https://stackoverflow.com/questions/24111433/cost-of-string-concatenation/24111546 & https://stackoverflow.com/questions/11032015/how-to-find-time-complexity-of-an-algorithm e.g.), but it's a lot clearer now, mental +1 to OP for fixing this. –  Oct 09 '19 at 21:58

2 Answers2

4

A precise problem statement is necessary here:

There are two metrics to consider: How much space is required and how much time is required. This note looks at the time requirements.

The concatenation operation is specified to only concatenate two of the strings at a time, with concatentation being performed with left association:

((k1 + k2) + k3) ...

There are two parameters that may be considered, and two ways of looking at the second parameter.

The first parameter is the total size (in characters) of the strings which are to be concatenated.

The second parameter is either the number of strings which are to be concatenated, or is the size of each of the strings which are to be concatenated.

Considering the first case:

n - Total size (in characters) of the strings to be concatenated.

k - Total number of strings to be concatenated.

The time the concatenation is roughly:

(n/k) * (k^2) / 2

Or, to within a constant factory:

n * k

Then, for a fixed 'k', the concatenation time is linear!

Considering instead the second case:

n - Total size of the strings

m - Size of each of the sub-strings

This corresponds to the prior case but with:

k = n / m

The prior estimate then becomes:

n * k = n * (n / m) = n^2 / m

That is, for a fixed 'm', the concatenation time is quadratic.

Thomas Bitonti
  • 1,179
  • 7
  • 14
  • 1
    +1. The essential points: (1) we know that *k* ≤ *n*, so O(kn) ⊆ O(n²); and (2) if we assume that all the strings have the same size (to within a constant factor), then *n* is directly proportional to *k*, so O(kn) = O(n²). So the statement that concatenation in a loop takes O(n²) time is both technically accurate and a practical way to think about it, even if there are more-precise statements that could be made if we wanted. – ruakh Oct 10 '19 at 16:29
1

If you write some tests and look at the byte code you will see that StringBuilder is used to implement concatenation. And sometimes it will pre-allocate the internal array to increase the efficiency to do so. That is clearly not O(n^2) complexity.

Here is the Java code.


  public static void main(String[] args) {
      String[] william = {
            "To ", "be ", "or ", "not ", "to ", ", that", "is ", "the ",
            "question."
      };
      String quote = "";
      for (String word : william) {
         quote += word;
      }
   }

Here is the byte code.

 public static void main(java.lang.String[] args);
      0  bipush 9
      2  anewarray java.lang.String [16]
      5  dup
      6  iconst_0
      7  ldc <String "To "> [18]
      9  aastore
     10  dup
     11  iconst_1
     12  ldc <String "be "> [20]
     14  aastore
     15  dup
     16  iconst_2
     17  ldc <String 0"or "> [22]
     19  aastore
     20  dup
     21  iconst_3
     22  ldc <String "not "> [24]
     24  aastore
     25  dup
     26  iconst_4
     27  ldc <String "to "> [26]
     29  aastore
     30  dup
     31  iconst_5
     32  ldc <String ", that"> [28]
     34  aastore
     35  dup
     36  bipush 6
     38  ldc <String "is "> [30]
     40  aastore
     41  dup
     42  bipush 7
     44  ldc <String "the "> [32]
     46  aastore
     47  dup
     48  bipush 8
     50  ldc <String "question."> [34]
     52  aastore
     53  astore_1 [william]
     54  ldc <String ""> [36]
     56  astore_2 [quote]
     57  aload_1 [william]
     58  dup
     59  astore 6
     61  arraylength
     62  istore 5
     64  iconst_0
     65  istore 4
     67  goto 98
     70  aload 6
     72  iload 4
     74  aaload
     75  astore_3 [word]
     76  new java.lang.StringBuilder [38]
     79  dup
     80  aload_2 [quote]
     81  invokestatic java.lang.String.valueOf(java.lang.Object) : java.lang.String [40]
     84  invokespecial java.lang.StringBuilder(java.lang.String) [44]
     87  aload_3 [word]
     88  invokevirtual java.lang.StringBuilder.append(java.lang.String) : java.lang.StringBuilder [47]
     91  invokevirtual java.lang.StringBuilder.toString() : java.lang.String [51]
     94  astore_2 [quote]
     95  iinc 4 1
     98  iload 4
    100  iload 5
    102  if_icmplt 70
WJS
  • 36,363
  • 4
  • 24
  • 39
  • 1
    Yes, but I'm interested in the running time of simple string concatenation, not StringBuilder. I just want to know if the analysis in those manuals is correct or not. – iClaude Oct 09 '19 at 18:27
  • agreed - this is the expected optimization since around JDK 1.5. No need to do micro-optimizations on concats unless explicitly necessary. –  Oct 09 '19 at 18:27
  • @iClaude you don't get it - *simple concatenation is compiled to `StringBuilder` automatically*. It has been like that for a long time in javac; even if it weren't happening in javac, it would happen as HotSpot/JIT optimization etc. Your question makes no sense as it stands, but this answer is actually the solution to your XY problem. –  Oct 09 '19 at 18:29
  • 1
    Yes, I know that. But I want to know if the analysis of the running time of simple string concatenation is correct or not, supposing that StringBuilder is not used. To be clearer, I'm not interested in StringBuilder at all. – iClaude Oct 09 '19 at 18:32
  • 1
    @iClaude If StringBuilder is not used then what is? Your question goes from a concrete implementation to an abstraction. And the latter can be made as complex and inefficient as one wants. – WJS Oct 09 '19 at 18:33
  • 1
    Both, the `invokespecial java.lang.StringBuilder(java.lang.String)` and the `invokevirtual java.lang.StringBuilder.append(java.lang.String)` copy the contents of the string argument, and that’s happening in a loop. So the overall time complexity doesn’t change. It’s still as derived in [this answer](https://stackoverflow.com/a/58310567/2711488) with a worst case of O(n²). In fact, there’s even more copying going on than with `String quote = ""; for(String word: william) quote = quote.concat(word);` The takeaway is, don’t concat in a loop, even the `invokedynamic` approach doesn’t change that. – Holger Jun 01 '23 at 16:48