40

I'm analysing the complexity of my code. From what I found online, since strings are immutable in python, a concatenation of a string and a character should be O(len(string) + 1).

Now, here is my piece of code (simplified):

word = ""
for i in range(m):
    word = char_value + word
return word

The total time complexity should be:

(0+1) + (1+1) +...+ m = m(m+1)/2 = O(m^2)

Is this correct?

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
cwbrd
  • 547
  • 1
  • 4
  • 10
  • What do you count: walclock time, number of operations? I doubt that concatenation of `m` strings is quadratic in `m`. –  May 10 '16 at 08:54
  • Number of operations, e.g. allocating a string of n characters should take n... – cwbrd May 10 '16 at 08:57
  • Why should allocating of a string of length `2m` take twice the time of allocating a string of length `m` ? –  May 10 '16 at 08:59
  • Of course it depends on how strings are instantiated in Python, I am considering allocating an array of n characters even though I don't actually know how it is done – cwbrd May 10 '16 at 09:02
  • 2
    @DisplayName: because characters need to be copied into the new string object each time. So concatenating 10 characters to another 10 characters takes order 20 steps to produce the new string. Do this in a loop, and you get quadratic behaviour. – Martijn Pieters May 10 '16 at 09:02
  • @PadriacCunningham: note that the concatenation is **reversed** here; the character is prepended. The optimisation in that post **does not apply here**. Which is why I generally recommend against relying on that, it is too easy to misunderstand when it is applied. – Martijn Pieters May 10 '16 at 09:15

1 Answers1

62

Yes, in your case*1 string concatenation requires all characters to be copied, this is a O(N+M) operation (where N and M are the sizes of the input strings). M appends of the same word will trend to O(M^2) time therefor.

You can avoid this quadratic behaviour by using str.join():

word = ''.join(list_of_words)

which only takes O(N) (where N is the total length of the output). Or, if you are repeating a single character, you can use:

word = m * char

You are prepending characters, but building a list first, then reversing it (or using a collections.deque() object to get O(1) prepending behaviour) would still be O(n) complexity, easily beating your O(N^2) choice here.


*1 As of Python 2.4, the CPython implementation avoids creating a new string object when using strA += strB or strA = strA + strB, but this optimisation is both fragile and not portable. Since you use strA = strB + strA (prepending) the optimisation doesn't apply.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • No, I cannot do either things. As I said, my code above is simplified, the characters I concat are different and putting them in a list would not optimise the code in the end. – cwbrd May 10 '16 at 09:10
  • 4
    @Generalbrus: Putting them in a list would avoid the quadratic behaviour, so that *definitely* optimises the performance. – Martijn Pieters May 10 '16 at 09:11
  • From the way I access the characters (they are in a trie), I would have to reverse the list I created before joining, so I think I'm better off operating on char arrays instead of strings in my case. Thanks for the tips though. – cwbrd May 10 '16 at 09:22
  • @Generalbrus: use `collections.deque` and append the items to the front then. – Martijn Pieters May 10 '16 at 09:24
  • @Generalbrus: or append and then reverse, that's still O(n). Avoiding quadratic behaviour is definitely worth it here. – Martijn Pieters May 10 '16 at 09:24
  • How does M appends of the same word trends to M^2? The number of chars copied at each iteration would be: (M+M) + (2M+M) + (3M+M)+…(MM + M) = 2M + 3M + 4M + … + M^2 = M(1+2+3+…+M) = M * M(M+1)/2. So, shouldn’t M concatenations of the same word of length M in Python using “+” be O(M^3)? – karahbit Feb 11 '23 at 16:38
  • @karahbit: I should not have used `M` for both the number of characters and the number of times you append in the loop. Only if the length of the word is equal to the number of times that you append is the complexity O(M^3), but **that's not the case here**. Keep the length of the appended text out of it, because that's a fixed amount (it's the average length of what you append), and only look at how many times you are appending. – Martijn Pieters Feb 11 '23 at 23:06