I know that string concatenation using + operator is slow because string is immutable type and each += creates a new copy of string. I am trying to estimate the asymptotic upper-bound time complexity with Big-Oh notation. So I analyzed the following for loop.
for c in s:
out_str += c
In the first iteration, out_str has 0 character and c has 1 char.
In the second iteration, out_str has 1 char. and c has 1 char.
In the k th iteration, out_str has k-1 char. and c has 1 char.
k th iteration requires copying k-1 + 1 = k characters and bind this new string object to out_str. So for a string s with length n, the for loop requires 1 + 2 + ... + n = O(n^2) copying operations. This makes the for loop O(n^2).
But when I tried to simulate this, I get O(n) for the complexity of the for loop. I am attaching my test code.
import random
import time
import matplotlib.pyplot as plt
import numpy
def method1(s):
start = time.time()
out_str = ''
for c in s:
out_str += c
end = time.time()
delta = (end - start) * 1000
print "method1: time in ms", delta
return delta
def generate_string(length):
# generate random string of length length
s = ''.join([`x` for x in numpy.random.randint(0,9,length)])
return s
times = []
N = range(100, 5000000, 50000) # string sizes
for n in N:
input_string = generate_string(n)
L = []
# average 50 iterations of the same string length
for i in range(50):
L.append(method1(input_string))
times.append(numpy.mean(L))
plt.plot(N,times)
plt.show()
Here is the resulting plot. The horizontal axis is string length (n) and vertical axis is the elapsed time in milliseconds.