
I know that string concatenation with the + operator is slow, because strings are immutable and each += creates a new copy of the string. I am trying to estimate the asymptotic upper bound on the time complexity in Big-O notation, so I analyzed the following for loop.

for c in s:
    out_str += c

In the first iteration, out_str has 0 characters and c has 1 character.
In the second iteration, out_str has 1 character and c has 1 character.
In the k-th iteration, out_str has k-1 characters and c has 1 character.

The k-th iteration therefore requires copying (k-1) + 1 = k characters and binding the new string object to out_str. So for a string s of length n, the for loop performs 1 + 2 + ... + n = n(n+1)/2 = O(n^2) character copies, which makes the for loop O(n^2).
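As a sanity check on that counting argument, here is a small sketch (the helper name and the copy-on-every-+= model are just for illustration) that tallies the characters copied under this model and compares the total with the closed form n*(n+1)/2:

def copies_for_naive_concat(n):
    # Count characters copied if every += copies the old string plus one new character.
    total = 0
    out_len = 0
    for _ in range(n):
        total += out_len + 1
        out_len += 1
    return total

for n in (10, 100, 1000):
    print(n, copies_for_naive_concat(n), n * (n + 1) // 2)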

But when I tried to measure this, I got O(n) behavior for the for loop. I am attaching my test code.

import random
import time
import matplotlib.pyplot as plt
import numpy

def method1(s):
    start = time.time()
    out_str = ''
    for c in s:
        out_str += c
    end = time.time()
    delta = (end - start) * 1000
    print "method1: time in ms", delta
    return delta

def generate_string(length):
    # generate a random string of digits of the given length
    s = ''.join([str(x) for x in numpy.random.randint(0, 9, length)])
    return s

times = []
N = range(100, 5000000, 50000) # string sizes

for n in N:
    input_string = generate_string(n)
    L = []
    # average 50 iterations of the same string length
    for i in range(50):
        L.append(method1(input_string))
    times.append(numpy.mean(L))

plt.plot(N,times)
plt.show()

Here is the resulting plot. The horizontal axis is the string length (n) and the vertical axis is the elapsed time in milliseconds. https://i.stack.imgur.com/HxGq3.png
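To put a number on the trend in the plot, a quick sketch (assuming the N and times lists produced by the script above, and that all timings are nonzero) is to fit the slope of log(time) against log(n); a slope near 1 is consistent with O(n), while a slope near 2 would indicate O(n^2):

# Estimate the empirical growth exponent from the slope of log(time) vs. log(n).
logN = numpy.log(numpy.array(list(N), dtype=float))
logT = numpy.log(numpy.array(times, dtype=float))
slope, _ = numpy.polyfit(logN, logT, 1)
print("estimated growth exponent:", slope)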
