Despite the question "Why is ''.join() faster than += in Python?", its answers, and this great explanation of the code behind the curtain (https://paolobernardi.wordpress.com/2012/11/06/python-string-concatenation-vs-list-join/), my tests suggest otherwise, and I am baffled.
Am I doing something simple incorrectly? I'll admit that I'm fudging the creation of x a bit, but I don't see how that would affect the outcome.
import time
x = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y = ""
t1 = time.time()
for i in range(10000):
    y += x
t2 = time.time()
#print(y)
print(t1, t2, "=", t2 - t1)
(1473524757.681939, 1473524757.68521, '=', 0.0032711029052734375)
import time
x = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y = ""
t1 = time.time()
for i in range(10000):
    y = y + x
t2 = time.time()
#print(y)
print(t1, t2, "=", t2 - t1)
(1473524814.544177, 1473524814.547544, '=', 0.0033669471740722656)
import time
x = 10000 * "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
y = ""
t1 = time.time()
y = "".join(x)
t2 = time.time()
#print(y)
print(t1, t2, "=", t2 - t1)
(1473524861.949515, 1473524861.978755, '=', 0.029239892959594727)
As can be seen, "".join() is much slower, and yet we're told it's meant to be quicker. These values are very similar in both Python 2.7 and Python 3.4.
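For context on what the third snippet is actually measuring: "".join() takes an iterable of strings, and a single string qualifies as one, but its elements are its individual characters. A minimal sketch (with a short string standing in for the 740,000-character one):

```python
# "".join() iterates its argument; for a single string the elements
# are one-character strings, not the original chunks.
x = 3 * "ab"
pieces = list(x)             # what join actually iterates over
print(pieces)                # ['a', 'b', 'a', 'b', 'a', 'b']
print("".join(pieces) == x)  # True: joining the characters rebuilds x
```

So the "".join(x) call above performs 740,000 tiny single-character joins rather than one join of 10,000 chunks.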
Edit: OK, fair enough. The "one huge string" thing is the kicker.
import time
x = []
for i in range(10000):
    x.append("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
y = ""
t1 = time.time()
y = "".join(x)
t2 = time.time()
#print(y)
print(t1, t2, "=", t2 - t1)
(1473526344.55748, 1473526344.558409, '=', 0.0009288787841796875)
An order of magnitude quicker. Mea culpa!
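For anyone repeating this comparison, wall-clock differences at these sizes sit close to timer noise; the timeit module is the usual tool. A sketch comparing the two approaches fairly (the names concat_plus and concat_join are just illustrative):

```python
import timeit

CHUNK = 74 * "x"   # same 74-character chunk as above
N = 10000

def concat_plus():
    # repeated += in a loop
    y = ""
    for _ in range(N):
        y += CHUNK
    return y

def concat_join():
    # one join over N chunks
    return "".join(CHUNK for _ in range(N))

# Sanity check: both build the identical string
assert concat_plus() == concat_join()

print("+= loop:", min(timeit.repeat(concat_plus, number=10, repeat=3)))
print("join:   ", min(timeit.repeat(concat_join, number=10, repeat=3)))
```

Taking the minimum over several repeats reduces the influence of background load on the measurement.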