I've read this reply, which explains that CPython has an optimization that appends to a string in place, without copying it, when the concatenation is written as a = a + b or a += b. I've also read this PEP 8 recommendation:
Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such). For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
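For concreteness, the ''.join() form I have in mind is something like this (a toy sketch with made-up pieces, not code taken from the PEP):

pieces = ["spam", "eggs", "ham"]   # hypothetical pieces, just for illustration
s = "".join(pieces)                # builds "spameggsham" in a single pass over the pieces
print(s)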
So if I understand correctly, instead of writing a += b + c to trigger this CPython optimization that does the replacement in place, the proper way is to call a = ''.join([a, b, c])?
But then why is the join form significantly slower than the += form in the example below? (In loop1 I use a = a + b + c on purpose so that the CPython optimization is not triggered.)
import os
import time

if __name__ == "__main__":
    start_time = time.time()
    print("begin: %s " % (start_time))

    # loop1: a = a + b + c, written so that the in-place optimization should not kick in
    s = ""
    for i in range(100000):
        s = s + str(i) + '3'
    time1 = time.time()
    print("end loop1: %.4f " % (time1 - start_time))

    # loop2: a += b, the form the optimization targets
    s2 = ""
    for i in range(100000):
        s2 += str(i) + '3'
    time2 = time.time()
    print("end loop2: %.4f " % (time2 - time1))

    # loop3: rebuilding the whole string with ''.join() on every iteration
    s3 = ""
    for i in range(100000):
        s3 = ''.join([s3, str(i), '3'])
    time3 = time.time()
    print("end loop3: %.4f " % (time3 - time2))
The results show that the join version is significantly slower in this case:
~/testdir$ python --version
Python 3.10.6
~/testdir$ python concatenate.py
begin: 1675268345.0761461
end loop1: 3.9019
end loop2: 0.0260
end loop3: 0.9289
Is my version with join wrong?