I have this code, using Python 3.11:
import timeit
code_1 = """
initial_string = ''
for i in range(10000):
    initial_string = initial_string + 'x' + 'y'
"""
code_2 = """
initial_string = ''
for i in range(10000):
    initial_string += 'x' + 'y'
"""
time_1 = timeit.timeit(code_1, number=100)
time_2 = timeit.timeit(code_2, number=100)
print(time_1)
# 0.5770808999950532
print(time_2)
# 0.08363639999879524
Why is += more efficient in this case?
As far as I know, there are the same number of concatenations, and the order of execution doesn't change the result.
Since strings are immutable, it's not because of in-place shenanigans, and the only thing I found about string concatenation is about .join efficiency. I don't want the most efficient approach; I just want to understand why += seems more efficient than =.
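The assumption that both statements perform the same concatenations in the same order can be checked against how Python actually parses them. A minimal sketch using the standard `ast` module (the names `plain` and `augmented` are illustrative, not from the benchmark):

```python
import ast

# Parse only the loop bodies; the loop itself doesn't affect how '+' is grouped.
plain = ast.parse("initial_string = initial_string + 'x' + 'y'").body[0]
augmented = ast.parse("initial_string += 'x' + 'y'").body[0]

# '+' is left-associative, so the plain right-hand side is
# BinOp(BinOp(initial_string, 'x'), 'y'): initial_string is concatenated with 'x'
# first, and that temporary result is then concatenated with 'y'.
print(ast.dump(plain.value))

# In the augmented form the right-hand side is the separate expression 'x' + 'y',
# which is evaluated before initial_string is involved at all.
print(ast.dump(augmented.value))
```

So the two statements group the operands differently: only the plain form concatenates a partially built copy of `initial_string` with `'y'`.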
With this code, the performance of the two forms is almost equal:
import timeit
code_1 = """
initial_string = ''
for i in range(10000):
    initial_string = initial_string + 'x'
"""
code_2 = """
initial_string = ''
for i in range(10000):
    initial_string += 'x'
"""
time_1 = timeit.timeit(code_1, number=100)
time_2 = timeit.timeit(code_2, number=100)
print(time_1)
# 0.07953230000566691
print(time_2)
# 0.08027460001176223
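Since a single concatenation per iteration costs about the same in both forms, one question is how many concatenations `+= 'x' + 'y'` really performs at run time. A quick check, assuming CPython (compile-time constant folding is an implementation detail of CPython's compiler, not a language guarantee), is to look at the constants each compiled statement carries:

```python
# Compile the two loop bodies from the first benchmark (nothing is executed,
# so initial_string does not need to exist).
plain_code = compile("initial_string = initial_string + 'x' + 'y'", "<plain>", "exec")
augmented_code = compile("initial_string += 'x' + 'y'", "<augmented>", "exec")

# CPython folds an operation whose operands are all constants at compile time,
# so in the augmented form 'x' + 'y' should appear as the single constant 'xy'.
print(augmented_code.co_consts)

# In the plain form the left operand of each '+' involves the variable, so
# 'x' and 'y' remain separate constants and nothing is folded.
print(plain_code.co_consts)
```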
I noticed a difference between Python versions (with the 'x' + 'y' form):
Python 3.7 to 3.9:
print(time_1)
# ~0.6
print(time_2)
# ~0.3
Python 3.10:
print(time_1)
# ~1.7
print(time_2)
# ~0.8
Python 3.11 for comparison:
print(time_1)
# ~0.6
print(time_2)
# ~0.1
Similar, but it doesn't answer my question: How is the s=s+c string concat optimization decided?
If s is a string, then s = s + 'c' might modify the string in place, while t = s + 'c' can't. But how does the operation s + 'c' know which scenario it's in?
In a nutshell: the optimization occurs when s = s + 'c', not when t = s + 'c', because Python needs to keep a reference to the first string and can't concatenate in place.
Here, we always assign back to the original string, using either simple or augmented assignment, so in-place concatenation should apply in both cases.
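One way to probe that assumption is to count how many additions survive compilation in each statement, assuming CPython's bytecode (opcode names changed in 3.11, so both spellings are handled). The helper `count_additions` and the parenthesized third statement are hypothetical additions for comparison, not part of the original benchmarks:

```python
import dis


def count_additions(source: str) -> int:
    """Count the '+' / '+=' operations left in the compiled bytecode of `source`."""
    code = compile(source, "<snippet>", "exec")
    count = 0
    for instr in dis.get_instructions(code):
        # 3.11+ uses a generic BINARY_OP whose argrepr is '+' or '+=';
        # 3.10 and earlier use dedicated BINARY_ADD / INPLACE_ADD opcodes.
        if instr.opname == "BINARY_OP" and instr.argrepr in ("+", "+="):
            count += 1
        elif instr.opname in ("BINARY_ADD", "INPLACE_ADD"):
            count += 1
    return count


for source in (
    "initial_string = initial_string + 'x' + 'y'",
    "initial_string += 'x' + 'y'",
    # Hypothetical variant: parentheses make 'x' + 'y' a constant-only subexpression.
    "initial_string = initial_string + ('x' + 'y')",
):
    print(count_additions(source), source)
```

Plugging the parenthesized statement into the original timeit harness would show whether its timing tracks the += form.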