Special case when += for string concatenation is more efficient than =

Question

I have this code using python 3.11:

import timeit

code_1 = """
initial_string = ''
for i in range(10000):
    initial_string = initial_string + 'x' + 'y'
"""

code_2 = """
initial_string = ''
for i in range(10000):
    initial_string += 'x' + 'y'
"""

time_1 = timeit.timeit(code_1, number=100)
time_2 = timeit.timeit(code_2, number=100)

print(time_1)
# 0.5770808999950532
print(time_2)
# 0.08363639999879524

Why += is more efficient in this case? As far as I know, there is the same number of concatenation, and the order of execution doesn't change the result.

Since strings are immutable, it's not because of inplace shinanigans, and the only thing I found about string concat is about .join efficiency, but I don't want the most efficient, just understand why += seems more efficient than =.

With this code, performances between forms almost equals:

import timeit

code_1 = """
initial_string = ''
for i in range(10000):
    initial_string = initial_string + 'x'
"""

code_2 = """
initial_string = ''
for i in range(10000):
    initial_string += 'x'
"""

time_1 = timeit.timeit(code_1, number=100)
time_2 = timeit.timeit(code_2, number=100)

print(time_1)
# 0.07953230000566691
print(time_2)
# 0.08027460001176223

I noticed a difference using different Python version ('x' + 'y' form):

Python 3.7 to 3.9:

print(time_1)
# ~0.6
print(time_2)
# ~0.3

Python 3.10:

print(time_1)
# ~1.7
print(time_2)
# ~0.8

Python 3.11 for comparison:

print(time_1)
# ~0.6
print(time_2)
# ~0.1

Similar but not answering the question: How is the s=s+c string concat optimization decided?

If s is a string, then s = s + 'c' might modify the string in place, while t = s + 'c' can't. But how does the operation s + 'c' know which scenario it's in?

In a nutshell: Optimization occur when s = s + 'c', not when t = s + 'c' because python need to keep a ref to the first string and can't concatenate in-place.

Here, we are always assigning using simple assignment or augmented assignment to the original string, so in-place concatenation should apply in both cases.

Random guess: due to the `+=`, Python can be more confident that the LHS can be thrown away immediately, and therefore mutating it is possible. Not sure if this optimization has been implemented in CPython. — Mateen Ulhaq, Mar 18 '23 at 07:10
In the second piece of code the compiler optimises `'x' + 'y'` to `'xy'` — Iain Shelvington, Mar 18 '23 at 07:11
Going off what @IainShelvington says, perhaps try `initial_string = initial_string + ("x" + "y")` — Mateen Ulhaq, Mar 18 '23 at 07:12
"Since strings are immutable, it's not because of inplace shinanigans" - but it is! CPython has a weird optimization that breaks string immutability here. Adding *3* strings with `initial_string = initial_string + 'x' + 'y'` breaks the optimization. — user2357112, Mar 18 '23 at 08:02
@user2357112 it is optimization shinanigan, but why do you consider it "inplace" ? — Dorian Turba, Mar 18 '23 at 08:30
@DorianTurba: Because it genuinely tries to do the concatenation in place - [it `realloc`s the first string's buffer](https://stackoverflow.com/questions/62621386/confused-why-after-2nd-evaluation-of-operator-of-immutable-string-does-not-ch/62621629#62621629). I'm trying to find a dupe target that mentions what happens with the `initial_string = initial_string + 'x' + 'y'` - I'm pretty sure I've seen a good dupe target before, but I'm having trouble locating it. — user2357112, Mar 18 '23 at 08:34
Couldn't find the dupe target. Maybe it got deleted. Writing an answer instead... — user2357112, Mar 18 '23 at 08:38
Does this answer your question? [How is the s=s+c string concat optimization decided?](https://stackoverflow.com/questions/69079181/how-is-the-s-sc-string-concat-optimization-decided) — S.B, Mar 18 '23 at 11:36
@S.B unfortunately, I don’t think so because we are in the case where initial_string is the only ref to the str. — Dorian Turba, Mar 18 '23 at 11:42

user2357112 · Accepted Answer · 2023-03-18T09:01:01.523

For a while now, CPython has had an optimization that tries to perform string concatenation in place where possible. The details vary between Python versions, sometimes a lot - for example, it doesn't work for globals on Python 3.11, and it used to be specific to bytestrings on Python 2, but it's specific to Unicode strings on Python 3.

On Python 3.10, the optimization starts in unicode_concatenate, and it eventually hits a PyObject_Realloc inside resize_compact or resize_inplace, attempting to resize the left-hand operand in place.

One fairly consistent thing about the optimization across Python versions is that it only works if the left-hand side of the concatenation has no other references, or if the only reference is a variable that the result of the concatenation will be assigned to. In your slow case:

initial_string = initial_string + 'x' + 'y'

the LHS of initial_string + 'x' is initial_string, and you're not going to assign the result back to initial_string - you're going to concatenate 'y' to the result first. Thus, the optimization can't kick in for initial_string + 'x'. (It kicks in for the + 'y' part, though.)

For your other cases, the optimization works. For example, in

initial_string += 'x' + 'y'

instead of concatenating initial_string and 'x' and then appending 'y', you concatenate 'x' and 'y' and then concatenate initial_string and the result. The changed order of operations means that you're assigning the result of the initial_string concatenation back to initial_string, so the optimization applies. (Also the 'x' + 'y' gets constant-folded away, which helps a little but isn't the primary cause of the performance difference.)

If it doesn't apply to Python 3.11 globals, why such difference in my case (I'm using 3.11) — Dorian Turba, Mar 18 '23 at 20:33
@DorianTurba: You're not using globals. `timeit` puts the timed code in a function. — user2357112, Mar 18 '23 at 23:30

Roland Smith · Answer 2 · 2023-03-18T08:26:26.403

Using the dis module, you can see what happens:

In [1]: import dis

In [2]: code_1 = """
   ...: initial_string = ''
   ...: for i in range(10000):
   ...:     initial_string = initial_string + 'x' + 'y'
   ...: """
Out[2]: "\ninitial_string = ''\nfor i in range(10000):\n    initial_string = initial_string + 'x' + 'y'\n"

In [3]: dis.dis(code_1)
  2           0 LOAD_CONST               0 ('')
              2 STORE_NAME               0 (initial_string)

  3           4 LOAD_NAME                1 (range)
              6 LOAD_CONST               1 (10000)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                16 (to 30)
             14 STORE_NAME               2 (i)

  4          16 LOAD_NAME                0 (initial_string)
             18 LOAD_CONST               2 ('x')
             20 BINARY_ADD
             22 LOAD_CONST               3 ('y')
             24 BINARY_ADD
             26 STORE_NAME               0 (initial_string)
             28 JUMP_ABSOLUTE           12
        >>   30 LOAD_CONST               4 (None)
             32 RETURN_VALUE

In [4]: code_2 = """
   ...: initial_string = ''
   ...: for i in range(10000):
   ...:     initial_string += 'x' + 'y'
   ...: """
Out[4]: "\ninitial_string = ''\nfor i in range(10000):\n    initial_string += 'x' + 'y'\n"

In [5]: dis.dis(code_2)
  2           0 LOAD_CONST               0 ('')
              2 STORE_NAME               0 (initial_string)

  3           4 LOAD_NAME                1 (range)
              6 LOAD_CONST               1 (10000)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                12 (to 26)
             14 STORE_NAME               2 (i)

  4          16 LOAD_NAME                0 (initial_string)
             18 LOAD_CONST               2 ('xy')
             20 INPLACE_ADD
             22 STORE_NAME               0 (initial_string)
             24 JUMP_ABSOLUTE           12
        >>   26 LOAD_CONST               3 (None)
             28 RETURN_VALUE

(Note that this is on CPython 3.9. On 3.11 the bytecode is different, although the speed on my machine is similar.)

In the case of code_2, the x and y constants are folded, and one INPLACE_ADD is used instead of two BINARY_ADD opcodes.

So basically, four bytecodes are replaced by two.

In your second block of code, the dis output is almost identical, except for one BINARY_ADD/INPLACE_ADD.

In [8]: code_1 = """
   ...: initial_string = ''
   ...: for i in range(10000):
   ...:     initial_string = initial_string + 'x'
   ...: """
   ...: 
   ...: code_2 = """
   ...: initial_string = ''
   ...: for i in range(10000):
   ...:     initial_string += 'x'
   ...: """
Out[8]: "\ninitial_string = ''\nfor i in range(10000):\n    initial_string += 'x'\n"

In [9]: dis.dis(code_1)
  2           0 LOAD_CONST               0 ('')
              2 STORE_NAME               0 (initial_string)

  3           4 LOAD_NAME                1 (range)
              6 LOAD_CONST               1 (10000)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                12 (to 26)
             14 STORE_NAME               2 (i)

  4          16 LOAD_NAME                0 (initial_string)
             18 LOAD_CONST               2 ('x')
             20 BINARY_ADD
             22 STORE_NAME               0 (initial_string)
             24 JUMP_ABSOLUTE           12
        >>   26 LOAD_CONST               3 (None)
             28 RETURN_VALUE

In [10]: dis.dis(code_2)
  2           0 LOAD_CONST               0 ('')
              2 STORE_NAME               0 (initial_string)

  3           4 LOAD_NAME                1 (range)
              6 LOAD_CONST               1 (10000)
              8 CALL_FUNCTION            1
             10 GET_ITER
        >>   12 FOR_ITER                12 (to 26)
             14 STORE_NAME               2 (i)

  4          16 LOAD_NAME                0 (initial_string)
             18 LOAD_CONST               2 ('x')
             20 INPLACE_ADD
             22 STORE_NAME               0 (initial_string)
             24 JUMP_ABSOLUTE           12
        >>   26 LOAD_CONST               3 (None)
             28 RETURN_VALUE

But then look at the second snippet, where `initial_string = initial_string + 'x'` actually finishes *faster* than `initial_string += 'x'`. That's not consistent with the `BINARY_ADD`/`INPLACE_ADD` explanation here. — user2357112, Mar 18 '23 at 08:06
`initial_string + ('x' + 'y')` give me similar performances than `initial_string += 'x' + 'y'`. With a temporary value breaking in-place optimization, performances drops similar to `initial_string = initial_string + 'x' + 'y'`. — Dorian Turba, Mar 18 '23 at 20:28

Special case when += for string concatenation is more efficient than =

2 Answers2