I've read this reply, which explains that CPython has an optimization that appends to a string in place, without copying, when the string is built with `a = a + b` or `a += b`. I've also read this PEP 8 recommendation:

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such). For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.

So if I understand correctly, instead of writing `a += b + c` to trigger this CPython optimization, which does the replacement in place, the proper way is to call `a = ''.join([a, b, c])`?

But then why is the join form significantly slower than the += form in this example? (In loop1 I use `a = a + b + c` on purpose so as not to trigger the CPython optimization.)

import time

if __name__ == "__main__":
    start_time = time.time()
    print("begin: %s " % (start_time))

    # loop1: plain concatenation, written to defeat the in-place optimization
    s = ""
    for i in range(100000):
        s = s + str(i) + '3'
    time1 = time.time()
    print("end loop1: %s " % (time1 - start_time))

    # loop2: augmented assignment, eligible for CPython's in-place optimization
    s2 = ""
    for i in range(100000):
        s2 += str(i) + '3'

    time2 = time.time()
    print("end loop2: %s " % (time2 - time1))

    # loop3: join() called once per iteration
    s3 = ""
    for i in range(100000):
        s3 = ''.join([s3, str(i), '3'])

    time3 = time.time()
    print("end loop3: %s " % (time3 - time2))

The results show join is significantly slower in this case:

~/testdir$ python --version
Python 3.10.6
~/testdir$ python concatenate.py 
begin: 1675268345.0761461 
end loop1: 3.9019 
end loop2: 0.0260 
end loop3: 0.9289 

Is my version with join wrong?

Étienne
  • How expensive is it to create `[s3, str(i), '3']` alone? – Friedrich Feb 01 '23 at 16:28
  • BTW, you should look into using the `timeit` module for benchmarking. – Barmar Feb 01 '23 at 16:35
  • Replying to my own question: on my machine, list creation accounted for about 15% of the time. Not enough to explain the differences away. – Friedrich Feb 01 '23 at 16:37
  • `s2` uses `+=`, so you are still getting the optimization. The idea with `join` is *not* to call it many times with small lists, but to build up one list and pass it to *one* call of `join` (see the sketch after this comment thread). – chepner Feb 01 '23 at 16:37
  • join internally does something like `arg[0]++arg[1]++...`. So instead of joining 3 items together like with `a+b+c`, 5 items are joined together, like `a+''+b+''+c`, therefore, when using join instead of the `+` operator, it internally does `len(arg)-1` additional calculations. When doing a single operation, it doesn't really matter, but when doing 100000 iterations like in your program, 200000 additional operations are done, and it makes a difference. – HelpfulHelper Feb 01 '23 at 16:38
  • I think it's worth mentioning that the PEP recommendation isn't to ensure you have the fastest code, but rather that your code doesn't experience non-linear timing growth depending on which implementation (CPython, Jython, etc.) you use. It's a middle ground between non-linear timings across implementations that may lead to unexpectedly slow performance, and the fastest possible execution because of a particular implementation's efficiencies. – JNevill Feb 01 '23 at 16:43
  • @HelpfulHelper Are you suggesting we should expect `join()` to be slow as a result of extra operations? – JonSG Feb 01 '23 at 17:00
  • @chepner s2 indeed uses `+=` and is getting the optimization. This is fully intentional, since s2 benchmarks the optimization. s1 is the completely unoptimized case. – Étienne Feb 01 '23 at 21:37
  • @JonSG yes, `join()` is slower because it does more operations – HelpfulHelper Feb 02 '23 at 17:10
  • @HelpfulHelper your assumption is incorrect. `join()` is by far the fastest method. Feel free to try the code below. – JonSG Feb 02 '23 at 17:11
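
To make chepner's point concrete (and following Barmar's suggestion to use the timeit module), here is a minimal sketch, not part of the original thread, comparing join() called once per iteration against a single join() over an accumulated list; the function names are illustrative:

import timeit

N = 100_000

def join_per_iteration():
    # join() is called once per loop iteration: every call copies the
    # entire accumulated string s, so the total work grows quadratically.
    s = ""
    for i in range(N):
        s = "".join([s, str(i), "3"])
    return s

def join_once():
    # Build a list of pieces, then join once at the end: linear total work.
    parts = []
    for i in range(N):
        parts.append(str(i))
        parts.append("3")
    return "".join(parts)

print("join per iteration: %s" % timeit.timeit(join_per_iteration, number=1))
print("join once:          %s" % timeit.timeit(join_once, number=1))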

2 Answers

In "loop3" you bypass a lot of the gain of join() by continuously calling it in an unneeded way. It would be better to build up the full list of characters then join() once.

Check out:

import time

iterations = 100_000

##---------------- loop1: s = s + "." + '3' (plain concatenation)
s = ""
start_time = time.time()
for i in range(iterations):
    s = s + "." + '3'
end_time = time.time()
print("end loop1: %s " % (end_time - start_time))
##----------------

##---------------- loop2: s += ... (eligible for the in-place optimization)
s = ""
start_time = time.time()
for i in range(iterations):
    s += "." + '3'
end_time = time.time()
print("end loop2: %s " % (end_time - start_time))
##----------------

##---------------- loop3: join() called once per iteration
s = ""
start_time = time.time()
for i in range(iterations):
    s = ''.join([s, ".", '3'])
end_time = time.time()
print("end loop3: %s " % (end_time - start_time))
##----------------

##---------------- loop4: append pieces to a list, join once at the end
s = []
start_time = time.time()
for i in range(iterations):
    s.append(".")
    s.append("3")
s = "".join(s)
end_time = time.time()
print("end loop4: %s " % (end_time - start_time))
##----------------

##---------------- loop5: extend the list with both pieces, join once
s = []
start_time = time.time()
for i in range(iterations):
    s.extend((".", "3"))
s = "".join(s)
end_time = time.time()
print("end loop5: %s " % (end_time - start_time))
##----------------

Just to be clear, you can run this with:

iterations = 10_000_000

If you do, just be sure to remove "loop1" and "loop3", as they get dramatically slower after about 300k iterations.

When I run this with 10 million iterations I see:

end loop2: 16.977502584457397 
end loop4: 1.6301295757293701 
end loop5: 1.0435805320739746

So, clearly there is a way to use join() that is fast :-)
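
As a side note, the accumulator can be dropped entirely: str.join accepts any iterable of strings, so the whole loop can collapse into a single call. A minimal sketch, not from the original answer, reusing the iterations variable from the script above:

# One join over a generator expression; each piece mirrors the
# "." + "3" appended per iteration in the loops above.
s = "".join("." + "3" for _ in range(iterations))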

ADDENDUM:

@Étienne has suggested that making the appended string longer reverses the findings, and that the optimization of loop2 does not happen unless it is in a function. I do not see the same.

import time

iterations = 10_000_000
string_to_append = "345678912"

# loop2: += concatenation (the case the CPython optimization targets)
def loop2(iterations):
    s = ""
    for i in range(iterations):
        s += "." + string_to_append
    return s

# loop4: append each piece to a list, join once at the end
def loop4(iterations):
    s = []
    for i in range(iterations):
        s.append(".")
        s.append(string_to_append)
    return "".join(s)

# loop5: extend the list with both pieces at once, join once at the end
def loop5(iterations):
    s = []
    for i in range(iterations):
        s.extend((".", string_to_append))
    return "".join(s)

##----------------
start_time = time.time()
s = loop2(iterations)
end_time = time.time()
print("end loop2: %s " % (end_time - start_time))
##----------------

##----------------
start_time = time.time()
s = loop4(iterations)
end_time = time.time()
print("end loop4: %s " % (end_time - start_time))
##----------------

##----------------
start_time = time.time()
s = loop5(iterations)
end_time = time.time()
print("end loop5: %s " % (end_time - start_time))
##----------------

On Python 3.10 and 3.11 the results are similar. I get results like the following:

end loop2: 336.98531889915466 
end loop4: 1.0211727619171143 
end loop5: 1.1640543937683105

Those results continue to suggest to me that join() is overwhelmingly faster.

JonSG
  • This is a very memory-expensive program. The list `s` would have 200000 entries at the end. – HelpfulHelper Feb 01 '23 at 16:43
  • Computing is full of time/space tradeoffs. In this toy example it is not a factor, but if it ever became a factor then one could choose to optimize space rather than time. The question however was about join being slow, and it is not. – JonSG Feb 01 '23 at 16:49
  • Even with 500k iterations loops 4 and 5 take only 0.1s and loop 3 takes 30s! Impressive. – Aymen Feb 01 '23 at 16:50
  • It's worth noting that I was getting similar results to those mentioned in this answer with Python 3.10.6. The "+=" optimization does not work at all with Python 3.11.2 unless the code is in a function, so loop2 is as slow as loop1 with that version! (see https://github.com/python/cpython/issues/99862 ) – Étienne Feb 14 '23 at 13:45
  • @JonSG the real-world code I was benchmarking was still not faster with extend() after I applied the code recommended in your answer, so I played a bit with your example and noticed that loop5 is slower than loop2 when the length of the string being appended increases (I replaced "3" with "345678912" and then loop2 is faster than loop5 on my machine, with 10 million iterations). Did you expect this result? – Étienne Feb 14 '23 at 14:01
  • @Étienne Yes, that would not be the result I expected, so I tried it out. I will update my answer, but I **do not** see what you see. – JonSG Feb 14 '23 at 15:29
  • @Étienne With respect to that git issue, did you see the comment from the guy on the Python steering council ( https://github.com/python/cpython/issues/99862#issuecomment-1331246198 ) that also explicitly states that `append()` is the correct way to proceed? – JonSG Feb 14 '23 at 15:43
  • I've not claimed that putting the code in a function reverses the finding. Loop2 (the version with +=) is not optimized at all on Python 3.11 unless it is in a function. Your updated script does not demonstrate this, since the code in a function will indeed behave the same on 3.10 and 3.11. I think you misunderstood my comment. I also agree that append/extend is the way to go. – Étienne Feb 14 '23 at 15:43
  • @Étienne I apologize for my misinterpretation about functions however am I correct in that you see slower times for `loop5` vs `loop2` when the size of the string is increased to "345678912"? – JonSG Feb 14 '23 at 15:46
  • @JonSG yes, if I copy-paste your answer and replace the variable "string_to_append" with the string literal "345678912" in the 3 loops, I get these results: `$ python test123.py end loop2: 0.6393730640411377 end loop4: 0.9040520191192627 end loop5: 0.6521048545837402` Your version which uses the variable "string_to_append" gives different results: `$ python test123.py end loop2: 1.0737519264221191 end loop4: 0.9513623714447021 end loop5: 0.938453197479248` I was just curious about why that is, but it does not change the general validity of your answer, thanks. – Étienne Feb 14 '23 at 15:54
  • @Étienne Interesting, as you can see I get a wildly different result. Perhaps it is because my python 3.11 is Python 3.11.0? – JonSG Feb 14 '23 at 15:56

This is just to add the results from @JonSG's answer with the different Python implementations I have available, posted as an answer because I cannot use formatting in a comment.

The only modification is that I was using 1M iterations, and for "local" I've wrapped the whole test in a test() function; doing it inside an `if __name__ == "__main__":` block doesn't seem to help with the 3.11 regression Étienne mentioned. With 3.12.0a5 I'm seeing a similar difference between the local and global `s` variable, but it's a lot faster.

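For reference, a minimal sketch of what the "global" versus "local" variants might look like for the `a += b + c` row; the exact harness is not shown here, so the test() wrapper below is an assumption:

import time

iterations = 1_000_000

# "global" variant: the accumulator s lives at module level
s = ""
start = time.time()
for i in range(iterations):
    s += str(i) + "3"
print("global: %.2f" % (time.time() - start))

# "local" variant: the same loop, but the accumulator is a function local
def test():
    t = ""
    for i in range(iterations):
        t += str(i) + "3"
    return t

start = time.time()
test()
print("local:  %.2f" % (time.time() - start))
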
loop                     | 3.10.10        | 3.11.2         | 3.12.0a5        | pypy 3.9.16
                         | global  local  | global  local  | global   local  | global  local
a = a + b + c            | 71.04   71.76  | 92.55   90.57  | 91.24    92.08  | 120.05  97.94
a += b + c               | 0.38    0.20   | 26.57   0.21   | 24.06    0.03   | 108.98  89.62
a = ''.join(a, b, c)     | 23.26   21.96  | 25.31   24.60  | 23.94    23.79  | 94.04   90.88
a.append(b);a.append(c)  | 0.50    0.38   | 0.35    0.23   | 0.0692   0.0334 | 0.12    0.12
a.extend((b, c))         | 0.35    0.27   | 0.29    0.19   | 0.0684   0.0343 | 0.10    0.10
JaMa