Python String Concatenation in for loop long strings

Question

I've heard couple of times already that you shouldn't do string concatenation in a for loop as strings are immutable and so it would compute the concatenation as a new string instance and then reassign the identifier. So if the result has n characters the time complexity would be O(n^2)

Bad: runs in O(n^2). Iterative '+' concatenation.

letters = ""
for c in document:
    if c.isalpha():
        letters += c

Good: runs in O(n). Iterative append, with a final "".join()

document = ""
temp = []
for c in document:
    if c.isalpha(): 
        temp.append(c)
letters = "".join(temp)

At the same time I've read that

"Some later implementations of the Python interpreter have developed an optimization to allow such code to complete in linear time,.."

So the first solution should be fine too? Is it an optimization which is in the latest python builds?

Most pythonistas would use the comprehension: `letters = ''.join([c for c in document if c.isalpha()])` — wim, Sep 21 '16 at 16:52
@StefanPochmann sorry my bad, letters should be outside of the loop. copy paste error. Corrected both snippets. — user1767754, Sep 21 '16 at 16:59
@user1767754 First one still has syntax error in first line. And a strange comment. — Stefan Pochmann, Sep 21 '16 at 17:07

Daniel · Accepted Answer · 2016-09-21T17:39:06.337

1

At first, you should write the most readable code for you; only if you have issues with the runtime, you should think of optimization:

letters = "".join(c for c in document if c.isalpha())

For current CPython implementations join is faster than '+'.

>>> def test():
...   s = ""
...   for x in range(1000):
...     s += 'x'
... 
>>> timeit.timeit(test)
157.9563412159987
>>> def test():
...   s = []
...   for x in range(1000):
...     s.append('x')
...   s = ''.join(s)
... 
>>> timeit.timeit(test)
147.74276081599965

edited Sep 21 '16 at 17:39

answered Sep 21 '16 at 16:50

Daniel

42,087
4
55
81

list comp is better than genex here – wim Sep 21 '16 at 16:52
it's pretty the same. – Daniel Sep 21 '16 at 16:54
My question might be not clear, but i wanted to know if python is optimized for doing += or not. – user1767754 Sep 21 '16 at 17:00
That depends on the implementation. – Daniel Sep 21 '16 at 17:03

score 0 · Answer 2 · answered Sep 21 '16 at 17:40

The key is some implementations. Not all. If you want to make sure your code runs fast on all implementations of python then use str.join. Depending on the size of your document, different approaches will be faster. However, "".join(...) is very pythonic and people will more quickly understand your intentions. So unless you have lots of small documents stick with str.join.

However, to get a 10-fold speed increase on both str.join and += then use str.translate. This solution is specific to removing individual characters though.

from string import digits
translation_table = str.maketrans("", "", digits) 
# first two args about translating characters, third is for removing characters
letters = document.translate(translation_table)

The reason for this speed increase is that python needs to create a new string for each character in the document. str.translate doesn't need to do this and so is much faster.

Python String Concatenation in for loop long strings

Bad: runs in O(n^2). Iterative '+' concatenation.

Good: runs in O(n). Iterative append, with a final "".join()

2 Answers2