3

I've heard couple of times already that you shouldn't do string concatenation in a for loop as strings are immutable and so it would compute the concatenation as a new string instance and then reassign the identifier. So if the result has n characters the time complexity would be O(n^2)

Bad: runs in O(n^2). Iterative '+' concatenation.

letters = ""
for c in document:
    if c.isalpha():
        letters += c

Good: runs in O(n). Iterative append, with a final "".join()

document = ""
temp = []
for c in document:
    if c.isalpha(): 
        temp.append(c)
letters = "".join(temp)

At the same time I've read that

"Some later implementations of the Python interpreter have developed an optimization to allow such code to complete in linear time,.."

So the first solution should be fine too? Is it an optimization which is in the latest python builds?

smci
  • 32,567
  • 20
  • 113
  • 146
user1767754
  • 23,311
  • 18
  • 141
  • 164

2 Answers2

1

At first, you should write the most readable code for you; only if you have issues with the runtime, you should think of optimization:

letters = "".join(c for c in document if c.isalpha())

For current CPython implementations join is faster than '+'.

>>> def test():
...   s = ""
...   for x in range(1000):
...     s += 'x'
... 
>>> timeit.timeit(test)
157.9563412159987
>>> def test():
...   s = []
...   for x in range(1000):
...     s.append('x')
...   s = ''.join(s)
... 
>>> timeit.timeit(test)
147.74276081599965
Daniel
  • 42,087
  • 4
  • 55
  • 81
0

The key is some implementations. Not all. If you want to make sure your code runs fast on all implementations of python then use str.join. Depending on the size of your document, different approaches will be faster. However, "".join(...) is very pythonic and people will more quickly understand your intentions. So unless you have lots of small documents stick with str.join.

However, to get a 10-fold speed increase on both str.join and += then use str.translate. This solution is specific to removing individual characters though.

from string import digits
translation_table = str.maketrans("", "", digits) 
# first two args about translating characters, third is for removing characters
letters = document.translate(translation_table)

The reason for this speed increase is that python needs to create a new string for each character in the document. str.translate doesn't need to do this and so is much faster.

Dunes
  • 37,291
  • 7
  • 81
  • 97