
I know that ''.join(list) is the preferred method to concatenate strings, as opposed to, say:

s = ''  # start from an empty string
for x in list:
    s += x

My question is: why is join so much faster?

Also, what if I need to concatenate items that are not already in a list? Is it still faster to put them in a list just for the purpose of doing the ''.join(list)?

EDIT: This is different from the previously linked question because I'm specifically asking whether, when the items are not already in a list, it is still recommended for performance reasons to put them in a list for the sole purpose of joining them.

Nick Weseman
    Generally, you want to avoid the quadratic behavior that will occur if you incrementally build a string using concatenation. Building a list of strings and using `str.join` guarantees linear behavior. Although recent versions of CPython will optimize concatenation of strings, that is not guaranteed. You also get the performance improvement of pushing the looping down to the C level. – juanpa.arrivillaga May 11 '17 at 16:47

2 Answers


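This is faster because the join method gets to dive "under the surface" and use lower-level optimizations not available from the Python layer. The loop has to step through the sequence's iterator and deal with each object in turn at the Python level. Also, your loop has to build a new string on each iteration, which is slow because each new string copies everything accumulated so far. join, by contrast, gets to work with a mutable buffer at the C layer or below: in CPython it adds up the lengths of the pieces, allocates the result once, and copies each piece into place.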
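To make the "size once, copy once" idea concrete, here is a rough pure-Python model of it. This is only a conceptual sketch: `join_sketch` is a hypothetical name, the UTF-8 bytearray stands in for the C-level buffer, and the real CPython code works differently in its details.

```python
def join_sketch(pieces):
    """Conceptual model of join: measure everything, allocate once, copy in.

    Hypothetical helper for illustration only; not CPython's actual code.
    """
    # Materialize the iterable so the pieces can be walked twice.
    encoded = [p.encode("utf-8") for p in pieces]

    # Pass 1: compute the total size and allocate the buffer a single time.
    total = sum(len(b) for b in encoded)
    buf = bytearray(total)

    # Pass 2: copy each piece into its final position.
    pos = 0
    for b in encoded:
        buf[pos:pos + len(b)] = b
        pos += len(b)

    return buf.decode("utf-8")
```

The point is that no intermediate strings are created: every character is copied exactly once, whereas repeated `s += x` may recopy the accumulated prefix on each step.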

If the objects aren't already in a list ... it depends on the application. However, I suspect that almost any such application will have to go through that loop-ish overhead somewhere just to form the list, so you'd lose some of the advantage of join, although the mutable string would still save time.
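For the case where the pieces are produced one at a time, the usual pattern is to append them to a list and join once at the end. The sketch below puts that next to the `+=` version; the function names and the use of `str(i)` as the "piece" are assumptions made up for the illustration.

```python
def build_with_concat(n):
    """Build the string incrementally with += (may be quadratic)."""
    s = ""
    for i in range(n):
        s += str(i)  # each step may copy everything built so far
    return s


def build_with_join(n):
    """Collect the pieces in a list, then join once (linear)."""
    parts = []
    for i in range(n):
        parts.append(str(i))  # appending to a list is cheap
    return "".join(parts)     # single allocation and copy at the end
```

Both functions return the same string; the second still pays for a Python-level loop, but it avoids rebuilding the result on every iteration.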

Prune

Yes, join is faster because it doesn't need to keep building new strings.

But you don't need a list to use join! You can give it any iterable, such as a generator expression:

''.join(x for x in lst if x != 'toss')

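It appears that join is optimized when you use a list, though. All of these produce the same string, but the one with a list comprehension is the fastest. As far as I know, that is because CPython's join needs a real sequence to size the result up front, so a generator gets collected into a list internally anyway.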

>>> from timeit import timeit
>>> timeit("s=''.join('x' for i in range(200) if i!=47)")
15.870241802178043
>>> timeit("s=''.join(['x' for i in range(200) if i!=47])")
11.294011708363996
>>> timeit("s=''\nfor i in range(200):\n if i!=47:\n  s+='x'")
16.86279364279278
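If you want to rerun the generator-versus-list comparison on your own machine, a small script along these lines should do it. The function names and the iteration count are arbitrary choices for this sketch, and the timings will vary by Python version and hardware, so no expected numbers are shown.

```python
from timeit import timeit


def join_generator(n=200):
    # join receives a generator expression
    return "".join("x" for i in range(n) if i != 47)


def join_list(n=200):
    # join receives a fully built list
    return "".join(["x" for i in range(n) if i != 47])


if __name__ == "__main__":
    # timeit accepts a callable; number is kept small so the run is quick
    for fn in (join_generator, join_list):
        print(fn.__name__, timeit(fn, number=20_000))
```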
Mark Ransom