138

A common antipattern in Python is to concatenate a sequence of strings using + in a loop. This is bad because the Python interpreter has to create a new string object for each iteration, and it ends up taking quadratic time. (Recent versions of CPython can apparently optimize this in some cases, but other implementations can't, so programmers are discouraged from relying on this.) ''.join is the right way to do this.
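
To make the two approaches concrete, here is a minimal Python 3 sketch. The function names `concat_with_plus` and `concat_with_join` are made up for illustration and do not come from any library.

```python
def concat_with_plus(parts):
    """Build the result with + in a loop (the pattern discouraged above)."""
    result = ''
    for part in parts:
        # Conceptually, each iteration creates a brand-new string object.
        result = result + part
    return result


def concat_with_join(parts):
    """Build the result in a single call to str.join."""
    return ''.join(parts)


words = ['spam', 'eggs', 'ham']
# Both approaches produce the same string; only the cost differs.
assert concat_with_plus(words) == concat_with_join(words) == 'spameggsham'
```

Both functions return the same value, so the choice between them is about performance and readability, not correctness.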

However, I've heard it said (including here on Stack Overflow) that you should never, ever use + for string concatenation, but instead always use ''.join or a format string. I don't understand why this is the case if you're only concatenating two strings. If my understanding is correct, it shouldn't take quadratic time, and I think a + b is cleaner and more readable than either ''.join((a, b)) or '%s%s' % (a, b).

Is it good practice to use + to concatenate two strings? Or is there a problem I'm not aware of?

Community
Taymon
  • It's neater and you have more control to not do concatenation. BUT it's slightly slower, string bashing trade off :P – Jakob Bowyer Apr 06 '12 at 12:46
  • Are you saying `+` is faster or slower? And why? – Taymon Apr 06 '12 at 12:59
  • 1
    + is faster, `In [2]: %timeit "a"*80 + "b"*80` `1000000 loops, best of 3: 356 ns per loop` `In [3]: %timeit "%s%s" % ("a"*80, "b"*80)` `1000000 loops, best of 3: 907 ns per loop` – Jakob Bowyer Apr 06 '12 at 13:21
  • @JakobBowyer That's an unfair comparison. You should initialise the a string and the b string outside of the main test for a better comparison. – Dunes Apr 06 '12 at 13:28
  • 4
    `In [3]: %timeit "%s%s" % (a, b) 1000000 loops, best of 3: 590 ns per loop` `In [4]: %timeit a + b 10000000 loops, best of 3: 147 ns per loop` – Jakob Bowyer Apr 06 '12 at 16:00
  • 1
    @JakobBowyer and others: The "string concatenation is bad" argument has _almost_ nothing to do with speed, but taking advantage of automatic type conversion with `__str__`. See my answer for examples. – Izkata Apr 06 '12 at 17:43
  • If you do things the easy way, the Efficiency Monster will get you. Some people think EM is not real; he lurks. – David Rivers Dec 19 '13 at 17:02
  • In Python 3.6 [the literal string interpolation / format strings](https://www.python.org/dev/peps/pep-0498/) [will be faster than either of them](http://stackoverflow.com/a/38362140/918959), in the common cases. – Antti Haapala -- Слава Україні Jul 13 '16 at 22:56

8 Answers

132

There is nothing wrong with concatenating two strings with +. Indeed, it's easier to read than ''.join([a, b]).

You are right, though, that concatenating more than two strings with + is an O(n^2) operation (compared to O(n) for join) and thus becomes inefficient. However, this has nothing to do with using a loop. Even a + b + c + ... is O(n^2), because each concatenation produces a new string that copies everything built so far.
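
To see where the quadratic cost comes from, here is a rough cost model in Python 3. It is not a benchmark: it just counts characters copied, assuming every + allocates a fresh string and that no in-place optimization kicks in. Both helper names are hypothetical.

```python
def chars_copied_with_plus(parts):
    """Characters copied when evaluating parts[0] + parts[1] + ... left to right,
    assuming each + builds a new string from both operands."""
    if not parts:
        return 0
    copied = 0
    current_len = len(parts[0])
    for part in parts[1:]:
        current_len += len(part)
        copied += current_len  # the new string holds everything built so far
    return copied


def chars_copied_with_join(parts):
    """Characters copied by a single join: each character is copied once."""
    return sum(len(part) for part in parts)


# For exactly two strings, the model predicts the same amount of copying.
assert chars_copied_with_plus(['ab', 'cd']) == chars_copied_with_join(['ab', 'cd'])

for n in (2, 10, 100, 1000):
    parts = ['x'] * n
    print(n, chars_copied_with_plus(parts), chars_copied_with_join(parts))
```

Under this model the + count grows roughly with the square of the number of pieces, while the join count grows linearly, and the two agree when there are only two strings.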

CPython 2.4 and above try to mitigate that, but it's still advisable to use join when concatenating more than two strings.

seriousdev
ggozad
  • 6
    @Mutant: `.join` takes an iterable, so both `.join([a,b])` and `.join((a,b))` are valid. – foundling Aug 03 '16 at 21:52
  • 1
    Interesting timings in the accepted answer (from 2013) at http://stackoverflow.com/a/12171382/378826 (by Lennart Regebro) hint at using `+` or `+=` even for CPython 2.3+, and at only choosing the "append/join" pattern if it exposes the idea of the solution at hand more clearly. – Dilettant Oct 24 '16 at 05:46
52

The plus operator is a perfectly fine solution for concatenating two Python strings. But if you keep adding more than two strings (n > 25), you might want to consider something else.

The ''.join([a, b, c]) trick is a performance optimization.

Mikko Ohtamaa
8

The assumption that one should never, ever use + for string concatenation, but instead always use ''.join, may be a myth. It is true that using + creates unnecessary temporary copies of immutable string objects, but the other, less often quoted fact is that calling join in a loop generally adds the overhead of a function call. Let's take your example.

Create two lists, one from the linked SO question and another, bigger, fabricated one:

>>> import random
>>> myl1 = ['A','B','C','D','E','F']
>>> myl2=[chr(random.randint(65,90)) for i in range(0,10000)]

Let's create two functions, UsePlus and UseJoin, that use + and join respectively.

>>> def UsePlus():
    return [myl[i] + myl[i + 1] for i in range(0,len(myl), 2)]

>>> def UseJoin():
    return [''.join((myl[i],myl[i + 1])) for i in range(0,len(myl), 2)]

Let's run timeit with the first list:

>>> import timeit
>>> myl=myl1
>>> t1=timeit.Timer("UsePlus()","from __main__ import UsePlus")
>>> t2=timeit.Timer("UseJoin()","from __main__ import UseJoin")
>>> print "%.2f usec/pass" % (1000000 * t1.timeit(number=100000)/100000)
2.48 usec/pass
>>> print "%.2f usec/pass" % (1000000 * t2.timeit(number=100000)/100000)
2.61 usec/pass
>>> 

They have almost the same runtime.

Let's use cProfile with the larger list:

>>> import cProfile
>>> myl=myl2
>>> cProfile.run("UsePlus()")
         5 function calls in 0.001 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.001    0.001 <pyshell#1376>:1(UsePlus)
        1    0.000    0.000    0.001    0.001 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {range}


>>> cProfile.run("UseJoin()")
         5005 function calls in 0.029 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.015    0.015    0.029    0.029 <pyshell#1388>:1(UseJoin)
        1    0.000    0.000    0.029    0.029 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     5000    0.014    0.000    0.014    0.000 {method 'join' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {range}

It looks like using join results in extra function calls, which could add to the overhead.

Now, coming back to the question: should one discourage the use of + in favor of join in all cases?

I believe not. Two things should be taken into consideration:

  1. The length of the strings in question
  2. The number of concatenation operations

And, of course, premature optimization is evil in development.
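
For readers on Python 3, the pairwise comparison can be sketched roughly as follows. This is an adaptation rather than the original session: the functions are renamed to snake_case, the list is passed as an argument instead of read from a global, and no timings are claimed because they depend on the machine and interpreter.

```python
import random
import timeit


def use_plus(items):
    """Concatenate adjacent pairs with +."""
    return [items[i] + items[i + 1] for i in range(0, len(items), 2)]


def use_join(items):
    """Concatenate adjacent pairs with str.join (one call per pair)."""
    return [''.join((items[i], items[i + 1])) for i in range(0, len(items), 2)]


# An even-length list of random uppercase letters, like the fabricated list.
letters = [chr(random.randint(65, 90)) for _ in range(10000)]
assert use_plus(letters) == use_join(letters)

passes = 100
for func in (use_plus, use_join):
    seconds = timeit.timeit(lambda: func(letters), number=passes)
    print(f"{func.__name__}: {seconds / passes * 1e6:.2f} usec/pass")
```

Both functions assume an even number of items, as the original lists do.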

Abhijit
  • 8
    Of course, the idea would be not to use `join` inside the loop itself; rather, the loop would generate a sequence that would be passed to `join`. – jsbueno Apr 06 '12 at 13:49
8

When working with multiple people, it's sometimes difficult to know exactly what's happening. Using a format string instead of concatenation can avoid one particular annoyance that's happened a whole ton of times to us:

Say, a function requires an argument, and you write it expecting to get a string:

In [1]: def foo(zeta):
   ...:     print 'bar: ' + zeta

In [2]: foo('bang')
bar: bang

So, this function may be used pretty often throughout the code. Your coworkers may know exactly what it does, but not necessarily be fully up-to-speed on the internals, and may not know that the function expects a string. And so they may end up with this:

In [3]: foo(23)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/izkata/<ipython console> in <module>()

/home/izkata/<ipython console> in foo(zeta)

TypeError: cannot concatenate 'str' and 'int' objects

There would be no problem if you just used a format string:

In [1]: def foo(zeta):
   ...:     print 'bar: %s' % zeta
   ...:     
   ...:     

In [2]: foo('bang')
bar: bang

In [3]: foo(23)
bar: 23

The same is true for all types of objects that define __str__, which may be passed in as well:

In [1]: from datetime import date

In [2]: zeta = date(2012, 4, 15)

In [3]: print 'bar: ' + zeta
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/izkata/<ipython console> in <module>()

TypeError: cannot concatenate 'str' and 'datetime.date' objects

In [4]: print 'bar: %s' % zeta
bar: 2012-04-15

So yes: if you can use a format string, do it, and take advantage of what Python has to offer.
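
The sessions above are Python 2. As a hedged sketch of the same point in Python 3, the following compares + with a format string for a few argument types. The names `describe_plus` and `describe_format` are hypothetical, and the exact wording of the TypeError differs between Python versions, so it is printed rather than assumed.

```python
from datetime import date


def describe_plus(zeta):
    # Works only if zeta is already a str.
    return 'bar: ' + zeta


def describe_format(zeta):
    # %s calls str() on the argument; the one-element tuple keeps this
    # safe even if zeta itself happens to be a tuple.
    return 'bar: %s' % (zeta,)


for value in ('bang', 23, date(2012, 4, 15)):
    print(describe_format(value))
    try:
        describe_plus(value)
    except TypeError as exc:
        print('  + failed:', exc)
```

In Python 3.6 and later, an f-string such as `f'bar: {zeta}'` performs the same automatic conversion.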

Izkata
  • 2
    +1 for a well-reasoned dissenting opinion. I still think I favor `+` though. – Taymon Apr 08 '12 at 15:43
  • 2
    Why wouldn't you just define the foo method as: print 'bar: ' + str(zeta)? – EngineerWithJava54321 Jul 16 '15 at 16:16
  • 2
    @EngineerWithJava54321 For one example, `zeta = u"a\xac\u1234\u20ac\U00008000"` - so you'd have to use `print 'bar: ' + unicode(zeta)` to ensure it doesn't error. `%s` does it right without having to think about it, and is much shorter – Izkata Jul 16 '15 at 16:24
  • @EngineerWithJava54321 Other examples are less relevant here, but for example, `"bar: %s"` might be translated to `"zrb: %s br"` in some other language. The `%s` version will just work, but the string-concat version would become a mess to handle all cases and your translators would now have two separate translations to deal with – Izkata Jul 16 '15 at 16:25
  • If they don't know what foo's implementation is, they will run into this error with any `def`. – insidesin Sep 19 '17 at 08:36
4

According to the Python docs, using str.join() will give you consistent performance across various implementations of Python. Although CPython optimizes away the quadratic behavior of s = s + t, other Python implementations may not.

CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations.

Source: Sequence Types in the Python docs (see footnote [6])
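
As a minimal Python 3 sketch of what the docs recommend for performance-sensitive code: collect the pieces and join once. As far as I recall, the same documentation also suggests writing to an io.StringIO instance as an alternative, so both are shown. The function names are hypothetical.

```python
import io


def build_with_list(chunks):
    """Collect pieces in a list and join them once at the end."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk)
    return ''.join(pieces)


def build_with_stringio(chunks):
    """Write pieces to an in-memory text buffer and read the result back."""
    buffer = io.StringIO()
    for chunk in chunks:
        buffer.write(chunk)
    return buffer.getvalue()


chunks = ['alpha', 'beta', 'gamma']
assert build_with_list(chunks) == build_with_stringio(chunks) == 'alphabetagamma'
```

Neither approach relies on an interpreter-specific optimization, which is the point of the quoted note.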

Duke
3

I have done a quick test:

import sys

# Named s rather than str to avoid shadowing the built-in str type.
s = e = "a xxxxxxxxxx very xxxxxxxxxx long xxxxxxxxxx string xxxxxxxxxx\n"

for i in range(int(sys.argv[1])):
    s = s + e

and timed it:

mslade@mickpc:/binks/micks/ruby/tests$ time python /binks/micks/junk/strings.py  8000000
8000000 times

real    0m2.165s
user    0m1.620s
sys     0m0.540s
mslade@mickpc:/binks/micks/ruby/tests$ time python /binks/micks/junk/strings.py  16000000
16000000 times

real    0m4.360s
user    0m3.480s
sys     0m0.870s

Doubling the iteration count roughly doubles the run time, so there is apparently an optimisation for the a = a + b case. It does not exhibit O(n^2) time as one might suspect.

So at least in terms of performance, using + is fine.
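
A rough Python 3 sketch of the same experiment, extended to time join on the same data. The iteration counts are kept small on purpose so that it finishes quickly even on an interpreter without the in-place optimization. The function names are hypothetical, and no timings are claimed.

```python
import time

LINE = "a xxxxxxxxxx very xxxxxxxxxx long xxxxxxxxxx string xxxxxxxxxx\n"


def build_with_plus(count):
    """Append LINE to a growing string count times using s = s + e."""
    result = LINE
    for _ in range(count):
        result = result + LINE
    return result


def build_with_join(count):
    """Build the same string with a single join."""
    return ''.join([LINE] * (count + 1))


def time_call(func, count):
    """Return the wall-clock seconds taken by func(count)."""
    start = time.perf_counter()
    func(count)
    return time.perf_counter() - start


for count in (20_000, 40_000):
    plus_s = time_call(build_with_plus, count)
    join_s = time_call(build_with_join, count)
    print(f"{count} times: + took {plus_s:.4f}s, join took {join_s:.4f}s")
```

If the time for + roughly doubles when the count doubles, the interpreter is avoiding the quadratic behavior; if it roughly quadruples, it is not.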

Michael Slade
  • 3
    You could compare to the "join" case here. And there is the matter of other Python implementations, such as pypy, jython, ironpython, etc... – jsbueno Apr 06 '12 at 13:50
3

I use the following with Python 3.8:

string4 = f'{string1}{string2}{string3}'
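
A self-contained sketch of the same idea, assuming Python 3.6 or later (where f-strings are available). The sample values and the `count` and `label` names are made up for illustration.

```python
string1, string2, string3 = 'spam', 'eggs', 'ham'

# The f-string produces the same result as chaining +.
string4 = f'{string1}{string2}{string3}'
assert string4 == string1 + string2 + string3

# Unlike +, an f-string converts non-string values for you.
count = 3
label = f'{string1} x {count}'
assert label == 'spam x 3'
```
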
Lucas Vazquez
1

''.join([a, b]) is a better solution than +.

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

PEP 8 advises against relying on CPython's efficient implementation of in-place string concatenation for statements of the form a += b or a = a + b. This optimization is fragile even in CPython and isn't present at all in implementations that don't use refcounting. (Reference counting is a technique of storing the number of references, pointers, or handles to a resource such as an object, block of memory, disk space or other resource.)

https://www.python.org/dev/peps/pep-0008/#programming-recommendations

muhammad ali e
  • 1
    `a += b` works in all implementations of Python, it's just that on some of them it takes quadratic time _when done inside a loop_; the question was about string concatenation _outside_ of a loop. – Taymon May 03 '16 at 05:04