430

Since Python's strings are immutable, how can I concatenate strings more efficiently?

I can write it like this:

s += stringfromelsewhere

or like this:

s = []
s.append(somestring)

# later

s = ''.join(s)

While writing this question, I found a good article talking about the topic.

http://www.skymind.com/~ocrow/python_string/

But it's about Python 2.x, so the question is: did anything change in Python 3?

– Max

12 Answers

492

The best way of appending a string to a string variable is to use + or +=. This is because it's readable and fast. They are also just as fast as each other; which one you choose is a matter of taste, though += is the more common one. Here are timings with the timeit module:

a = a + b:
0.11338996887207031
a += b:
0.11040496826171875
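For reference, here is a minimal sketch of how timings like these can be reproduced with the timeit module; the answer doesn't show its exact setup, so the starting strings and iteration count below are assumptions:

import timeit

# Grow a string by one character per iteration, comparing the two spellings.
print(timeit.timeit("a = a + b", setup="a = ''; b = 'x'", number=1000000))
print(timeit.timeit("a += b", setup="a = ''; b = 'x'", number=1000000))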

However, those who recommend having lists and appending to them and then joining those lists, do so because appending a string to a list is presumably very fast compared to extending a string. And this can be true, in some cases. Here, for example, is one million appends of a one-character string, first to a string, then to a list:

a += b:
0.10780501365661621
a.append(b):
0.1123361587524414

OK, it turns out that even when the resulting string is a million characters long, appending directly to the string was still faster.

Now let's try appending a thousand-character string a hundred thousand times:

a += b:
0.41823482513427734
a.append(b):
0.010656118392944336

The end string therefore ends up being about 100MB long. That was pretty slow; appending to a list was much faster. But that timing doesn't include the final ''.join(a). So how long would that take?

''.join(a):
0.43739795684814453

Oops. It turns out that even in this case, append/join is slower.
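For completeness, a minimal sketch of how this round can be reproduced; the chunk size and repeat counts come from the prose above, but the exact setup is an assumption:

import timeit

# 100,000 appends of a 1000-character chunk, then the final join.
print(timeit.timeit("a += b", setup="a = ''; b = 'x' * 1000", number=100000))
print(timeit.timeit("a.append(b)", setup="a = []; b = 'x' * 1000", number=100000))
print(timeit.timeit("''.join(a)", setup="a = ['x' * 1000] * 100000", number=1))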

So where does this recommendation come from? Python 2?

a += b:
0.165287017822
a.append(b):
0.0132720470428
a.join(a):
0.114929914474

Well, append/join is marginally faster there if you are using extremely long strings (which you usually aren't; why would you have a string that's 100MB in memory?).

But the real clincher is Python 2.3, where I won't even show you the timings, because it's so slow that it hasn't finished yet. These tests suddenly take minutes, except for the append/join, which is just as fast as under later Pythons.

Yup. String concatenation was very slow in Python back in the stone age. But by 2.4 it isn't anymore (or at least not in Python 2.4.7), so the recommendation to use append/join became outdated in 2008, when Python 2.3 stopped being updated, and you should have stopped using it. :-)

(Update: it turns out, when I did the testing more carefully, that using + and += is faster for two strings on Python 2.3 as well. The recommendation to use ''.join() must be a misunderstanding.)

However, this is CPython. Other implementations may have other concerns. And this is just yet another reason why premature optimization is the root of all evil. Don't use a technique that's supposedly "faster" unless you have first measured it.

Therefore the "best" way to do string concatenation is to use + or +=. And if that turns out to be slow for you, which is pretty unlikely, then do something else.

So why do I use a lot of append/join in my code? Because sometimes it's actually clearer, especially when whatever you are concatenating should be separated by spaces, commas or newlines.
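For instance, a small illustration of where join reads more clearly than repeated concatenation (the sample words are made up):

words = ['red', 'green', 'blue']

print(', '.join(words))   # red, green, blue
print('\n'.join(words))   # one word per line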

– Lennart Regebro
  • 11
    If you have multiple strings (n > 10) "".join(list_of_strings) is still faster – Mikko Ohtamaa Aug 29 '12 at 05:34
  • http://stackoverflow.com/questions/10043636/any-reason-not-to-use-to-concatenate-two-strings/10043677#10043677 – Mikko Ohtamaa Aug 29 '12 at 05:35
  • 13
    the reason += is fast is that there is a performance hack in CPython when the refcount is 1 – it falls apart on pretty much all other Python implementations (with the exception of a rather specially configured PyPy build) – Aug 29 '12 at 06:45
  • @Ronny: Sure, but you still need to profile to see when the problem arises. My timings as you see above are rather large strings. When you recommend "Use list.append/join" you make people do this when they concatenate just a few short strings, and that's not faster. – Lennart Regebro Aug 29 '12 at 08:55
  • 23
    Why is this being upvoted so much? How is it better to use an algorithm that is only efficient on one specific implementation and has what essentially amounts to a fragile hack to fix a quadratic time algorithm? Also you completely misunderstand the point of "premature optimization is the root of all evil". That quotation is talking about SMALL optimizations. This is going from O(n^2) to O(n) that is NOT a small optimization. – Wes Aug 31 '12 at 02:24
  • 13
    Here is the actual quotation: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" – Wes Aug 31 '12 at 02:28
@Wes: Most string concatenations are small optimizations. It's only a big optimization if you are doing a lot of them. How do you know if you are doing a lot of them? It takes a long time. In that case it's useful to know that string concatenations are slow. It is not useful to do "".join([a, b]) instead of a+b every time you need to concatenate two strings. Most cases where you need to use "".join() are when you have a long list of strings, and in that case "".join() is the simplest way to do it anyway. – Lennart Regebro Aug 31 '12 at 06:26
  • 2
    Nobody is saying that a + b is slow. It's quadratic when you are doing a = a + b more than once. a + b + c is not slow, I repeat _not slow_ since it only has to traverse each string once, whereas it has to re-traverse the previous strings many times with the a = a + b approach (assuming that is in a loop of some kind). Remember strings are immutable. – Wes Aug 31 '12 at 11:15
@Wes: And there you have already gone from "Which is the best way to concatenate a string" to "Which is the best way to concatenate a list of strings in Python". Two different questions. This is the first one. For the second one, in which you have a list of strings and you want to concatenate them, the answer is ''.join(thelist), regardless of speed, because that's the simplest and clearest way of doing it. But that is, as mentioned, **a completely different question**. – Lennart Regebro Aug 31 '12 at 11:44
  • @Wes: I made a talk where I spend around 5 minutes dissecting this. `+` and `+=` is faster or as fast in all cases when joining two strings and all platforms, except possibly in some extreme cases I didn't find. http://youtu.be/50OIO9ONmks?t=18m09s Slides: http://slides.colliberty.com/DjangoConEU-2013/#/step-40 – Lennart Regebro Aug 23 '13 at 06:37
  • @Lennart, I would not rely on benchmarks in this case because whether += is optimized to not run in quadratic time depends on your particular implementation of Python. Even if most popular implementations do it, it's technically not guaranteed. – Wes Aug 23 '13 at 17:36
  • @Wes: I tried multiple implementations and versions. It makes no difference. `a += b` is as fast or faster than `''.join([a, b])`, and `''.join(alonglist)` is faster than looping over alonglist and concatenating one by one. – Lennart Regebro Aug 23 '13 at 19:42
  • 1
    It isn't a part of the Python language that += must be optimized like that. – Wes Aug 23 '13 at 19:44
  • @Wes I repeat: No optimization needed. – Lennart Regebro Aug 23 '13 at 19:45
  • 1
    You aren't really addressing my point, which is that strings are immutable in the python language, and the fact that += is optimized to append strings in O(1) time is an implementation detail. – Wes Aug 23 '13 at 19:47
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/36131/discussion-between-lennart-regebro-and-wes) – Lennart Regebro Aug 23 '13 at 19:50
  • See https://stackoverflow.com/questions/34008010/is-this-time-complexity-actually-on2. Following the answers given in this thread, it is best to not concatenate strings. – user877329 Jul 01 '17 at 14:28
58

If you are concatenating a lot of values, then neither. Appending a list is expensive. You can use StringIO for that, especially if you are building it up over a lot of operations.

from cStringIO import StringIO
# python3:  from io import StringIO

buf = StringIO()

buf.write('foo')
buf.write('foo')
buf.write('foo')

buf.getvalue()
# 'foofoofoo'

If you already have a complete list returned to you from some other operation, then just use ''.join(aList).

From the Python FAQ: What is the most efficient way to concatenate many strings together?

str and bytes objects are immutable, therefore concatenating many strings together is inefficient as each concatenation creates a new object. In the general case, the total runtime cost is quadratic in the total string length.

To accumulate many str objects, the recommended idiom is to place them into a list and call str.join() at the end:

chunks = []
for s in my_strings:
    chunks.append(s)
result = ''.join(chunks)

(another reasonably efficient idiom is to use io.StringIO)

To accumulate many bytes objects, the recommended idiom is to extend a bytearray object using in-place concatenation (the += operator):

result = bytearray()
for b in my_bytes_objects:
    result += b

Edit: I was silly and had the results pasted backwards, making it look like appending to a list was faster than cStringIO. I have also added tests for bytearray/str concat, as well as a second round of tests using a larger list with larger strings. (Python 2.7.3)

IPython test example for large lists of strings:

try:
    from cStringIO import StringIO
except ImportError:
    from io import StringIO

source = ['foo']*1000

%%timeit buf = StringIO()
for i in source:
    buf.write(i)
final = buf.getvalue()
# 1000 loops, best of 3: 1.27 ms per loop

%%timeit out = []
for i in source:
    out.append(i)
final = ''.join(out)
# 1000 loops, best of 3: 9.89 ms per loop

%%timeit out = bytearray()
for i in source:
    out += i
# 10000 loops, best of 3: 98.5 µs per loop

%%timeit out = ""
for i in source:
    out += i
# 10000 loops, best of 3: 161 µs per loop

## Repeat the tests with a larger list, containing
## strings that are bigger than the small-string caching
## done by Python
source = ['foo']*1000

# cStringIO
# 10 loops, best of 3: 19.2 ms per loop

# list append and join
# 100 loops, best of 3: 144 ms per loop

# bytearray() +=
# 100 loops, best of 3: 3.8 ms per loop

# str() +=
# 100 loops, best of 3: 5.11 ms per loop
– jdi
  • 3
    `cStringIO` doesn't exist in Py3. Use `io.StringIO` instead. – lvc Aug 29 '12 at 01:52
  • 4
    As for why appending to a string repeatedly can be expensive: http://www.joelonsoftware.com/articles/fog0000000319.html – Wes Aug 29 '12 at 01:58
  • wait what? when you said "appending a list [is expensive]", you meant "appending a string" right? – khuongduybui Nov 03 '21 at 23:44
  • @khuongduybui it probably should say "appending TO a list is expensive" – jdi Nov 05 '21 at 03:09
The example that measures `.join()` is actually measuring `.append()`. To measure `.join()`, should we say `%timeit "".join(source)`? – lpozo Apr 03 '23 at 07:06
  • @lpozo well it's a good point but there are probably two different things that should be tested. The original intent was to test building the list and then joining it. You could add another test assuming you already had a list and only time the join – jdi Apr 04 '23 at 08:59
53

In Python >= 3.6, the new f-string is an efficient way to concatenate strings.

>>> name = 'some_name'
>>> number = 123
>>>
>>> f'Name is {name} and the number is {number}.'
'Name is some_name and the number is 123.'
– SuperNova
  • 5
    If `f'{a}{b}'` isn't _more efficient_ than `a += b` or `a + b`, I don't see how this is meaningfully responsive to a question that asks specifically about performance. This feature is syntax sugar (good and useful sugar, to be sure!), not a performance optimization. – Charles Duffy Nov 18 '20 at 23:10
23

Using in-place string concatenation with '+' is THE WORST method of concatenation in terms of stability and cross-implementation behavior, since the optimization it relies on does not apply to all types or implementations. The PEP 8 standard discourages this and encourages the use of format(), join() and append() for long-term use.

As quoted from PEP 8's "Programming Recommendations" section:

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
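A minimal sketch of what that recommendation means in practice (the sample list is made up):

parts = ['alpha', 'beta', 'gamma']

# Discouraged in performance-sensitive code: relies on a CPython-specific
# optimization and can take quadratic time on other implementations.
s = ''
for p in parts:
    s += p

# Recommended: linear time across implementations.
s = ''.join(parts)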

– badslacks
  • 7
    Reference link would have been nice :) –  Feb 01 '18 at 09:19
  • 5
    What a ridiculous situation. It's one of the first things people are taught how to do, and here we have the wizards in the ivory tower issuing a PEP discouraging it because it's fragile. – Magnus Lind Oxlund Jul 24 '21 at 11:48
People are taught to do it, and it does work, and for introductory programming it won't make much of a difference. But people who are concerned about efficiency need to look a bit deeper. I don't think it's "ridiculous" that the language is designed this way – how else would you design it? – AnotherParker Jul 11 '22 at 02:24
10

You can write this helper function:

def str_join(*args):
    return ''.join(map(str, args))

Then you can simply call it wherever you want:

str_join('Pine')  # Returns : Pine
str_join('Pine', 'apple')  # Returns : Pineapple
str_join('Pine', 'apple', 3)  # Returns : Pineapple3
– Shameem
10

You can do it in different ways:

str1 = "Hello"
str2 = "World"
str_list = ['Hello', 'World']
str_dict = {'str1': 'Hello', 'str2': 'World'}

# Concatenating With the + Operator
print(str1 + ' ' + str2)  # Hello World

# String Formatting with the % Operator
print("%s %s" % (str1, str2))  # Hello World

# String Formatting with the { } Operators with str.format()
print("{} {}".format(str1, str2))  # Hello World
print("{0} {1}".format(str1, str2))  # Hello World
print("{str1} {str2}".format(str1=str_dict['str1'], str2=str_dict['str2']))  # Hello World
print("{str1} {str2}".format(**str_dict))  # Hello World

# Going From a List to a String in Python With .join()
print(' '.join(str_list))  # Hello World

# Python f-strings --> 3.6 onwards
print(f"{str1} {str2}")  # Hello World

I created this little summary from several articles on the topic.

– Kushan Gunasekera
8

The recommended method is still to use append and join.
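For example, a minimal sketch of that append-and-join pattern (the sample words are made up):

pieces = []
for word in ['to', 'be', 'or', 'not', 'to', 'be']:
    pieces.append(word)

result = ' '.join(pieces)  # 'to be or not to be'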

– MRAB
  • 1
    As you see from my answer, this depends on how many strings you are concatenating. I've done some timings on this (see the talk I linked to in my comments on my answer) and generally unless it's more than ten, use +. – Lennart Regebro Aug 23 '13 at 06:39
  • 1
PEP8 mentions this (https://www.python.org/dev/peps/pep-0008/#programming-recommendations). The rationale is that while CPython has special optimizations for string concatenation with +=, other implementations may not. – Quantum7 Sep 07 '17 at 12:04
8

If the strings you are concatenating are literals, use string literal concatenation:

import re

re.compile(
    "[A-Za-z_]"       # letter or underscore
    "[A-Za-z0-9_]*"   # letter, digit or underscore
)

This is useful if you want to comment on part of a string (as above) or if you want to use raw strings or triple quotes for part of a literal but not all.

Since this happens at the syntax layer it uses zero concatenation operators.
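As a small illustration of the raw-string case mentioned above (the path and message are made up):

# The raw part keeps its backslashes; the plain part supplies a real newline.
message = r"Saved to C:\new\folder" "\n"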

– droid
8

As @jdi mentions, the Python documentation suggests using str.join or io.StringIO for string concatenation, and says that a developer should expect quadratic time from += in a loop, even though there has been an optimisation since Python 2.4. As this answer says:

If Python detects that the left argument has no other references, it calls realloc to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it's an implementation detail and because if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

I will show an example of real-world code that naively relied on this += optimisation, but where it didn't apply. The code below converts an iterable of short strings into bigger chunks to be used in a bulk API.

def test_concat_chunk(seq, split_by):
    result = ['']
    for item in seq:
        if len(result[-1]) + len(item) > split_by: 
            result.append('')
        result[-1] += item
    return result

This code can literally run for hours because of quadratic time complexity. Below are alternatives with the suggested data structures:

import io

def test_stringio_chunk(seq, split_by):
    def chunk():
        buf = io.StringIO()
        size = 0
        for item in seq:
            if size + len(item) <= split_by:
                size += buf.write(item)
            else:
                yield buf.getvalue()
                buf = io.StringIO()
                size = buf.write(item)
        if size:
            yield buf.getvalue()

    return list(chunk())

def test_join_chunk(seq, split_by):
    def chunk():
        buf = []
        size = 0
        for item in seq:
            if size + len(item) <= split_by:
                buf.append(item)
                size += len(item)
            else:
                yield ''.join(buf)                
                buf.clear()
                buf.append(item)
                size = len(item)
        if size:
            yield ''.join(buf)

    return list(chunk())

And a micro-benchmark:

import timeit
import random
import string
import matplotlib.pyplot as plt

line = ''.join(random.choices(
    string.ascii_uppercase + string.digits, k=512)) + '\n'
x = []
y_concat = []
y_stringio = []
y_join = []
n = 5
for i in range(1, 11):
    x.append(i)
    seq = [line] * (20 * 2 ** 20 // len(line))
    chunk_size = i * 2 ** 20
    y_concat.append(
        timeit.timeit(lambda: test_concat_chunk(seq, chunk_size), number=n) / n)
    y_stringio.append(
        timeit.timeit(lambda: test_stringio_chunk(seq, chunk_size), number=n) / n)
    y_join.append(
        timeit.timeit(lambda: test_join_chunk(seq, chunk_size), number=n) / n)
plt.plot(x, y_concat)
plt.plot(x, y_stringio)
plt.plot(x, y_join)
plt.legend(['concat', 'stringio', 'join'], loc='upper left')
plt.show()

[micro-benchmark plot: average runtime of concat vs. stringio vs. join as the chunk size grows]

– saaj
6

While somewhat dated, Code Like a Pythonista: Idiomatic Python recommends join() over + in its section on building strings. So does PythonSpeedPerformanceTips in its section on string concatenation, with the following disclaimer:

The accuracy of this section is disputed with respect to later versions of Python. In CPython 2.5, string concatenation is fairly fast, although this may not apply likewise to other Python implementations. See ConcatenationTestCode for a discussion.

– Levon
4

My use case was slightly different. I had to construct a query where more than 20 fields were dynamic. I followed this approach of using the format method:

query = "insert into {0}({1},{2},{3}) values({4}, {5}, {6})"
# str.format returns a new string, so capture the result:
query = query.format('users', 'name', 'age', 'dna', 'suzan', 1010, 'nda')

This was comparatively simpler for me than using + or other approaches.
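For comparison, on Python 3.6+ the same kind of query could be built with an f-string; a sketch using the sample values from above (for real database code, parameterized queries are preferable to string building):

name, age, dna = 'suzan', 1010, 'nda'
query = f"insert into users(name, age, dna) values('{name}', {age}, '{dna}')"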

– Ishwar Rimal
3

You can use this too; the linked question discusses why %s can be preferable to + for concatenation (https://softwareengineering.stackexchange.com/questions/304445/why-is-s-better-than-for-concatenation):

s += "%s" % (stringfromelsewhere)
– SuperNova