743

How do I efficiently append one string to another? Are there any faster alternatives to:

var1 = "foo"
var2 = "bar"
var3 = var1 + var2

For handling multiple strings in a list, see How to concatenate (join) items in a list to a single string.

See How do I put a variable’s value inside a string (interpolate it into the string)? if some inputs are not strings, but the result should still be a string.

Karl Knechtel
user469652
  • 17
    **TL;DR:** If you're just looking for the simple way to append strings, and you don't care about efficiency: `"foo" + "bar" + str(3)` – Andrew Jun 13 '18 at 15:29

13 Answers

759

If you only have one reference to a string and you concatenate another string to the end, CPython now special cases this and tries to extend the string in place.

The end result is that the operation is amortized O(n).

e.g.

s = ""
for i in range(n):
    s += str(i)

used to be O(n^2), but now it is O(n).
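A minimal sketch of the two patterns, assuming Python 3. The helper names `build_with_plus` and `build_with_join` are made up for illustration. Both return the same string; they differ only in how the intermediate results are built.

```python
def build_with_plus(n):
    """Build a string of digits by repeated concatenation with +=."""
    s = ""
    for i in range(n):
        # CPython may resize s in place here when s is the only reference to it
        s += str(i)
    return s


def build_with_join(n):
    """Build the same string by joining all the pieces once at the end."""
    return "".join(str(i) for i in range(n))


print(build_with_plus(5))                              # 01234
print(build_with_plus(1000) == build_with_join(1000))  # True
```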

More information

From the source (bytesobject.c):

void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}


/* The following function breaks the notion that strings are immutable:
   it changes the size of a string.  We get away with this only if there
   is only one module referencing the object.  You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently.  In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned.  Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing \0 byte (newsize
   does *not* include that), and a trailing \0 byte is stored.
*/

int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '\0';
    sv->ob_shash = -1;          /* invalidate cached hash value */
    return 0;
}

It's easy enough to verify empirically.

$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop
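The commands above use Python 2's `xrange`. A rough Python 3 equivalent can be driven from a script with the `timeit` module. This is only a sketch: `time_append` is a hypothetical helper, and unlike the command lines above it starts from an empty string on every run, so the absolute numbers are not comparable.

```python
import timeit


def time_append(n, repeats=5):
    """Return the best time in seconds for appending 'a' to a string n times."""
    stmt = "s = ''\nfor i in range({}): s += 'a'".format(n)
    # number=1 runs the statement once per measurement; keep the fastest run
    return min(timeit.repeat(stmt, number=1, repeat=repeats))


for n in (10, 100, 1000, 10000):
    print(n, time_append(n))
```

If the time grows roughly tenfold each time `n` grows tenfold, the interpreter is behaving linearly for this loop.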

It's important to note, however, that this optimisation isn't part of the Python spec. As far as I know, it exists only in the CPython implementation. The same empirical testing on PyPy or Jython, for example, might show the older O(n**2) performance.

$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop

So far so good, but then,

$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop

Ouch, even worse than quadratic. So PyPy is doing something that works well with short strings but performs poorly for larger ones.

wjandrea
John La Rooy
  • 17
    Interesting. By "now", do you mean Python 3.x? – Steve Tjoa Dec 14 '10 at 04:14
  • 13
    @Steve, No. It's at least in 2.6 maybe even 2.5 – John La Rooy Dec 14 '10 at 08:35
  • 8
    You've quoted the `PyString_ConcatAndDel` function but included the comment for `_PyString_Resize`. Also, the comment doesn't really establish your claim regarding the Big-O – Winston Ewert Mar 31 '12 at 00:10
  • 2
    @JohnLaRooy You might have stopped your CPython experiment one iteration too early. I do roughly get time factor 10 up to 1000000, but then from 1000000 to 10000000 it suddenly takes **100** times as long. Maybe it only optimizes up to a certain size? I'm running Python 2.7.11 on Windows 10 64 bit. – Stefan Pochmann Aug 19 '16 at 18:43
  • 5
    congratulations on exploiting a CPython feature that will make the code crawl on other implementations. Bad advice. – Jean-François Fabre Jan 20 '19 at 20:19
  • 20
    Do NOT use this. PEP 8 states explicitly: [Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such](https://www.python.org/dev/peps/pep-0008/#programming-recommendations); it then gives this specific example as something to avoid since it's so fragile. Better use `"".join(str_a, str_b)` – Er... Oct 23 '19 at 15:42
  • 1
    What does "amortized O(n)" mean? I can't find it on Google! – Mattia Rasulo Nov 15 '20 at 09:25
  • 1
    @MattiaRasulo https://en.wikipedia.org/wiki/Amortized_analysis – xuiqzy Apr 16 '21 at 11:33
  • It should be noted: even if it's amortized O(n), it might still be slower than `"".join`. – Mateen Ulhaq Mar 29 '22 at 12:34
  • 1
    @Eraw Correction: `"".join([str_a, str_b])` – wjandrea Jan 11 '23 at 20:22
  • @Eraw so do you recommend `str_a = "".join((str_a, str_b, str_c))` ? I was trying to fix a performance issue in an application and my benchmark shows that this form is massively slower than `stra += str_b + str_c` (45 times slower). – Étienne Feb 01 '23 at 14:53
  • @Étienne see https://stackoverflow.com/questions/1316887/what-is-the-most-efficient-string-concatenation-method-in-python – Er... Feb 02 '23 at 10:48
361

Don't prematurely optimize. If you have no reason to believe there's a speed bottleneck caused by string concatenations then just stick with + and +=:

s  = 'foo'
s += 'bar'
s += 'baz'

That said, if you're aiming for something like Java's StringBuilder, the canonical Python idiom is to add items to a list and then use str.join to concatenate them all at the end:

l = []
l.append('foo')
l.append('bar')
l.append('baz')

s = ''.join(l)
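
The same idiom inside a function, as a sketch. `render_report` and its `rows` argument are hypothetical names chosen for illustration.

```python
def render_report(rows):
    """Collect the pieces in a list, then join them once at the end."""
    parts = []
    for name, count in rows:
        parts.append(name)
        parts.append(": ")
        parts.append(str(count))  # join only accepts strings, so convert first
        parts.append("\n")
    return "".join(parts)


report = render_report([("foo", 1), ("bar", 2)])
print(report, end="")
# foo: 1
# bar: 2
```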
John Kugelman
  • I don't know what the speed implications of building your strings as lists and then .join()ing them are, but I find it's generally the cleanest way. I've also had great successes with using %s notation within a string for a SQL templating engine I wrote. – richo Dec 14 '10 at 02:10
  • 40
    @Richo Using .join is more efficient. The reason is that Python strings are immutable, so repeatedly using s += more will allocate lots of successively larger strings. .join will generate the final string in one go from its constituent parts. – Ben Dec 14 '10 at 03:35
  • 7
    @Ben, there has been a significant improvement in this area - see my answer – John La Rooy Dec 14 '10 at 04:06
62
str1 = "Hello"
str2 = "World"
newstr = " ".join((str1, str2))

That joins str1 and str2 with a space as the separator. You can also do "".join((str1, str2, ...)). str.join() takes a single iterable, so you have to put the strings in a list or a tuple.

That's about as efficient as it gets for a builtin method.
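
A short sketch of two things that tend to trip people up with `str.join`: it takes exactly one iterable, and every item must already be a string. The `items` list is made-up sample data.

```python
items = ["foo", 3, 4.5]  # hypothetical mixed-type input

# Convert each item to str before joining
joined = ", ".join(map(str, items))
print(joined)  # foo, 3, 4.5

try:
    "".join("foo", "bar")  # two positional arguments instead of one iterable
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```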

Rafe Kettler
39

Don't.

That is, for most cases you are better off generating the whole string in one go rather than appending to an existing string.

For example, don't do: obj1.name + ":" + str(obj1.count)

Instead: use "%s:%d" % (obj1.name, obj1.count)

That will be easier to read and more efficient.
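
A runnable version of that comparison, as a sketch. The `Item` class stands in for whatever `obj1` is in real code and is purely hypothetical. The f-string line assumes Python 3.6 or later.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for obj1, with a name and a count."""
    name: str
    count: int


obj1 = Item("apples", 3)

by_concat = obj1.name + ":" + str(obj1.count)    # piecewise concatenation
by_percent = "%s:%d" % (obj1.name, obj1.count)   # whole string in one go
by_fstring = f"{obj1.name}:{obj1.count}"         # same idea, Python 3.6+

print(by_percent)  # apples:3
```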

Winston Ewert
  • 72
    i'm sorry there is nothing more easier to read than ( string + string ) like the first example, the second example might be more efficient, but not more readable – JqueryToAddNumbers Feb 27 '15 at 23:08
  • 26
    @ExceptionSlayer, string + string is pretty easy to follow. But `"" + message_text + ""`, I find less readable and more error-prone than `"{message_text}".format(classname=class_name, message_text=message_text, id=generateUniqueId())` – Winston Ewert Mar 02 '15 at 15:18
  • 1
    This doesn't help at all when what I'm trying to do is the rough equivalent of, say, PHP/perl's "string .= verifydata()" or similar. – Shadur Feb 23 '16 at 08:42
  • @Shadur, my point is that you should think again, do you really want to do something equivalent, or is an entirely different approach better? – Winston Ewert Feb 23 '16 at 14:36
  • 2
    And in this case the answer to that question is "No, because that approach doesn't cover my use case" – Shadur Feb 23 '16 at 14:37
  • @Shadur, perhaps not. But I'd have to see a more extensive example to know. – Winston Ewert Feb 23 '16 at 14:38
  • 3
    With Python 3.6 we have `f"{message_text}"` – Trenton Nov 13 '18 at 07:08
  • @Trenton Thanks! This is the most readable and should be preferred, if the expressions to be inserted are already readable in itself! No need to define new variables and have to look at the end ( `.format` ) of a long expression to see what is filled in. – xuiqzy Apr 16 '21 at 11:37
32

Python 3.6 gives us f-strings, which are a delight:

var1 = "foo"
var2 = "bar"
var3 = f"{var1}{var2}"
print(var3)                       # prints foobar

You can do most anything inside the curly braces

print(f"1 + 1 == {1 + 1}")        # prints 1 + 1 == 2
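
A few more things the braces accept, as a small sketch: a conversion flag such as `!r` and a format spec after a colon. The variable names are arbitrary.

```python
name = "pi"
value = 3.14159

rounded = f"{name!r} is about {value:.2f}"  # !r applies repr(), .2f rounds
padded = f"{name:>6}|{42:05d}"              # right-align in 6 columns, zero-pad to 5

print(rounded)  # 'pi' is about 3.14
print(padded)   #     pi|00042
```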
Trenton
16

If you need to do many append operations to build a large string, you can use StringIO or cStringIO (in Python 3, io.StringIO). The interface is like a file, i.e. you write to it to append text.

If you're just appending two strings then just use +.
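
A minimal sketch of the file-like approach, assuming Python 3, where the class lives in the `io` module:

```python
import io

buf = io.StringIO()
for word in ("foo", "bar", "baz"):
    buf.write(word)        # each write appends to the in-memory buffer

result = buf.getvalue()    # retrieve everything written so far as one str
print(result)  # foobarbaz
```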

jkmartindale
Laurence Gonsalves
10

It really depends on your application. If you're looping through hundreds of words that you've collected in a list, .join() is better. But if you're putting together a long sentence, you're better off using +=.

Mad Physicist
Ramy
7

Basically, no difference. The only consistent trend is that Python seems to be getting slower with every version... :(


List

%%timeit
x = []
for i in range(100000000):  # xrange on Python 2.7
    x.append('a')
x = ''.join(x)

Python 2.7

1 loop, best of 3: 7.34 s per loop

Python 3.4

1 loop, best of 3: 7.99 s per loop

Python 3.5

1 loop, best of 3: 8.48 s per loop

Python 3.6

1 loop, best of 3: 9.93 s per loop


String

%%timeit
x = ''
for i in range(100000000):  # xrange on Python 2.7
    x += 'a'

Python 2.7:

1 loop, best of 3: 7.41 s per loop

Python 3.4

1 loop, best of 3: 9.08 s per loop

Python 3.5

1 loop, best of 3: 8.82 s per loop

Python 3.6

1 loop, best of 3: 9.24 s per loop

ostrokach
5

Append strings with the __add__ method:

str1 = "Hello"
str2 = " World"
str3 = str1.__add__(str2)
print(str3)

Output:

Hello World
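
For comparison, a short sketch showing that the explicit method call, the `+` operator, and `operator.add` from the standard library all give the same result for strings:

```python
import operator

str1 = "Hello"
str2 = " World"

via_method = str1.__add__(str2)          # explicit method call
via_operator = str1 + str2               # the usual spelling
via_function = operator.add(str1, str2)  # functional form of +

print(via_method == via_operator == via_function)  # True
```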
mathematics-and-caffeine
saigopi.me
3
a='foo'
b='baaz'

a.__add__(b)

out: 'foobaaz'
Rahul Shrivastava
  • 2
    Code is nice, but it would help to have an accompanying explanation. Why use this method rather than the other answers on this page? – cgmb Nov 20 '15 at 18:42
  • 17
    Using `a.__add__(b)` is identical to writing `a+b`. When you concatenate strings using the `+` operator, Python will call the `__add__` method on the string on the left side passing the right side string as a parameter. – Addie Dec 05 '15 at 20:10
0

One other option is to use .format, as follows:

print("{}{}".format(var1, var2))
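
A slightly fuller sketch of `str.format`, showing automatic, indexed, and named fields. The values of `var1` and `var2` are taken from the question; the field names `a` and `b` are arbitrary.

```python
var1 = "foo"
var2 = "bar"

in_order = "{}{}".format(var1, var2)        # fields filled left to right
swapped = "{1}{0}".format(var1, var2)       # fields picked by position
named = "{a}-{b}".format(a=var1, b=var2)    # fields picked by keyword

print(in_order)  # foobar
print(swapped)   # barfoo
print(named)     # foo-bar
```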
Baris Ozensel
0

Depends on what you are trying to do. If you are formatting a variable into a string to print, e.g. you want the output to be:

Hello, Bob

Given the name Bob, you'd want to use %s: print("Hello, %s" % my_variable). It's efficient, and it works with all data types (so you don't have to call str(my_variable) as you do with "a" + str(5)).
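
A small sketch of `%s` with different types. One caveat worth knowing: a tuple on the right-hand side is treated as the argument list, so a tuple value has to be wrapped in another one-element tuple.

```python
my_variable = "Bob"

greeting = "Hello, %s" % my_variable
count = "Count: %s" % 5              # %s calls str() on the value for you
point = "Point: %s" % ((1, 2),)      # wrap a tuple value in a 1-tuple

print(greeting)  # Hello, Bob
print(count)     # Count: 5
print(point)     # Point: (1, 2)
```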

desertnaut
Hchap
0

You can use this to join strings: f"{var1} {var2}"

joshlsullivan