178

Is there some string class in Python like StringBuilder in C#?

Smi
icn
  • This is a duplicate of [Python equivalent of Java StringBuffer](https://stackoverflow.com/questions/19926089/python-equivalent-of-java-stringbuffer). **CAUTION: The answers here are way out of date and have, in fact, become misleading.** See [that other question](https://stackoverflow.com/questions/19926089/python-equivalent-of-java-stringbuffer) for answers that are more relevant to modern Python versions (certainly 2.7 and above). – Jean-François Corbett Nov 20 '17 at 08:52

8 Answers

139

There is no one-to-one equivalent. For a really good article, please see Efficient String Concatenation in Python:

Building long strings in the Python programming language can sometimes result in very slow running code. In this article I investigate the computational performance of various string concatenation methods.

TL;DR: the fastest method in the article's tests is below. It's extremely compact, and also pretty understandable:

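def method6():
    # Python 3 spelling: repr() replaces the removed backtick syntax, range replaces xrange;
    # loop_count is the module-level iteration count set by the article's benchmark script
    return ''.join([repr(num) for num in range(loop_count)])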
Sameer Alibhai
Andrew Hare
  • Note that this article was written based on Python 2.2. The tests would likely come out somewhat differently in a modern version of Python (CPython usually successfully optimizes concatenation, but you don't want to depend on this in important code) and a generator expression where he uses a list comprehension would be worthy of consideration. – Mike Graham Mar 10 '10 at 06:35
  • It would be good to pull in some highlights in that article, at the least a couple of the implementations (to avoid link rot problems). – jpmc26 Jul 29 '14 at 22:22
  • Method 1: resultString += appendString is the fastest according to tests by @Antoine-tran below – Justas Dec 31 '15 at 17:47
  • Your quote doesn't at all answer the question. Please include the relevant parts in your answer itself, to comply with new guidelines. – Nic Oct 21 '16 at 16:48
  • As I pointed out in comments to another answer, this does not simulate how people use StringBuilder in C#. StringBuilder is not just a string concatenation method. It also provides a way for you to store string values in a long indeterminative process. – Feng Jiang May 25 '23 at 13:57
48

Relying on interpreter optimizations is fragile. The benchmarks linked in the accepted answer and the numbers given by Antoine-tran are not to be trusted. Andrew Hare makes the mistake of including a call to repr in his methods. That adds the same overhead to every method but obscures the real penalty of constructing the string.

Use join. It's very fast and more robust.

$ ipython3
Python 3.5.1 (default, Mar  2 2016, 03:38:02) 
IPython 4.1.2 -- An enhanced Interactive Python.

In [1]: values = [str(num) for num in range(int(1e3))]

In [2]: %%timeit
   ...: ''.join(values)
   ...: 
100000 loops, best of 3: 7.37 µs per loop

In [3]: %%timeit
   ...: result = ''
   ...: for value in values:
   ...:     result += value
   ...: 
10000 loops, best of 3: 82.8 µs per loop

In [4]: import io

In [5]: %%timeit
   ...: writer = io.StringIO()
   ...: for value in values:
   ...:     writer.write(value)
   ...: writer.getvalue()
   ...: 
10000 loops, best of 3: 81.8 µs per loop
GrantJ
  • Yes, the `repr` call dominates the runtime, but there's no need to make the mistake personal. – Alex Reinking Aug 17 '18 at 21:43
  • @AlexReinking sorry, nothing personal meant. I'm not sure what made you think it was personal. But if it was the use of their names, I used those only to refer to the user's answers (matches usernames, not sure if there's a better way). – GrantJ Aug 18 '18 at 19:15
  • good timing example that separates data initialization and concatenation operations – aiodintsov Jun 29 '19 at 22:37
  • This answer is misleading because it creates the whole list of values together, so that it can be joined in one step. So it doesn't have the for loop. In reality, before join, you need to append the elements one by one, just like the for loop with other methods. – Feng Jiang May 23 '23 at 03:30
  • @FengJiang that is incorrect. The “for loop” is inside the join method implemented in C-code with optimizations. That is what makes it so fast. Moving the list initialization above the benchmarking makes the three measurements more accurate. – GrantJ May 24 '23 at 05:13
  • But the point is when we use StringBuilder to build long strings, it is because we can't create the whole list of values in advance. Otherwise we don't need to use a StringBuilder in the first place. The comparison is only meaningful when you add values to the list one by one in the for loop. – Feng Jiang May 25 '23 at 13:51
  • I don’t agree with that reason to use a StringBuilder. But in any case, the string’s join method will consume any iterable. Lazily generated values will work the same. – GrantJ May 26 '23 at 14:57
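GrantJ's last comment notes that str.join accepts any iterable, so pieces produced one at a time in a loop can be handed to it without building a list by hand. A minimal sketch, assuming a hypothetical generator function `render_rows` that stands in for a longer process producing pieces incrementally:

```python
def render_rows(rows):
    """Hypothetical producer that yields one string piece at a time."""
    for index, row in enumerate(rows):
        if index:
            yield ", "  # separator before every item except the first
        yield f"{row['name']}={row['value']}"


rows = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]

# join accepts the generator directly, so the caller never builds a list by hand.
# (CPython's join still collects the pieces into a sequence internally, in C.)
result = "".join(render_rows(rows))
print(result)  # a=1, b=2
```

The producer can contain arbitrary loop logic and conditionals; only the final join touches the full string.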
30

I used Oliver Crow's code (linked by Andrew Hare) and adapted it slightly for Python 2.7.3, using the timeit package. I ran it on my personal computer: Lenovo T61, 6 GB RAM, Debian GNU/Linux 6.0.6 (squeeze).

Here are the results for 10,000 iterations:

method1:  0.0538418292999 secs
process size 4800 kb
method2:  0.22602891922 secs
process size 4960 kb
method3:  0.0605459213257 secs
process size 4980 kb
method4:  0.0544030666351 secs
process size 5536 kb
method5:  0.0551080703735 secs
process size 5272 kb
method6:  0.0542731285095 secs
process size 5512 kb

And for 5,000,000 iterations (method 2 was skipped because it ran far too slowly):

method1:  5.88603997231 secs
process size 37976 kb
method3:  8.40748500824 secs
process size 38024 kb
method4:  7.96380496025 secs
process size 321968 kb
method5:  8.03666186333 secs
process size 71720 kb
method6:  6.68192911148 secs
process size 38240 kb

It is quite obvious that the Python developers have done a great job of optimizing string concatenation, and as Hoare said: "premature optimization is the root of all evil" :-)
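The numbers above come from Python 2.7.3 and will not necessarily hold on current interpreters. A minimal sketch for re-running a similar comparison on Python 3 with the standard timeit module; it is not Oliver Crow's code, the function names and input size are illustrative, and the printed timings depend on the machine and interpreter:

```python
import timeit

# Build the pieces once, outside the timed functions, so that str() conversion
# is not mixed into the concatenation measurement (illustrative size).
values = [str(num) for num in range(100_000)]


def concat_plus():
    # Method 1 style: repeated += on a str
    result = ""
    for value in values:
        result += value
    return result


def concat_join():
    # join style: a single call over the whole list
    return "".join(values)


# Sanity check: both approaches must produce the same string.
assert concat_plus() == concat_join()

for func in (concat_plus, concat_join):
    # timeit.repeat returns one total per run; the minimum is the least noisy
    best = min(timeit.repeat(func, number=10, repeat=3))
    print(f"{func.__name__}: {best:.4f} s for 10 calls")
```

Precomputing `values` follows GrantJ's point about keeping conversion cost out of the timed section.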

Antoine-tran
  • Apparently Hoare does not accept that: http://hans.gerwitz.com/2004/08/12/premature-optimization-is-the-root-of-all-evil.html – Pimin Konstantin Kefaloukos Dec 11 '12 at 13:13
  • It is not a premature optimization to avoid fragile, interpreter-dependant optimizations. If you ever want to port to PyPy or risk hitting [one of the many subtle failure cases](http://stackoverflow.com/questions/24040198/cpython-string-addition-optimisation-failure-case) for the optimization, do things the right way. – Veedrac Nov 03 '14 at 21:46
  • Looks like Method 1 is easier for the compiler to optimize. – mbomb007 Apr 29 '15 at 18:21
24

Python has several things that fulfill similar purposes:

  • One common way to build large strings from pieces is to grow a list of strings and join it when you are done. This is a frequently-used Python idiom.
    • To build strings incorporating data with formatting, you would do the formatting separately.
  • For insertion and deletion at a character level, you would keep a list of length-one strings. (To make this from a string, you'd call list(your_string).) You could also use a UserString.MutableString for this (Python 2 only; it was removed in Python 3).
  • (c)StringIO.StringIO (io.StringIO in Python 3) is useful for things that would otherwise take a file, but less so for general string building.
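A minimal Python 3 sketch of the first two idioms in that list: growing a list of pieces and joining it once at the end, and editing a list of length-one strings in place. The variable names and sample strings are illustrative:

```python
# Idiom 1: grow a list of pieces, then join once at the end.
parts = []
for word in ["alpha", "beta", "gamma"]:
    parts.append(word.upper())  # per-piece formatting is done separately
sentence = " ".join(parts)
print(sentence)  # ALPHA BETA GAMMA

# Idiom 2: character-level edits on a list of length-one strings.
chars = list("Hellox world")
del chars[5]          # delete the stray "x"
chars.insert(5, ",")  # insert a comma after "Hello"
edited = "".join(chars)
print(edited)  # Hello, world
```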
Mike Graham
17

Using method 5 from the article (the pseudo file), we can get very good performance and flexibility:

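from io import StringIO  # Python 3; cStringIO existed only in Python 2

class StringBuilder:
    def __init__(self):
        # In-memory text buffer that collects the appended pieces
        self._file_str = StringIO()

    def Append(self, text):
        # Parameter renamed from 'str' so it no longer shadows the built-in
        self._file_str.write(text)

    def __str__(self):
        # Return everything written so far as a single string
        return self._file_str.getvalue()

Now, using it:

sb = StringBuilder()

sb.Append("Hello\n")
sb.Append("World")

print(sb)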
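For comparison, a minimal sketch of the same interface backed by a list and str.join instead of StringIO, the approach other answers recommend. The class name `ListStringBuilder`, the chaining return value, and `__len__` are hypothetical additions meant to resemble C#'s StringBuilder, not a standard library API:

```python
class ListStringBuilder:
    """Hypothetical StringBuilder-like helper backed by a list of pieces."""

    def __init__(self):
        self._parts = []

    def Append(self, text):
        # Store the piece; nothing is concatenated until the string is needed.
        self._parts.append(str(text))
        return self  # returning self allows chained calls, as in C#

    def __len__(self):
        # Total number of characters appended so far.
        return sum(len(part) for part in self._parts)

    def __str__(self):
        # Join once, then keep only the joined result so later calls stay cheap.
        joined = "".join(self._parts)
        self._parts = [joined]
        return joined


sb = ListStringBuilder()
sb.Append("Hello\n").Append("World")
print(sb)
```

Appending after `str(sb)` still works, because the joined result simply becomes the first piece of the list.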
Thomas Watson
6

You can try StringIO or cStringIO (in Python 3, use io.StringIO).
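A minimal Python 3 sketch of io.StringIO used as an in-memory text file, which is where it fits best: code that writes to a file object, such as print(..., file=...), can write into it unchanged. The `write_report` function is a hypothetical example:

```python
import io


def write_report(out, items):
    """Hypothetical function that expects a writable text file object."""
    for name, count in items:
        print(f"{name}: {count}", file=out)


buffer = io.StringIO()
write_report(buffer, [("apples", 3), ("pears", 5)])
text = buffer.getvalue()  # everything written so far, as one str
print(text, end="")
```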

Dominic K
ghostdog74
0

There is no explicit analogue. I think you are expected to use string concatenation (likely optimized, as said before) or a third-party class (I doubt those are much more efficient: lists in Python are dynamically typed, so, as I assume, there is no fast char[] buffer).

StringBuilder-like classes are not premature optimization, because of an innate feature of strings in many languages, immutability, which allows many optimizations (for example, referencing the same buffer for slices/substrings). StringBuilder/StringBuffer/stringstream-like classes work a lot faster than concatenating strings, which produces many small temporary objects that still need allocation and garbage collection. They are even faster than printf-like string formatting tools, since they avoid the overhead of interpreting a format pattern, which adds up over a lot of format calls.
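Python's str is immutable too, which can be seen directly: += on a string rebinds the name to a new object instead of changing the existing one, so another reference still sees the old value. (When no other reference exists, CPython can sometimes grow the string in place; that is the implementation-specific optimization discussed in the other answers and comments.) A minimal sketch:

```python
original = "abc"
alias = original          # a second reference to the same str object
original += "def"         # creates a new str and rebinds the name `original`

print(alias)              # abc
print(original)           # abcdef
print(alias is original)  # False
```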

Mastermind
-6

If you are here looking for a fast string concatenation method in Python, you do not need a special StringBuilder class. Simple concatenation works just as well, without the performance penalty seen in C#.

resultString = ""

resultString += "Append 1"
resultString += "Append 2"

See Antoine-tran's answer for performance results.

Community
Justas