I have a StringIO object that already has some text in it. I'd like to clear its existing contents and reuse it instead of recalling it. Is there any way of doing this?

Incognito

3 Answers

TL;DR

Don't bother clearing it; just create a new one. With cStringIO and with Python 3's io.StringIO, that is also slightly faster.

The method

Python 2

Here's how I would find such things out:

>>> from StringIO import StringIO
>>> dir(StringIO)
['__doc__', '__init__', '__iter__', '__module__', 'close', 'flush', 'getvalue', 'isatty', 'next', 'read', 'readline', 'readlines', 'seek', 'tell', 'truncate', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method truncate in module StringIO:

truncate(self, size=None) unbound StringIO.StringIO method
    Truncate the file's size.

    If the optional size argument is present, the file is truncated to
    (at most) that size. The size defaults to the current position.
    The current file position is not changed unless the position
    is beyond the new file size.

    If the specified size exceeds the file's current size, the
    file remains unchanged.

So, you want .truncate(0). But it's probably cheaper (and easier) to initialise a new StringIO. See below for benchmarks.

Python 3

(Thanks to tstone2077 for pointing out the difference.)

>>> from io import StringIO
>>> dir(StringIO)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'getvalue', 'isatty', 'line_buffering', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method_descriptor:

truncate(...)
    Truncate size to pos.

    The pos argument defaults to the current file position, as
    returned by tell().  The current file position is unchanged.
    Returns the new absolute position.

Note that in Python 3 the current file position is now left unchanged, whereas in the Python 2 variant truncating to size zero resets the position.

Thus, for Python 2, you only need

>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write('foo')
>>> s.getvalue()
'foo'
>>> s.truncate(0)
>>> s.getvalue()
''
>>> s.write('bar')
>>> s.getvalue()
'bar'

If you do this in Python 3, you won't get the result you expected:

>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'\x00\x00\x00bar'
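
The padding appears because the position is still 3 after the truncation, so the next write starts three characters past the (now empty) end of the buffer. A small sketch that checks this directly in Python 3; the variable names are only illustrative:

```python
from io import StringIO

s = StringIO()
s.write('foo')
s.truncate(0)                        # contents are gone...
position_after_truncate = s.tell()   # ...but the position has not moved

s.write('bar')                       # written at the old position, gap is zero-filled
padded_value = s.getvalue()

print(position_after_truncate)
print(repr(padded_value))
```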

So in Python 3 you also need to reset the position:

>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.seek(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'bar'

If using the truncate method in Python 2 code, it's safer to call seek(0) at the same time (before or after, it doesn't matter) so that the code won't break when you inevitably port it to Python 3. And there's another reason why you should just create a new StringIO object!
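
If you do need to reuse one object, the seek-plus-truncate pair can be wrapped in a small helper. This is only a sketch for Python 3's io.StringIO; the name clear is my own and not part of any library:

```python
from io import StringIO


def clear(sio):
    """Empty an in-memory text buffer in place and return it (hypothetical helper)."""
    sio.seek(0)      # move the position back to the start
    sio.truncate(0)  # discard the contents
    return sio


buf = StringIO()
buf.write('foo')
clear(buf)
buf.write('bar')
result = buf.getvalue()
print(repr(result))
```

Calling seek(0) first means truncate() with no argument would work as well, since its size defaults to the current position.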

Times

Python 2

>>> from timeit import timeit
>>> def truncate(sio):
...     sio.truncate(0)
...     return sio
... 
>>> def new(sio):
...     return StringIO()
... 

When empty, with StringIO:

>>> from StringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
3.5194039344787598
>>> timeit(lambda: new(StringIO()))
3.6533868312835693

With 3KB of data in, with StringIO:

>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
4.3437709808349609
>>> timeit(lambda: new(StringIO('abc' * 1000)))
4.7179079055786133

And the same with cStringIO:

>>> from cStringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.55461597442626953
>>> timeit(lambda: new(StringIO()))
0.51241087913513184
>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
1.0958449840545654
>>> timeit(lambda: new(StringIO('abc' * 1000)))
0.98760509490966797

So, ignoring potential memory concerns (del oldstringio), truncating is faster for StringIO.StringIO (3% faster when empty, 8% faster with 3KB of data), but creating a new object is faster for cStringIO.StringIO (8% faster when empty, 10% faster with 3KB of data), and cStringIO is much faster overall too. So I'd recommend just doing whichever is easiest: presuming you're working with CPython, use cStringIO and create new ones.

Python 3

The same code, just with seek(0) put in.

>>> def truncate(sio):
...     sio.truncate(0)
...     sio.seek(0)
...     return sio
... 
>>> def new(sio):
...     return StringIO()
...

When empty:

>>> from io import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.9706327870007954
>>> timeit(lambda: new(StringIO()))
0.8734330690022034

With 3KB of data in:

>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
3.5271066290006274
>>> timeit(lambda: new(StringIO('abc' * 1000)))
3.3496507499985455

So in Python 3, creating a new StringIO is 11% faster than reusing an empty one and 5% faster than reusing one holding 3KB of data. Again, create a new StringIO rather than truncating and seeking.
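
For anyone who wants to rerun the Python 3 comparison, here is a minimal sketch of the same measurement as a script. The compare function and its number argument are my own additions; the figures you get will depend on your machine and Python version, so treat them as indicative only.

```python
from io import StringIO
from timeit import timeit


def truncate(sio):
    # Clear the existing buffer in place.
    sio.truncate(0)
    sio.seek(0)
    return sio


def new(sio):
    # Ignore the old buffer and hand back a fresh one.
    return StringIO()


def compare(initial='', number=100_000):
    """Return (seconds for truncate, seconds for new) over `number` runs."""
    t_truncate = timeit(lambda: truncate(StringIO(initial)), number=number)
    t_new = timeit(lambda: new(StringIO(initial)), number=number)
    return t_truncate, t_new


if __name__ == '__main__':
    for label, initial in [('empty', ''), ('3KB', 'abc' * 1000)]:
        t_truncate, t_new = compare(initial)
        print(f'{label}: truncate={t_truncate:.3f}s new={t_new:.3f}s')
```

As in the sessions above, both timings include the cost of creating the initial StringIO, so only the difference between the two numbers is meaningful.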

Chris Morgan
  • why do you need to do a seek(0) first? Seems like truncating it to size=0 would force the current file position to exceed the new file size. – Incognito Dec 02 '10 at 01:26
  • True, I'd been thinking that was truncate from the current position always... updated the answer to remove that; that changes it round for `StringIO.StringIO`, but not for `cStringIO.StringIO` (look at the conclusion) – Chris Morgan Dec 02 '10 at 02:03
  • I'd upvote this except that the focus on performance is unwarranted, especially when the OP clearly said clear-and-reuse was required. Oh, actually I'll upvote anyway, since upvoting is supposed to mean "this answer was useful", but I still think everything after "so you want truncate(0)" should be deleted. – Peter Hansen Dec 02 '10 at 23:59
  • @Peter: I think the times are useful, as they demonstrate (with cStringIO at least) what I expected: that you're better to create a new StringIO than truncate the existing one. – Chris Morgan Dec 03 '10 at 00:48
  • my point is exactly that it's not "better" when the OP said he wants to do something other than creating a new instance (at least, that's how I interpreted his request to avoid "recalling" it). – Peter Hansen Dec 06 '10 at 18:23
  • @PeterHansen I'm coming in really late here, but I was very grateful for the time info. It shows that my assumption (and perhaps the OP's) that truncating would have better performance was unwarranted and that normally we are better off starting new. Sometimes the best answers show that you were asking the wrong question (especially when they also provide the direct answer along with that.) – TimothyAWiseman Sep 29 '12 at 18:28
  • This is a great answer (upvoted), and the benchmark is useful. However, after shuffling around some code with creating new objects, I decided my code was much cleaner with `truncate`. So I guess it depends on one's case. – vonPetrushev Dec 08 '14 at 13:29
  • Great to know that it's faster to create a new object. I can confirm that for my use case this remains true. I'm just wondering if there are any side effects like more memory usage or garbage collection performance issues. – scottlittle Dec 12 '20 at 19:27
  • Truncating is useful when you are modifying a StringIO by reference, which you are typically doing when you are using a StringIO instead of a string. – Chris Jun 03 '22 at 21:38

There is something important to note (at least with Python 3.2):

seek(0) IS needed before truncate(0). Here is some code without the seek(0):

from io import StringIO
s = StringIO()
s.write('1'*3)
print(repr(s.getvalue()))
s.truncate(0)
print(repr(s.getvalue()))
s.write('1'*3)
print(repr(s.getvalue()))

Which outputs:

'111'
''
'\x00\x00\x00111'

With seek(0) before the truncate, we get the expected output:

'111'
''
'111'
tstone2077
  • Thanks for that; you're absolutely right. I've updated my answer in line with this; the added burden of seeking as well as truncating cements the position of creating a new StringIO as the sensible path still further. – Chris Morgan Jun 20 '13 at 12:42
  • Unfortunately, I *do* need to reuse the same StringIO instance, as I'm using it in a @patch modifier for sys.stdout in a unit test, and you can't get to an instance variable when patch() is called, because the instance hasn't been created yet. I've added a seek(0) and truncate(0) to my setup() function. Thanks! – Huw Walters Sep 28 '15 at 08:21

I optimised my processing of many files in sequence (read in chunks, process each chunk, write the processed stream out to a file) by reusing the same cStringIO.StringIO instance: I always reset() it after use, then write to it, and then truncate(). That way I only truncate the part at the end that I don't need for the current file. This seems to have given me a ~3% performance increase. Could anybody more expert on this confirm whether it indeed optimises memory allocation?

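import cStringIO  # Python 2 only

sio = cStringIO.StringIO()
for file in files:
    # Overwrite the buffer from position 0 with this file's processed data.
    read_file_chunks_and_write_to_sio(file, sio)
    # Cut off whatever is left over from a previous, longer file.
    sio.truncate()
    with open('out.bla', 'w') as f:
        f.write(sio.getvalue())
    # Rewind to the start (cStringIO-specific) for the next file.
    sio.reset()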
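
A rough Python 3 equivalent of the same pattern, assuming io.StringIO, which has no reset() method, so seek(0) is used instead. The function process_all and its list-of-chunks input are made up for illustration and stand in for the file reading and writing above:

```python
from io import StringIO


def process_all(chunks_per_file):
    """Reuse one buffer for several inputs: rewind, overwrite, then trim the tail."""
    sio = StringIO()
    results = []
    for chunks in chunks_per_file:
        sio.seek(0)              # rewind instead of cStringIO's reset()
        for chunk in chunks:
            sio.write(chunk)     # overwrites whatever was there before
        sio.truncate()           # drop leftovers beyond the current position
        results.append(sio.getvalue())
    return results


outputs = process_all([['hello ', 'world'], ['hi']])
print(outputs)
```

The truncate() call with no argument cuts at the current position, so only the tail left over from a longer previous input is removed.
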
Erik Kaplun
  • Yes, this specialised case may well be better. Could you show benchmarks, using similar code to mine (for both `StringIO` and `cStringIO`)? I'd be interested to see them. – Chris Morgan Mar 23 '12 at 13:12
  • @ChrisMorgan sorry, I saw your comment only now... too late to dig in my memory/source folder now :) – Erik Kaplun Jun 24 '12 at 22:40
  • Note that in Python>=3.9, the `reset` method is not present: `'_io.StringIO' object has no attribute 'reset'`! – mds Nov 02 '21 at 01:48