60

I have some legacy code with a legacy function that takes a filename as an argument and processes the file contents. A working facsimile of the code is below.

What I want is to avoid writing generated content to disk just so I can use this legacy function, so I thought I could use StringIO to create an object to pass in place of the physical filename. However, this does not work, as you can see below.

I thought StringIO was the way to go with this. Can anyone tell me if there is a way to use this legacy function and pass it something in the argument that isn't a file on disk but can be treated as such by the legacy function? The legacy function does have the with context manager doing work on the filename parameter value.

The one thing I came across on Google was http://bugs.python.org/issue1286, but that didn't help me.

Code

from pprint import pprint
import StringIO

# Legacy function
def processFile(filename):
    with open(filename, 'r') as fh:
        return fh.readlines()

# This works
print 'This is the output of FileOnDisk.txt'
pprint(processFile('c:/temp/FileOnDisk.txt'))
print

# This fails
plink_data = StringIO.StringIO('StringIO data.')
print 'This is the error.'
pprint(processFile(plink_data))

Output

This is the output in FileOnDisk.txt:

['This file is on disk.\n']

This is the error:

Traceback (most recent call last):
  File "C:\temp\test.py", line 20, in <module>
    pprint(processFile(plink_data))
  File "C:\temp\test.py", line 6, in processFile
    with open(filename, 'r') as fh:
TypeError: coercing to Unicode: need string or buffer, instance found
Martijn Pieters
mpettis

4 Answers

83

A StringIO instance is already an open file. The open() function, on the other hand, takes only a filename and returns an open file. A StringIO instance is therefore not suitable as a filename.
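
The distinction can be seen in a minimal sketch. It is written for Python 3, where the in-memory class lives in the io module rather than the StringIO module used in the question, and the variable names are purely illustrative:

```python
import io

buffer = io.StringIO("StringIO data.\n")

# The buffer is already an open file object, so it can be read directly.
lines = buffer.readlines()

# Passing it to open() fails, because open() expects a path, not a file object.
error = None
try:
    open(buffer, "r")
except TypeError as exc:
    error = exc

print(lines)
print(type(error).__name__)
```

In other words, the data is readable as soon as the buffer exists; it is only the extra open() call inside the legacy function that gets in the way.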

Also, you don't need to close a StringIO instance, so there is no need to use it as a context manager either. While closing an instance frees the memory allocated, so does simply letting the garbage collector reap the object. At any rate, the contextlib.closing() context manager could take care of closing the object if you want to ensure freeing the memory while still holding a reference to the object.
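
As a rough illustration of the contextlib.closing() approach, here is a Python 3 sketch. It uses io.StringIO as a stand-in for the Python 2 StringIO.StringIO, and the sample text is made up:

```python
import io
from contextlib import closing

# closing() calls buffer.close() when the block exits, even on error.
with closing(io.StringIO("first\nsecond\n")) as buffer:
    lines = buffer.readlines()

# The name still refers to the object, but the buffer has been closed.
print(lines)
print(buffer.closed)
```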

If all your legacy code can take is a filename, then a StringIO instance is not the way to go. Use the tempfile module to generate a temporary filename instead.

Here is an example using a contextmanager to ensure the temp file is cleaned up afterwards:

import os
import tempfile
from contextlib import contextmanager

@contextmanager
def tempinput(data):
    temp = tempfile.NamedTemporaryFile(delete=False)
    temp.write(data)
    temp.close()
    try:
        yield temp.name
    finally:
        os.unlink(temp.name)

with tempinput('Some data.\nSome more data.') as tempfilename:
    processFile(tempfilename)
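
The code above targets Python 2. On Python 3, NamedTemporaryFile opens in binary mode by default, so writing a str to it raises a TypeError. The following is a sketch of one possible adaptation, assuming text data; processFile here is a stand-in for the legacy function from the question:

```python
import os
import tempfile
from contextlib import contextmanager


def processFile(filename):
    # Stand-in for the legacy function: it only accepts a path.
    with open(filename, "r") as fh:
        return fh.readlines()


@contextmanager
def tempinput(data):
    # mode="w" opens the temp file in text mode so str data can be written.
    temp = tempfile.NamedTemporaryFile(mode="w", delete=False)
    try:
        temp.write(data)
        temp.close()  # close before yielding so the file can be reopened by name
        yield temp.name
    finally:
        temp.close()  # harmless if already closed
        os.unlink(temp.name)


with tempinput("Some data.\nSome more data.") as tempfilename:
    lines = processFile(tempfilename)
    existed_inside = os.path.exists(tempfilename)

print(lines)
print(existed_inside, os.path.exists(tempfilename))
```

Writing inside the try block means the temporary file is removed even if the write itself fails.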

You can also switch to the newer Python 3 infrastructure offered by the io module (available in Python 2 and 3), where io.BytesIO is the more robust replacement for StringIO.StringIO / cStringIO.StringIO. This object does support being used as a context manager (but still can't be passed to open()).
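
A short sketch of io.BytesIO used as a context manager, run here under Python 3 with made-up sample data:

```python
import io

# io.BytesIO implements __enter__/__exit__, so it works directly in a with statement.
with io.BytesIO(b"Some data.\nSome more data.") as buffer:
    lines = buffer.readlines()

print(lines)
print(buffer.closed)  # the buffer is closed when the block exits
```

For text rather than bytes, io.StringIO behaves the same way.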

Martijn Pieters
  • @mike: Because of the `delete=False` argument when it was created, the named temporary file will _not_ be deleted as soon as it is closed; read the [docs](https://docs.python.org/2/library/tempfile.html#tempfile.NamedTemporaryFile). Seems like that would have been fairly obvious from the `temp.close()` just before the `yield temp.name` statement... – martineau Jan 06 '16 at 10:00
  • `you don't need to close a StringIO instance`; but then why is there a `close()` method provided for StringIO? Have a look at this question: https://stackoverflow.com/q/9718950/10204932. Great explanation, by the way. – Deepam Gupta Aug 17 '21 at 11:04
  • @Genius: It's more that just letting the object be garbage collected achieves the exact same effect. But yes, calling `.close()` will clear the memory buffer allocated for the in-memory file data. – Martijn Pieters Aug 17 '21 at 11:15
  • @MartjinPieters, where do we stand on this problem now in Python 3, where io.StringIO has both __exit__ and readlines methods? I am facing the same code refactoring problem as the OP – pippo1980 Dec 16 '22 at 10:43
  • @pippo1980: I have no idea what you are asking here. If you have the same issue, and have code that expects a filename, just use my `tempinput()` context manager to provide a filename pointing to a temporary file with given data. – Martijn Pieters Dec 28 '22 at 13:45
  • The question is useful for Python 2.x, but the title doesn't communicate that, which could be misleading to newcomers like me. The edit queue is full, so I can't add "in Python 2.x" to the title – pippo1980 Dec 28 '22 at 16:34
  • @pippo1980 look at the *tags*. It’s very clearly marked with the `python-2.x` tag. – Martijn Pieters Dec 28 '22 at 20:56
  • Yep, I realized that already. That's why I referred to the title. When Googling I get the title, not the tag. I'll pay more attention to tags from now on. – pippo1980 Dec 28 '22 at 22:05
6

You could define your own open function:

fopen = open
def open(fname,mode):
    if hasattr(fname,"readlines"): return fname
    else: return fopen(fname,mode)

However, the with statement calls __enter__ on entry and __exit__ when the block finishes, and StringIO.StringIO has neither method.

You could define a custom class to use with this open:

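class MyStringIO(object):
    def __init__(self, txt):
        self.text = txt
    def readlines(self):
        # keepends=True preserves trailing newlines, as file.readlines() does
        return self.text.splitlines(True)
    def __enter__(self):
        # Required by the with statement; the object acts as its own file handle
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        # Nothing to clean up; returning None lets exceptions propagate
        pass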
Joran Beasley
  • Unfortunately that does not solve the problem, since it would have to be inside the legacy function – jdi Aug 09 '12 at 22:10
  • Wouldn't this open override it as long as it was in the same file? – Joran Beasley Aug 09 '12 at 22:13
  • @jdi I think it might work if it was defined before the legacy function, i.e. when the legacy module is imported. – Mark Ransom Aug 09 '12 at 22:14
  • However, unfortunately StringIO does not have an __exit__ method, so it will break. It would need to be combined with a custom class that has an exit method – Joran Beasley Aug 09 '12 at 22:15
  • Actually the only way to make the legacy module pick up the custom open is to define the new `open` first, then import the legacy module, and do: `legacy.open = open`. Because the legacy module is using its own scope. – jdi Aug 09 '12 at 22:20
  • Or just say `__builtin__.open = open` ... but this solution is total hackery ... but the only way I can think of to accomplish what OP wants... – Joran Beasley Aug 09 '12 at 22:21
  • I started to make another answer but quickly realized it was only half the problem, which your second example covers. You could suggest using [tempfile.SpooledTemporaryFile](http://docs.python.org/library/tempfile.html#tempfile.SpooledTemporaryFile) with a `max_size=10e8` or something high. This will be a file-like object, using StringIO under the hood, and already has a context manager. – jdi Aug 09 '12 at 22:35
  • Thanks all -- to jdi's last comment, as far as I could tell, SpooledTemporaryFile had the same problem as StringIO, in that it was a file-like object, but my legacy function required a string that was a path to a file. I ended up using Martijn Pieters' solution, which works. I really wanted to find a solution where I passed a string/object to the legacy function that could be used in the open function but wasn't really a file on disk, but a file in memory. – mpettis Aug 10 '12 at 15:40
  • @JoranBeasley, where do we stand on this problem now in Python 3, where io.StringIO has both __exit__ and readlines methods? I am facing the same code refactoring problem as the OP – pippo1980 Dec 16 '22 at 10:37
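
The `legacy.open = open` idea from the comments can be sketched as follows, for Python 3. The `legacy` module and the `flexible_open` name are hypothetical: the module is built inline with types.ModuleType only to keep the example self-contained, whereas in practice you would import the real legacy module.

```python
import builtins
import io
import types

# Hypothetical stand-in for the legacy module; normally: import legacy
legacy = types.ModuleType("legacy")
exec(
    "def processFile(filename):\n"
    "    with open(filename, 'r') as fh:\n"
    "        return fh.readlines()\n",
    legacy.__dict__,
)


def flexible_open(fname, mode="r", *args, **kwargs):
    """Return file-like objects unchanged; open everything else normally."""
    if hasattr(fname, "readlines"):
        return fname
    return builtins.open(fname, mode, *args, **kwargs)


# Shadow the built-in only inside the legacy module's namespace.
legacy.open = flexible_open

result = legacy.processFile(io.StringIO("line one\nline two\n"))
print(result)
```

This works because name lookup inside processFile checks the module's globals before the built-ins, so only the legacy module sees the replacement. As the comments note, it is still a hack and best kept narrowly scoped.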
2

This one is based on the contextmanager example in the Python docs.

It simply wraps StringIO in a context manager: when the with block exits, execution resumes after the yield and the StringIO is closed properly. This avoids creating a temp file, but a large string will still consume memory, since StringIO buffers the whole string. It works well in most cases where you know the string data will not be long.

from contextlib import contextmanager

@contextmanager
def buildStringIO(strData):
    from cStringIO import StringIO
    # Create the buffer before the try block so that fi is always bound in finally
    fi = StringIO(strData)
    try:
        yield fi
    finally:
        fi.close()

Then you can do:

with buildStringIO('foobar') as f:
    print(f.read()) # will print 'foobar'
hjd
0

Even though Martijn Pieters' answer says:

You can also switch to the newer Python 3 infrastructure offered by the io module (available in Python 2 and 3), where io.BytesIO is the more robust replacement for StringIO.StringIO / cStringIO.StringIO. This object does support being used as a context manager (but still can't be passed to open()).

in Python 3 this works for me:

from pprint import pprint

from io import StringIO

import contextlib

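@contextlib.contextmanager
def as_handle(handleish, mode="r", **kwargs):
    try:
        fp = open(handleish, mode, **kwargs)
    except TypeError:
        # Not a path: assume it is already an open file-like object
        yield handleish
    else:
        # Only open() is guarded, so a TypeError raised in the caller's block propagates normally
        with fp:
            yield fp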


def processFile(filename):
    # with filename as fh:        # OK for StringIO
    # with open(filename) as fh:  # TypeError: expected str, bytes or os.PathLike object, not _io.StringIO
    with as_handle(filename) as fh:
        return fh.readlines()


# This failed in the question; with as_handle() it no longer does
plink_data = StringIO('StringIO data.')
print('This is the error.')
pprint(processFile(plink_data))

output:

This is the error.
['StringIO data.']
pippo1980