91

What exactly is StringIO used for?

I have been looking around the internet for examples, but almost all of them are very abstract and only show "how" to use it. None of them show "why" or "in which circumstances" one would use it.

P.S. Not to be confused with this Stack Overflow question, "StringIO Usage", which compares string and StringIO.

starball
Hossein

8 Answers

103

It's used when you have some API that only takes files, but you need to use a string. For example, to compress a string using the gzip module in Python 2:

import gzip
import StringIO

stringio = StringIO.StringIO()
gzip_file = gzip.GzipFile(fileobj=stringio, mode='w')
gzip_file.write('Hello World')
gzip_file.close()

stringio.getvalue()
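
For Python 3, a rough equivalent is sketched below. It assumes io.BytesIO in place of StringIO, because gzip writes bytes rather than text; the variable names are illustrative only.

```python
import gzip
import io

# gzip produces binary data, so the in-memory "file" must be a bytes buffer.
buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode="wb") as gzip_file:
    gzip_file.write(b"Hello World")

# The compressed bytes stay available after the GzipFile is closed.
compressed = buffer.getvalue()

# Decompress again to confirm the round trip.
round_trip = gzip.decompress(compressed)
```

The same pattern should apply to any API that insists on a file object: hand it the buffer, then read the result back with getvalue().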
Petr Viktorin
  • 3
    in other words: `duck typing` :D – Abdelouahab Dec 23 '14 at 20:44
  • 2
    Since Python 3.2, the gzip module has functions to compress data directly. (But any well-known open-source library that currently needs StringIO will probably grow such functions after some time, so rather than search for a new example I'll leave gzip here.) – Petr Viktorin Jan 08 '16 at 15:30
41

StringIO gives you file-like access to strings, so you can take an existing module that works with files and make it work with strings while changing almost nothing.

For example, say you have a logger that writes things to a file and you want to instead send the log output over the network. You can read the file and write its contents to the network, or you can write the log to a StringIO object and ship it off to its network destination without touching the filesystem. StringIO makes it easy to do it the first way then switch to the second way.
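
A minimal sketch of the logging case, assuming the standard logging module (Python 3). The logger name and the messages are made up for illustration, and the actual network send is left out.

```python
import io
import logging

# Log into an in-memory text buffer instead of a file on disk.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("stringio_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("service started")
logger.warning("disk almost full")

# This string is what you would ship to the network destination.
payload = log_buffer.getvalue()
```

Switching back to a real file would only mean swapping the handler; the code that emits log messages stays the same.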

nmichaels
  • 2
    StringIO is also helpful in writing a file directly into S3 (i.e. without the need to save locally first then upload). – 7bStan Sep 18 '19 at 08:42
20

In cases where you want a file-like object that ACTS like a file, but is writing to an in-memory string buffer: StringIO is the tool. If you're building large strings, such as plain-text documents, and doing a lot of string concatenation, you might find it easier to just use StringIO instead of a bunch of mystr += 'more stuff\n' type of operations.
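
As a small illustration of building up a document this way (Python 3; the build_report helper and its data are hypothetical):

```python
import io


def build_report(rows):
    """Assemble a small plain-text report in memory and return it as a str."""
    out = io.StringIO()
    out.write("name,score\n")
    for name, score in rows:
        # print() accepts any file-like object through its file= argument.
        print(name, score, sep=",", file=out)
    return out.getvalue()


report = build_report([("ann", 3), ("bob", 5)])
```

Because the buffer is file-like, print() and any other function that writes to a file can contribute to the same string.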

jathanism
  • 3
    I've also found `StringIO` to be considerably faster if you are dealing with multiple megabytes of character-data when compared to expressions like `mystr += "more stuff\n"` within a loop, especially if you can use `cStringIO.StringIO` instead of just `io.StringIO`. – Seldom 'Where's Monica' Needy Oct 06 '16 at 05:02
  • 1
    @SeldomNeedy Did you benchmark it? It might have been true in 2016, but string concatenation with += is pretty optimized these days (it uses the reference counter to mutate the string in place when that's safe to do). Benchmark: `$ python3 -m timeit -s "from io import StringIO; line = 'a'*80" $'s = StringIO()\nfor i in range(10000): s.write(line)\ns = s.getvalue()'` ⇒ `500 loops, best of 5: 599 usec per loop`; `python3 -m timeit -s "line = 'a'*80" $'s = ""\nfor i in range(10000): s += line'` ⇒ `500 loops, best of 5: 588 usec per loop` – Clément Aug 01 '20 at 02:56
  • @Clément I was referring to Python 2.x; `cStringIO` does not even exist in Python 3. It's good to see the naïve implementation is well-optimized in 3.x! – Seldom 'Where's Monica' Needy Aug 07 '20 at 05:35
  • @SeldomNeedy `+=` was well-optimized in Python 2 as well, AFAICT: the `+=` version of the benchmark above is ten times as fast as StringIO and 3 times as fast as cStringIO in Python 2. (Also, your post mentions `io.StringIO`; isn't that Python 3 only?) – Clément Aug 07 '20 at 14:14
  • @Clément https://docs.python.org/2/library/stringio.html – Seldom 'Where's Monica' Needy Aug 10 '20 at 16:12
14

I've used it in place of text files for unit-testing.

For example, to make a csv 'file' for testing with pandas (Python 3):

import io
import pandas as pd
f = io.StringIO("id,name\n1,brian\n2,amanda\n3,zoey\n")
df = pd.read_csv(f) # pandas takes a file path or a file-like object

From the documentation here:

An in-memory stream for text I/O. The text buffer is discarded when the close() method is called.

The initial value of the buffer can be set by providing initial_value.

method getvalue(): Return a str containing the entire contents of the buffer.
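
The quoted behaviour can be checked with a short sketch using only the standard library (Python 3). The sample rows reuse the CSV data above and the variable names are illustrative.

```python
import io

# initial_value pre-fills the buffer; the stream position starts at 0.
f = io.StringIO("id,name\n1,brian\n")

first_line = f.readline()  # reads like a file, one line at a time
rest = f.read()            # consumes the remainder, leaving the position at the end

# Writing at the end of the buffer appends to it.
f.write("2,amanda\n")
contents = f.getvalue()    # whole buffer, regardless of the current position

# After close() the buffer is discarded and further access raises ValueError.
f.close()
try:
    f.getvalue()
    closed_error = False
except ValueError:
    closed_error = True
```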

Rick
Brian Burns
10

A couple of things I have personally used it for:

  1. Whole-file caching. I have a script that reads PDFs and does validation of various things about them. The PDF library I'm using takes an open file in its document constructor. I originally just opened the PDF I was interested in reading, however when I changed it to read the entire file at once into memory then pass a StringIO object to the PDF library, the running time of my script was cut in half.

  2. Deferred printing. Same script prints a header before every PDF it reads. However, I can specify on the command line whether to ignore certain tests that are in its configuration file, or to only include certain ones. If I ignore all tests for a given PDF I don't want the header printed, but I won't know how many tests I ran until I'm done running the tests (the tests can be defined dynamically as well). So I capture the header into a StringIO object by changing sys.stdout to point to it, and each time I run a test I check to see whether that object has anything in it. If so, I print it then and reset it to empty. Voila, only PDFs that have tests have headers printed.
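
The deferred-printing idea might look roughly like the sketch below. This is not the original script: run_tests, the header format, and the test callables are all hypothetical, and contextlib.redirect_stdout (Python 3) stands in for reassigning sys.stdout by hand.

```python
import contextlib
import io


def run_tests(pdf_name, tests):
    """Return the text to print for one PDF; empty if no tests ran."""
    header = io.StringIO()
    # Capture the header instead of printing it right away.
    with contextlib.redirect_stdout(header):
        print(f"== {pdf_name} ==")

    output = []
    for test in tests:
        result = test(pdf_name)
        if header.getvalue():
            # First test for this PDF: emit the pending header, then reset it.
            output.append(header.getvalue())
            header.seek(0)
            header.truncate(0)
        output.append(result + "\n")
    return "".join(output)


no_tests = run_tests("a.pdf", [])
two_tests = run_tests("b.pdf", [lambda name: "ok", lambda name: "also ok"])
```

A PDF with no tests produces no output at all, and the header appears exactly once when at least one test runs.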

kindall
9

I've just used StringIO in practice for two things:

  • To unit-test a script that does a lot of printing, by redirecting sys.stdout to a StringIO instance for easy analysis;
  • To create a guaranteed well-formed XML document (a custom API request) using ElementTree and then write it for sending via a HTTP connection.

Not that you need StringIO often, but sometimes it's pretty useful.
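
A sketch of the ElementTree case, assuming Python 3's xml.etree.ElementTree. The element names and content are invented, and the HTTP send is omitted.

```python
import io
import xml.etree.ElementTree as ET

# Build the document as a tree so escaping and nesting are handled for us.
root = ET.Element("request")
item = ET.SubElement(root, "item", id="1")
item.text = "a < b"  # the special character is escaped on output

# ElementTree.write() expects a file; encoding="unicode" makes it write str.
buffer = io.StringIO()
ET.ElementTree(root).write(buffer, encoding="unicode")

# This string would become the body of the HTTP request.
xml_text = buffer.getvalue()
```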

9000
3

Here's a concrete example of a use case for StringIO: writing some data directly to AWS S3, without needing to create a file on local disk:

import csv
import io
import boto3

data = [
    ["test", "data", "headers etc", "123","",],
    ["blah", "123", "35", "blah","",],
    ["abc", "def", "blah", "yep", "blah"]
]

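bucket_name = 'bucket_name_here'
# Placeholder credentials; substitute real values or rely on your AWS config.
session = boto3.Session(
    aws_access_key_id="fake Access ID",
    aws_secret_access_key="fake Secret key",
    region_name="ap-southeast-2",
)
s3 = session.resource('s3')
with io.StringIO() as f:
    # csv.writer only needs a file-like object, so it writes into memory.
    writer = csv.writer(f, delimiter=",")
    writer.writerows(data)
    # Read the buffer before the with-block closes and discards it.
    resp = s3.Object(bucket_name, "test.csv").put(Body=f.getvalue())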

Enjoy your new csv on S3, without having written anything to your local disk!

Hansang
2

Django has a function, call_command, which is used to call management commands. It prints its output to stdout and doesn't return any value. If you want to know whether the command ran successfully, you have to inspect the output and decide.

Using StringIO, you can capture the output and check whether it is what you expect.

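import io
from django.core.management import call_command

with io.StringIO() as output:
    # 'custom_command' stands in for your own management command.
    call_command('custom_command', stdout=output)
    if 'Success' not in output.getvalue():
        print('Custom command failed...')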
Chillar Anand