StringIO is the file-like string buffer object we use when reading a pandas DataFrame from text, e.g. "How to create a Pandas DataFrame from a string?"
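
A minimal sketch of that pattern, assuming Python 3 and an installed pandas. The sample CSV text and the variable names are made up for illustration:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; any delimited text works the same way.
csv_text = """name,score
alice,90
bob,85
"""

# StringIO wraps the string in a file-like object that read_csv can consume.
df = pd.read_csv(StringIO(csv_text))
print(df)
```

The same wrapper should work for other pandas readers that accept a file-like object instead of a path.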

Which of these two imports should we use for StringIO (within pandas)? This question has gone unresolved for over four years.

  1. StringIO.StringIO (Python 2) / io.StringIO (Python 3)
    • Advantage: a stable standard-library API, which futureproofs the code. Disadvantage: it forces us to fork on Python version; see the snippet from EmilH at the bottom.
  2. pandas.compat.StringIO

Python 2/3 version-forking code for importing from the standard library (from EmilH):

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

# Note: this is very much a poor man's version of pandas.compat, which contains much more.

smci
  • cc: @Jeff jreback ... – smci May 11 '18 at 00:57
  • The lack of a standard causes both [confusion](https://stackoverflow.com/a/22605281/202229) and [breakage](https://stackoverflow.com/questions/37530891/python-pandas-nameerror-stringio-is-not-defined). – smci May 11 '18 at 01:02
  • I think this is primarily opinion based. All the approaches work so use the one that you feel most comfortable with. When I answered the referenced question I used the snippet to show what to use in both Python 3 and Python 2. Today, 4 years later I'm only using Python 3 so it's a non-issue for me. Stackoverflow is probably not the place to push for a standard on this. If it's important raise an issue on the pandas issue tracker perhaps? – Emil L May 12 '18 at 18:42
  • @EmilH: it's not opinion-based, it depends on whether the pandas developers plan to change their guidance on `pandas.compat`. We don't even need everything inside `pandas.compat` to be stable, only the identifiers I named, but in any case it has been stable since [late 2015](https://github.com/pandas-dev/pandas/commits/master/pandas/compat/__init__.py), so their warning is overly severe – smci May 12 '18 at 22:04
  • @smci I agree that today their warning is overly severe (at least for `StringIO`). But, no matter the pandas developers opinion on: **Which of these two imports should we use for StringIO (within pandas)?** the answer is still based on opinion. If the question was: **Is there an officially recommended way to use `StringIO` (within pandas)?** That would not be opinion based, but reading the docs the recommendation would still currently be to not use the `pandas.compat` (despite that being an arguably cleaner way of getting hold of `StringIO`). – Emil L May 14 '18 at 04:26
  • @EmilH: it's not based on opinion, it's based on facts. Specifically, the current and future status of `pandas.compat`, per the pandas devel crew. (Not the doc's old 'official recommendation' and the doc on that which are clearly 3+ years out-of-date). Please look at the github links for code and issues, which I cited you. Where the current code disagrees with 4-year-old doc, ignore the doc. This wouldn't be the first time that a package's doc lagged the reality of its code, or github, by years. – smci May 14 '18 at 04:29

2 Answers

I know this is an old question, but I followed breadcrumbs here, so it is perhaps still worth answering. It's not totally definitive, but the current pandas documentation suggests using the built-in StringIO rather than its own internal methods:

For examples that use the StringIO class, make sure you import it with `from io import StringIO` for Python 3.

s_pike
  • Yes that's the answer these days. (I had meant to self-answer and close this years ago) – smci Aug 12 '21 at 01:40

FYI, as of pandas 0.25, StringIO was removed from pandas.compat (PR #25954), so you'll now see:

from pandas.compat import StringIO

ImportError: cannot import name 'StringIO' from 'pandas.compat'

This means the only answer is to import from the io module.
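
For code that previously relied on pandas.compat, the swap is a one-line import change. A small sketch, assuming Python 3 only; it uses just the standard library to show that io.StringIO behaves like a text file held in memory (the sample text is illustrative):

```python
from io import StringIO  # replaces: from pandas.compat import StringIO

# Build up text in memory exactly as if writing to an open file.
buf = StringIO()
buf.write("a,b\n")
buf.write("1,2\n")

# Rewind before reading, as with any file object.
buf.seek(0)
header = buf.readline()
rest = buf.read()

# getvalue() returns the whole buffer regardless of the current position.
contents = buf.getvalue()
```

Because the object exposes the usual read and write methods, it can be passed anywhere a text file handle is expected.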

smci
Mike T