I can't seem to find what the default encoding for io.StringIO is in Python 3. Is it the locale encoding, as with stdio?

How can I change it?

With stdio, it seems that just reopening the stream with the correct encoding works, but there's no such thing as reopening a StringIO.

Giovanni Funchal

2 Answers


The class io.StringIO works with str objects in Python 3. That is, you can only read and write strings from a StringIO instance. There is no encoding -- you have to choose one if you want to encode the strings you got from StringIO in a bytes object, but strings themselves don't have an encoding.

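(Of course, strings need to be represented internally in some encoding. In CPython before 3.3 that was UCS-2 or UCS-4, depending on how the interpreter was built; since 3.3, CPython picks a 1-, 2- or 4-byte-per-character representation for each string. Either way, you don't see this implementation detail when working with Python.)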
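
A minimal sketch of the point above, using a made-up sample string: whatever goes into a StringIO comes back as str, and an encoding only appears once you call encode() yourself. The variable names are just for illustration.

```python
import io

buf = io.StringIO()
print("naïve café", file=buf)   # print() writes str objects into the buffer
text = buf.getvalue()           # what comes back is a str, not bytes

# An encoding only enters the picture when the text is turned into bytes.
utf8_bytes = text.encode("utf-8")      # non-ASCII characters take 2 bytes each
latin1_bytes = text.encode("latin-1")  # every character takes 1 byte

print(type(text).__name__)                 # str
print(len(utf8_bytes), len(latin1_bytes))  # 13 11
```

The same text yields different byte sequences depending on the encoding you pick, which is why the buffer itself has nothing to configure.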

Sven Marnach
  • Quoting "There is no encoding": what if I do `stdout = io.StringIO()`? Which encoding will the standard `print()` use then? – Giovanni Funchal Feb 20 '12 at 21:51
  • @GiovanniFunchal: Still none, since all strings you write there are still strings. They are not encoded. – Sven Marnach Feb 20 '12 at 21:52
  • So if I put `# coding:utf-8` at the beginning of the file, the encoding of `print("foo")` in the buffer will be `utf-8`? – Giovanni Funchal Feb 20 '12 at 21:58
  • @GiovanniFunchal: You are mixing concepts here. `# coding:utf-8` describes the encoding of your source file. String literals in your source file are encoded in UTF-8, but will be decoded to string objects while the program is loaded. If you write such a string object to a `StringIO` instance, it will still be a string. Again, *strings* themselves don't have an encoding in Python 3. – Sven Marnach Feb 20 '12 at 22:01
  • So if I copy the value of that `StringIO` object to `stdout`, I should call `decode(stdout.encoding)`? – Giovanni Funchal Feb 20 '12 at 22:08
  • @GiovanniFunchal: No, when you write a string to `stdout`, it will be encoded with `stdout`'s encoding anyway, regardless of whether the string comes out of a `StringIO` or from any other source. (I suggest reading the [Python 3 Unicode Howto](http://docs.python.org/release/3.0.1/howto/unicode.html) to clear up your misconceptions.) – Sven Marnach Feb 20 '12 at 22:14
  • Thanks for the answers. If I understand correctly, if I want a particular encoding I should set the encoding of `stdout`. – Giovanni Funchal Feb 21 '12 at 09:31

As already mentioned in another answer, StringIO saves (Unicode) strings in memory and therefore doesn't have an encoding. If you do need a similar in-memory object that holds encoded bytes, you might want to have a look at BytesIO.
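
One way to get a StringIO-like object that does have an encoding, sketched under the assumption that you want to write text but end up with bytes: wrap a BytesIO in io.TextIOWrapper and choose the encoding there. The choice of latin-1 and the sample text are arbitrary.

```python
import io

raw = io.BytesIO()   # in-memory buffer that holds bytes
# TextIOWrapper encodes str to bytes on write; the encoding is chosen here.
# newline="\n" turns off newline translation so the bytes are predictable.
writer = io.TextIOWrapper(raw, encoding="latin-1", newline="\n")

writer.write("h\u00e9llo\n")   # write text (str), as with StringIO
writer.flush()                 # push buffered text down into the BytesIO

data = raw.getvalue()          # bytes, encoded as latin-1
print(data)                    # b'h\xe9llo\n'
```

Note that closing the wrapper (or letting it be garbage collected) also closes the underlying BytesIO, so read the bytes out before that happens.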

If you want to set the encoding of stdout: you can't, at least not directly, since sys.stdout.encoding is read-only and (often) determined automatically by Python. (The automatic detection doesn't work when the output goes to a pipe.) If you want to write byte strings with a certain encoding to stdout, either encode the strings you print with the correct encoding (Python 2) or use sys.stdout.buffer.write() (Python 3) to send already encoded byte strings to stdout. On Python 3.7 and later, sys.stdout.reconfigure(encoding=...) can also change the encoding of the existing stream.
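
A rough sketch of the Python 3 route, assuming sys.stdout is the usual text stream with a binary .buffer underneath (it may not be if stdout has been replaced, for example by a StringIO). The helper name write_encoded is made up for this example.

```python
import io
import sys

def write_encoded(text, encoding, stream=None):
    """Encode text ourselves and write the raw bytes to a binary stream.

    Hypothetical helper: with no stream given it writes to sys.stdout.buffer,
    bypassing whatever encoding sys.stdout itself was opened with.
    """
    if stream is None:
        sys.stdout.flush()            # keep ordering with earlier print() calls
        stream = sys.stdout.buffer    # binary layer underneath sys.stdout
    data = text.encode(encoding)
    stream.write(data)
    stream.flush()
    return len(data)

# Demonstration against an in-memory binary stream instead of the real stdout.
sink = io.BytesIO()
written = write_encoded("\u00e9\n", "utf-8", sink)
print(written, sink.getvalue())   # 3 b'\xc3\xa9\n'
```

Calling write_encoded("some text\n", "latin-1") with no stream would send latin-1 bytes to the real stdout regardless of sys.stdout.encoding.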

JonnyJD
  • The other answer was already valid, but I wanted to say not only that there is no such thing, but also "how to fix it". – JonnyJD Oct 13 '13 at 12:27
  • Thanks! This was what I was looking for to replace `StringIO.StringIO` while maintaining `bytes` output. – Cas Jun 05 '17 at 21:13