25

I'm getting a

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 34: ordinal not in range(128)

on a string stored in 'a.desc' below, as it contains the '£' character. It's stored in the underlying Google App Engine datastore as a unicode string, so that's fine. The cStringIO.StringIO.writelines function is seemingly trying to encode it as ASCII:

result.writelines(['blahblah',a.desc,'blahblahblah'])

How do I instruct it to treat the encoding as unicode, if that's the correct phrasing?

App Engine runs on Python 2.5.

citronic

4 Answers

38

You can wrap the StringIO object in a codecs.StreamReaderWriter object to automatically encode and decode unicode.

Like this:

import cStringIO, codecs
buffer = cStringIO.StringIO()
codecinfo = codecs.lookup("utf8")
wrapper = codecs.StreamReaderWriter(buffer, 
        codecinfo.streamreader, codecinfo.streamwriter)

wrapper.writelines([u"list of", u"unicode strings"])

buffer will be filled with utf-8 encoded bytes.

If I understand your case correctly, you will only need to write, so you could also do:

import cStringIO, codecs
buffer = cStringIO.StringIO()
wrapper = codecs.getwriter("utf8")(buffer)
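
As a rough sketch of the write-only wrapper in use: the snippet below assumes `io.BytesIO` as a stand-in byte buffer (so it also runs on Python 3, where cStringIO no longer exists), and `desc` is a hypothetical stand-in for `a.desc`.

```python
import codecs
import io

# Assumption: io.BytesIO stands in for cStringIO.StringIO as the byte buffer.
buffer = io.BytesIO()
wrapper = codecs.getwriter("utf8")(buffer)

# Hypothetical stand-in for a.desc: a unicode string containing a pound sign.
desc = u"costs \u00a35"

# The wrapper encodes each unicode string to UTF-8 before it reaches the buffer.
wrapper.writelines([u"blahblah", desc, u"blahblahblah"])

encoded = buffer.getvalue()  # UTF-8 encoded bytes
```

Here the pound sign ends up in `encoded` as the two bytes 0xC2 0xA3, and no ASCII encoding is attempted.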
Michael Dunn
codeape
  • 1
    Also, the file-like object returned by `cStringIO.StringIO()` doesn't work in the `with` statement, but the wrapper returned by `codecs.StreamReaderWriter()` does! – steveha Oct 16 '15 at 22:07
  • This sounds similar to https://stackoverflow.com/q/45101658/562769 - do you know the answer to my question? – Martin Thoma Jul 14 '17 at 11:26
22

StringIO documentation:

Unlike the memory files implemented by the StringIO module, those provided by [cStringIO] are not able to accept Unicode strings that cannot be encoded as plain ASCII strings.

If possible, use StringIO instead of cStringIO.

Phil
  • 1
    I switched (cStringIO is meant to be better performance-wise) and it didn't throw an error, but it did print 'Â£' instead of just '£'. Why is 'Â' showing up now? – citronic Nov 30 '09 at 03:41
  • 4
    'Â£' is the Windows-1252 decoding of 0xc2 0xa3, which is the UTF-8 encoding of u'£'. Maybe your terminal, app, or wherever you're seeing that is configured for Windows-1252 instead of UTF-8. – Phil Nov 30 '09 at 03:48
  • hmm. Essentially I'm looking at a web server response through Chrome browser. Would that be the issue? – citronic Nov 30 '09 at 03:53
  • In Chrome you can set the encoding under which the page will be interpreted-- Page menu -> Encoding. Select "Unicode (UTF-8)" and see if that fixes it... – Phil Nov 30 '09 at 04:00
  • default is ISO-8859-1 (western). That should be ok should it not? – citronic Nov 30 '09 at 04:12
  • 3
    Nope. ISO-8859-1 will behave the same as Windows-1252 in that regard. You probably want to explicitly set the UTF-8 encoding in your page headers so that browsers don't have to guess the encoding. (Unless, of course, something else in your app is already generating output in a non-UTF-8 encoding.) – Phil Nov 30 '09 at 04:19
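
The byte-level effect described in these comments can be reproduced with a short sketch (the variable names are illustrative only):

```python
pound = u"\u00a3"                         # the pound sign
utf8_bytes = pound.encode("utf-8")        # two bytes: 0xC2 0xA3

# Decoding those bytes with the wrong codec yields two characters.
misread = utf8_bytes.decode("cp1252")     # Windows-1252
latin1 = utf8_bytes.decode("iso-8859-1")  # ISO-8859-1 behaves the same here

# Decoding with the codec that was used to encode recovers the original.
correct = utf8_bytes.decode("utf-8")
```

Both wrong decodings turn 0xC2 into 'Â' and 0xA3 into '£', which is why the extra character appears whenever the reader assumes a single-byte Western encoding.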
4

You can also encode your strings as UTF-8 manually before adding them to the StringIO:

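# Build a new list: reassigning val inside a for loop would leave rows unchanged.
# Unicode values are encoded to UTF-8; byte strings pass through untouched.
encoded_rows = [val.encode('utf-8') if isinstance(val, unicode) else val
                for val in rows]
result.writelines(encoded_rows)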
Rushabh Mehta
0

Python 2.6 introduced the io module and you should consider using io.StringIO(), "An in-memory stream for unicode text."

In Python 2.6 the io module is implemented in pure Python and is not optimized. In later versions (2.7, and 3.1 onward) it is implemented in fast C code.
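
A minimal sketch of that approach, assuming Python 2.6 or later (so it does not apply to App Engine's Python 2.5 runtime); `desc` is a hypothetical stand-in for `a.desc`:

```python
import io

result = io.StringIO()

# Hypothetical stand-in for a.desc: a unicode string containing a pound sign.
desc = u"costs \u00a35"

# io.StringIO stores text, so every item written must be a unicode string.
result.writelines([u"blahblah", desc, u"blahblahblah"])

text = result.getvalue()     # still unicode text, nothing encoded yet
body = text.encode("utf-8")  # encode once, when producing the output bytes
```

On Python 2, `io.StringIO` rejects plain byte strings, so the literals need the `u` prefix there.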

Anthon