unicodecsv reader from unicode string not working?

Question

I'm having trouble reading a unicode CSV string with python-unicodecsv:

>>> import unicodecsv, StringIO
>>> f = StringIO.StringIO(u'é,é')
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/guy/test/.env/lib/python2.7/site-packages/unicodecsv/__init__.py", line 101, in next
    row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I'm guessing the problem is in how I turn my unicode string into a StringIO file? The example on the python-unicodecsv GitHub page works fine:

>>> import unicodecsv
>>> from cStringIO import StringIO
>>> f = StringIO()
>>> w = unicodecsv.writer(f, encoding='utf-8')
>>> w.writerow((u'é', u'ñ'))
>>> f.seek(0)
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> row = r.next()
>>> print row[0], row[1]
é ñ

Trying my code with cStringIO fails, because cStringIO can't accept unicode (so why the example works, I don't know!):

>>> from cStringIO import StringIO
>>> f = StringIO(u'é')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

I need to accept UTF-8 CSV-formatted input from a web textarea form field, so I can't just read it in from a file.

Any ideas?

Martijn Pieters · Accepted Answer · 2014-01-31T12:10:56.253

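The unicodecsv reader reads and decodes byte strings for you. You are passing it unicode strings instead. On output, the writer encodes your unicode values to bytestrings for you, using the configured codec. That is why the GitHub example works: by the time the data is read back, the cStringIO object holds UTF-8 bytes, not unicode.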

In addition, cStringIO.StringIO can only handle encoded bytestrings, while the pure-python StringIO.StringIO class happily treats unicode values as if they are byte strings.

The solution is to encode your unicode values before putting them into the StringIO object:

>>> import unicodecsv, StringIO, cStringIO
>>> f = StringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
>>> f = cStringIO.StringIO(u'é,é'.encode('utf8'))
>>> r = unicodecsv.reader(f, encoding='utf-8')
>>> next(r)
[u'\xe9', u'\xe9']
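
For the textarea case, the encode step could be wrapped in a small helper. This is only a sketch: the name as_byte_stream is made up for illustration, it uses io.BytesIO from the standard library rather than StringIO or cStringIO, and it assumes the form value has already been decoded to a unicode object (calling .encode() on a Python 2 byte string containing non-ASCII data would fail).

```python
# -*- coding: utf-8 -*-
import io


def as_byte_stream(text, encoding='utf-8'):
    """Hypothetical helper: wrap already-decoded text in a bytes file object."""
    # Encode first, so a bytes-oriented CSV reader can do the decoding itself.
    return io.BytesIO(text.encode(encoding))


stream = as_byte_stream(u'é,é')
first_line = stream.readline()  # UTF-8 encoded bytes, not unicode
stream.seek(0)                  # rewind before handing the stream to a reader
```

The rewound stream should then be usable in place of the StringIO objects above, i.e. unicodecsv.reader(stream, encoding='utf-8').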

Excellent. Great answer and quick. gotta love SO :) – Guy Bowden, Jan 31 '14 at 12:14
