Update:
I found the answer here: Python UnicodeDecodeError - Am I misunderstanding encode?
I needed to explicitly decode my incoming file into Unicode when I read it, because it contained bytes that were neither valid ASCII nor already-decoded Unicode. So the encode step was failing when it hit those bytes.
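A minimal sketch of that fix, assuming the incoming file is latin-1 encoded (byte 0xb4 is an acute accent in latin-1; substitute whatever encoding the file actually uses):

```python
import json

# Byte string as read from the file; \xb4 is the offending byte.
raw = b"caf\xb4 latte"

# Explicitly decode to Unicode before handing it to json.dumps.
# Assumption: the source file is latin-1; substitute its real encoding.
text = raw.decode("latin-1")

print(json.dumps([text]))  # works: json.dumps now receives Unicode
```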
Original Question
So, I know there's something I'm just not getting here.
I have an array of unicode strings, some of which contain non-ASCII characters.
I want to encode that as JSON with
json.dumps(myList)
It throws an error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 13: ordinal not in range(128)
How am I supposed to do this? I've tried setting the ensure_ascii parameter to both True and False, but neither fixes the problem.
I know I'm passing unicode strings to json.dumps, and I understand that a JSON string is meant to be Unicode. Why isn't it just sorting this out for me?
What am I doing wrong?
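For reference, here is what ensure_ascii actually controls when the input genuinely is Unicode: only whether non-ASCII characters are escaped in the output, not how byte-string input gets decoded (a sketch, using a hypothetical string with the same 0xb4 / U+00B4 character):

```python
import json

# s is a real Unicode string, so json.dumps has nothing to decode;
# ensure_ascii only changes how the output is written.
s = u"caf\u00b4"  # u'caf´'

print(json.dumps([s], ensure_ascii=True))   # escaped: ["caf\u00b4"]
print(json.dumps([s], ensure_ascii=False))  # literal: ["caf´"]
```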
Update: Don Question sensibly suggests I provide a stack trace. Here it is:
Traceback (most recent call last):
File "importFiles.py", line 69, in <module>
x = u"%s" % conv
File "importFiles.py", line 62, in __str__
return self.page.__str__()
File "importFiles.py", line 37, in __str__
return json.dumps(self.page(),ensure_ascii=False)
File "/usr/lib/python2.7/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/usr/lib/python2.7/json/encoder.py", line 204, in encode
return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb4 in position 17: ordinal not in range(128)
Note that this is Python 2.7, and the error still occurs with ensure_ascii=False.
Update 2: Andrew Walker's useful link (in the comments) leads me to think I can coerce my data into a convenient byte format before trying to JSON-encode it, by doing something like:
data.encode("ascii","ignore")
Unfortunately that throws the same error.
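That same error is expected under Python 2's coercion rules: calling .encode() on a byte string first performs an implicit ascii decode, which trips over the same 0xb4 byte. Decoding explicitly before encoding avoids it (a sketch, again assuming a latin-1 source):

```python
# data.encode("ascii", "ignore") on a *byte* string in Python 2 first
# does an implicit ascii decode, raising the same UnicodeDecodeError.
# Decode explicitly first, then encode:
data = b"caf\xb4 latte"
cleaned = data.decode("latin-1")                # assumption: latin-1 source
ascii_only = cleaned.encode("ascii", "ignore")  # silently drops the 0xb4 byte
print(ascii_only)
```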