OK so...
- a Unicode string gets encoded to a Python 2.x string (actually, a sequence of bytes)
- a Python 2.x string gets decoded to a Unicode string
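A quick sketch of those two directions, assuming UTF-8 as the encoding (the sample string is made up for illustration). The `u""` and `b""` prefixes are only there to make the two types explicit:

```python
# Assumed sample text containing one non-ASCII character (u-umlaut, U+00FC).
text = u"f\u00fcr"

# encode: Unicode string -> sequence of bytes
data = text.encode("utf-8")

# decode: sequence of bytes -> Unicode string
back = data.decode("utf-8")

print(repr(data))
print(back == text)
```

The byte string is one element longer than the Unicode string because UTF-8 needs two bytes for that one character.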
Python UnicodeDecodeError - Am I misunderstanding encode?
I've got this Python 2.7 code:
    try:
        print '***'
        print type(relationsline)
        relationsline = relationsline.decode("ascii", "ignore")
        print type(relationsline)
        relationsline = relationsline.encode("ascii", "ignore")
        print type(relationsline)
        relations = ast.literal_eval(relationsline)
    except ValueError:
        return
    except UnicodeDecodeError:
        return
The last line of the try block (the ast.literal_eval call) sometimes throws:
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 341: ordinal not in range(128)
I would think that this would (1) start with a string in some (unknown) encoding, (2) decode it into a unicode type, representing a string of characters from the Unicode character set, while ignoring all bytes that can't be decoded as ASCII, and (3) encode the unicode type back into a string with ASCII encoding, ignoring all of the characters that can't be represented in ASCII.
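To make those three steps concrete, here is a minimal sketch of the same decode/encode round trip on a made-up input. The helper name `to_ascii_bytes` and the sample bytes are assumptions for illustration, not part of my real code:

```python
def to_ascii_bytes(raw):
    # Hypothetical helper mirroring the two calls in the try block.
    # Step 2: bytes -> unicode, silently dropping every non-ASCII byte.
    text = raw.decode("ascii", "ignore")
    # Step 3: unicode -> bytes; by now only ASCII characters are left.
    return text.encode("ascii", "ignore")

# Assumed sample input: 0xfc is u-umlaut in Latin-1, the same byte the error mentions.
raw = b"M\xfcller"
cleaned = to_ascii_bytes(raw)

print(repr(cleaned))
# bytearray() yields integers on both Python 2 and 3, so this check is portable.
print(all(b < 128 for b in bytearray(cleaned)))
```

If this is right, no byte of 0x80 or above should survive the round trip, so a raw 0xfc byte should not be able to reach ast.literal_eval by this route.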
Here is the full stack trace:
    Traceback (most recent call last):
      File "outputprocessor.py", line 69, in <module>
        getPersonRelations(lines, fname)
      File "outputprocessor.py", line 41, in getPersonRelations
        relations = ast.literal_eval(relationsline)
      File "/usr/lib/python2.7/ast.py", line 49, in literal_eval
        node_or_string = parse(node_or_string, mode='eval')
      File "/usr/lib/python2.7/ast.py", line 37, in parse
        return compile(source, filename, mode, PyCF_ONLY_AST)
      File "<unknown>", line 1
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 341: ordinal not in range(128)
        ^
    SyntaxError: invalid syntax
But my understanding is clearly wrong somewhere. Even more perplexing is that the except UnicodeDecodeError clause is not catching the UnicodeDecodeError. What am I missing? Maybe this is the problem? http://bugs.python.org/issue22221
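One possible reading of the traceback: its final line is SyntaxError, and the UnicodeDecodeError text appears where the offending source line would normally be shown. That might mean the string handed to ast.literal_eval already contains that error message as plain text, in which case neither except clause would match. A minimal sketch under that assumption; `safe_literal_eval` and both sample lines are hypothetical:

```python
import ast

def safe_literal_eval(text):
    # Hypothetical helper: returns None when text is not a valid Python literal.
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # SyntaxError: the text does not parse as a Python expression at all.
        # ValueError: it parses, but is not a plain literal (e.g. a function call).
        return None

# Assumed sample: an error message that ended up in the data as ordinary text.
bad_line = "UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 341"
# Assumed sample of a well-formed line.
good_line = "[('Alice', 'knows', 'Bob')]"

print(safe_literal_eval(bad_line))
print(safe_literal_eval(good_line))
```

If that reading holds, adding SyntaxError to the except clauses would skip such lines, though it would not explain how the message got into the input in the first place.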