
To recap, in Python 2.x:

  • a Unicode string gets encoded to a Python 2.x string (actually, a sequence of bytes)
  • a Python 2.x string gets decoded to a Unicode string
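Here is a minimal sketch of both directions. It assumes Python 3, where bytes corresponds to the 2.x str type and str corresponds to unicode, and the sample bytes are made up for illustration:

```python
# Sample input: raw bytes containing one non-ASCII byte (0xfc).
raw = b'caf\xfc'

# Decoding goes bytes -> text; 'ignore' silently drops bytes ASCII can't decode.
text = raw.decode('ascii', 'ignore')

# Encoding goes text -> bytes; 'ignore' drops characters ASCII can't represent.
back = text.encode('ascii', 'ignore')
```

The 0xfc byte is simply discarded in the decode step, so the encode step has nothing left to drop.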

Python UnicodeDecodeError - Am I misunderstanding encode?

I've got this Python 2.7 code:

try:
    print '***'
    print type(relationsline)
    relationsline = relationsline.decode("ascii", "ignore")
    print type(relationsline)
    relationsline = relationsline.encode("ascii", "ignore")
    print type(relationsline)
    relations = ast.literal_eval(relationsline)
except ValueError:
    return
except UnicodeDecodeError:
    return

The ast.literal_eval call (the last line of the try block) sometimes throws:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 341: ordinal not in range(128)

I would expect this to (1) start with a string in some (unknown) encoding, (2) decode it into a unicode object, ignoring any bytes that can't be decoded as ASCII, and (3) encode that unicode object back into an ASCII-encoded str, ignoring any characters that can't be represented in ASCII.
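Steps (2) and (3) can be sketched as a single Python 3 function. The name to_ascii is hypothetical and not part of the question's code. With the 'ignore' error handler, neither step can raise a Unicode error:

```python
def to_ascii(raw):
    """Decode bytes as ASCII and re-encode them, dropping all non-ASCII data."""
    text = raw.decode('ascii', 'ignore')   # step (2): bytes -> text, bytes 0x80-0xff dropped
    return text.encode('ascii', 'ignore')  # step (3): text -> bytes, nothing left to drop
```

For example, to_ascii(b"[('a', 'b\xfc')]") returns b"[('a', 'b')]". The reasoning in the question holds for these two steps, so the exception has to come from somewhere else.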

Here is the full stack trace:

Traceback (most recent call last):
  File "outputprocessor.py", line 69, in <module>
    getPersonRelations(lines, fname)
  File "outputprocessor.py", line 41, in getPersonRelations
    relations = ast.literal_eval(relationsline)
  File "/usr/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 341: ordinal not in range(128)
                      ^
SyntaxError: invalid syntax

But something is clearly wrong. Even more perplexing, the except UnicodeDecodeError clause is not catching the UnicodeDecodeError. What am I missing? Maybe this is the problem? http://bugs.python.org/issue22221

bernie2436
  • your ascii text isn't. strict ascii is a 7bit charset (0x00 -> 0x7F), and you've got a char that's > 0x7F, which means it's not ascii. maybe it's extended ascii, iso8859-1, or whatever. but it's not "ascii". – Marc B Aug 29 '14 at 16:31
  • This'll be easier to debug if you can figure out an example that consistently throws the error. An [MCVE](http://stackoverflow.com/help/mcve) would be recommended: a minimal, standalone code sample that runs and produces the error you're talking about when you run it. – user2357112 Aug 29 '14 at 16:35
  • Also, please show us the exception's complete stack trace. – user2357112 Aug 29 '14 at 16:38
  • It is an ast syntax error I imagine – Padraic Cunningham Aug 29 '14 at 17:10
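Marc B's point can be checked with a small Python 3 sketch. Latin-1 (ISO-8859-1) is only one plausible guess for the real encoding, which the question doesn't state:

```python
raw = b'\xfc'

# Strict ASCII only covers 0x00-0x7f, so decoding 0xfc raises UnicodeDecodeError.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# The same byte is a valid character in ISO-8859-1 (Latin-1): U+00FC, 'ü'.
latin1_text = raw.decode('iso-8859-1')
```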

2 Answers


Look at the stack trace more closely. It is throwing a SyntaxError, not a UnicodeDecodeError.

You are trying to literal_eval the string "UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 341: ordinal not in range(128)". You can encode/decode that string all you want, but ast won't know what to do with it, because it is clearly not a valid Python literal.

See:

>>> import ast
>>> ast.literal_eval('''UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 341: ordinal not in range(128)''')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 341: ordinal not in range(128)
                      ^
SyntaxError: invalid syntax
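The same check, written as a small Python 3 sketch, also explains why neither except clause in the question fired. The exception is a SyntaxError, which is not a subclass of UnicodeDecodeError or ValueError. UnicodeDecodeError, by contrast, is a ValueError:

```python
import ast

msg = ("UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc "
       "in position 341: ordinal not in range(128)")

# Record which exception type literal_eval raises for this input.
try:
    ast.literal_eval(msg)
    raised = None
except Exception as exc:
    raised = type(exc)

# raised is SyntaxError, which neither 'except ValueError' nor
# 'except UnicodeDecodeError' will catch.
```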

I would look at whatever is passing these strings to your function; it's generating some bogus input.
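To track down where the bogus input comes from, one option is to scan the input and report every line that isn't a valid literal. The helper find_unparseable below is hypothetical, and the sample lines are made up:

```python
import ast

def find_unparseable(lines):
    """Return (line_number, text) for each line that is not a valid Python literal."""
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            ast.literal_eval(line.strip())
        except (ValueError, SyntaxError):
            bad.append((number, line))
    return bad

sample = [
    "[('a', 'b')]",
    "UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc",
    "{'x': 1}",
]
```

Calling find_unparseable(sample) flags only the second line, which points at the stage that wrote an error message into the data.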

roippi

You are trying to literal_eval the text of a traceback (a UnicodeDecodeError message) that is contained in the passed-in string, rather than a Python literal.

You will need to move your literal_eval call into its own try/except, catch SyntaxError in your original try block, or filter the input somehow.
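One way to give literal_eval its own try/except is sketched below for Python 3. The function name parse_relations is hypothetical. It keeps the question's ASCII-with-'ignore' normalisation, and it catches SyntaxError alongside ValueError:

```python
import ast

def parse_relations(line):
    """Parse one line as a Python literal, or return None if it isn't one."""
    # If given bytes, decode them as ASCII and drop any non-ASCII bytes.
    if isinstance(line, bytes):
        line = line.decode('ascii', 'ignore')
    # literal_eval gets its own try block; malformed input can raise
    # SyntaxError as well as ValueError.
    try:
        return ast.literal_eval(line)
    except (ValueError, SyntaxError):
        return None
```

Returning None mirrors the bare return in the question's except clauses. Some inputs can still raise other exceptions from literal_eval, for example a TypeError for an unhashable dict key such as "{[1]: 2}", and this sketch does not catch those.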

Padraic Cunningham