This is a common problem, and I have tried to follow the usual rules (though apparently wrongly):
- decode inputs
- encode outputs
- work in Unicode in between
Here is an excerpt of my code:
#!/usr/bin/env python
# encoding: utf-8
import io
import json

with io.open('test.json', 'r', encoding="utf-8") as f:
    m = json.load(f)
with io.open("test.csv", 'w', encoding="utf-8") as ficS:
    line = list()
    for i in m['v']:
        v = m['v']['u']
        line.append(v['label'].replace("\n", " - "))
    ficS.write(';'.join(line).encode('utf-8') + '\n')
Without the .encode('utf-8'), it works, but the file is barely readable because of the accented letters. With it, I get the following error message:
__main__.py: UnicodeDecodeError('ascii', 'blabla\xc3\xa9blabla', 31, 32, 'ordinal not in range(128)')
Any idea please?

You are encoding to UTF-8 yourself, then handing the result to a file object that also encodes to UTF-8. Python can only write your byte string after first decoding it back to Unicode, and that implicit decode uses the default ASCII codec, which fails on the accented characters. Don't keep encoding; leave the encoding to UTF-8 to the io.open() file object, at the last possible moment, and concatenate Unicode values instead.
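A minimal sketch of the corrected loop, assuming the same test.json layout as the excerpt above (Python 2 syntax; note the u'' literals, so everything stays Unicode until io.open() encodes it on write):

#!/usr/bin/env python
# encoding: utf-8
import io
import json

with io.open('test.json', 'r', encoding='utf-8') as f:
    m = json.load(f)  # json.load already returns unicode strings

with io.open('test.csv', 'w', encoding='utf-8') as ficS:
    line = list()
    for i in m['v']:
        v = m['v']['u']  # same lookup as in the question's excerpt
        line.append(v['label'].replace(u'\n', u' - '))
    # join and write unicode; io.open() encodes to UTF-8 on the way out
    ficS.write(u';'.join(line) + u'\n')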