UnicodeDecodeError, Invalid continuation byte again

Question

I can't figure out how to solve these problems once and for all. I first ran into them when I tried to write "è" (I'm Italian). After some research, I found that adding "#coding: utf-8" at the very beginning of the file seemed to solve the problem... UNTIL NOW.

I edited some code I wrote a day or two ago, and it worked perfectly. Now, whenever I run the script, it doesn't work: it never starts, and I'm stuck with this error:

SyntaxError: 'utf-8' codec can't decode byte 0xe0 in position 32: invalid continuation byte.

The problem is... position 32? Where? What's the problematic line? I don't know exactly what I added, because I made a couple of changes. Running in debug mode doesn't help either: when I "Step Into" at the very beginning of the script, the error shows up immediately (by the way, I'm using Wingware 101 as my IDE, on Windows 7). Sorry if I haven't given enough information. I could post the code, but I'm hesitant to: it's a mess written in Italian, and it might not be easy to understand exactly what's going on.

Thank you for replies and happy holidays!

Well, I tried deleting the line "#coding: utf-8". When I then ran the script, it threw a bunch of Unicode errors at me, but luckily now I have some information about the lines. The problem lies in using "à" or "è" within some comments (I'm pretty sure that 0xe0 is indeed "à"). I got rid of those pesky characters and now it works. But now I have to write " a' " instead of " à ", which is still a little annoying... damn Unicode errors. I hate them. — Russell Teapot, Dec 25 '15 at 08:03
Reading this reference might help you: https://docs.python.org/3/howto/unicode.html#python-s-unicode-support. Basically, the encoding declaration you added is not in the correct form (I don't know if that's an issue). Since 3.0, Python supports Unicode by default, and I'm sure that Italian special characters are in Unicode, as it was designed for world domination: http://stackoverflow.com/a/2709023/2419215. You might consider switching to English comments, as that's the easiest way to go. — fodma1, Dec 25 '15 at 08:46
0xe0 is not valid UTF-8, so you should not use that declaration. Use the correct charset instead. — Ignacio Vazquez-Abrams, Dec 25 '15 at 09:05
We need to see some code, the exception and the stack trace. Without them, it's impossible to help you. — Alastair McCormack, Dec 26 '15 at 17:12
Thank you so much for all the tips! Now I'm starting to get a grasp on this whole encoding "thing". — Russell Teapot, Dec 27 '15 at 15:13
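On the form of the declaration: PEP 263 defines it with a regular expression that the first or second line of the file must match, and `#coding: utf-8` does match it. Since the error message names the 'utf-8' codec, the declaration was evidently recognized, so its form does not appear to be the problem. A quick check with that regular expression (the helper `declared_encoding` is made up for illustration):

```python
import re

# The regular expression PEP 263 gives for an encoding declaration; it must
# match the first or second line of the source file.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def declared_encoding(line: str):
    """Return the encoding named in a declaration line, or None (hypothetical helper)."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


for line in ("#coding: utf-8",
             "# -*- coding: utf-8 -*-",
             "# vim: set fileencoding=utf-8 :",
             "# encoding utf-8"):
    # The first three lines are valid declarations; the last lacks ":" or "=".
    print(repr(line), "->", declared_encoding(line))
```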

score 5 · Accepted Answer · answered Dec 27 '15 at 00:36


#coding: utf8 is a declaration that the source code is saved in UTF-8. Make sure that is actually the encoding of the source file. For example, the following file was created in Windows Notepad and saved as "ANSI", which on US Windows is the Windows-1252 encoding:

#coding: utf8
print('hàllo')

It produces the following error on Python 2.7:

  File "test.py", line 2
SyntaxError: 'utf8' codec can't decode byte 0xe0 in position 8: invalid continuation byte

As you can see, the 8th position (counting from 0) of line 2 is à, which in Windows-1252 is byte 0xe0. The file is decoded with the wrong encoding, and the error message points straight at the offending byte.
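The same failure can be reproduced in Python 3 without a source file: encode the line as Windows-1252 (Python's codec name is `cp1252`) and decode it as UTF-8. Byte 0xe0 (bit pattern 1110xxxx) starts a three-byte UTF-8 sequence, so the decoder expects a continuation byte of the form 10xxxxxx next, finds `l` (0x6c) instead, and reports an invalid continuation byte. The helper `first_bad_byte` is just for illustration:

```python
def first_bad_byte(raw: bytes):
    """Return (offset, byte value, reason) for the first invalid UTF-8 byte, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, raw[exc.start], exc.reason
    return None


# Line 2 of the example file, as Notepad's "ANSI" (cp1252) setting stores it.
raw = "print('hàllo')".encode("cp1252")
print(raw)                  # b"print('h\xe0llo')"
print(first_bad_byte(raw))  # (8, 224, 'invalid continuation byte')
```

224 is 0xe0, at the same position 8 reported in the error above.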

Either declare the correct encoding for your source file, or re-save the source file in UTF-8.
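Re-saving can also be done from Python. A minimal sketch, assuming the file really is cp1252 today; decoding a file that is already UTF-8 this way usually won't fail but will silently garble the accented characters, so run it once and keep a backup. The function name `resave_as_utf8` is hypothetical, and if the file declares `# coding: cp1252`, that line needs to be changed to UTF-8 as well:

```python
from pathlib import Path


def resave_as_utf8(path, source_encoding="cp1252"):
    """Re-encode a file in place from source_encoding to UTF-8 (hypothetical helper)."""
    path = Path(path)
    # Decode with the encoding the editor actually used...
    text = path.read_bytes().decode(source_encoding)
    # ...and write the same characters back as UTF-8. Working with bytes on
    # both ends keeps the line endings exactly as they were.
    path.write_bytes(text.encode("utf-8"))
```

Usage would be something like `resave_as_utf8("test.py")`.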

Note: I don't have Python 3.4 installed, but Python 3.5 gives a less clear error message:

  File "x.py", line 1
SyntaxError: encoding problem: utf8

It doesn't match your error message, but it still indicates that the file is not declared with the right encoding.
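To find which lines disagree with the declaration, and so turn a bare "position 32" into a line number, a small script can read the declaration with the standard library's `tokenize.detect_encoding()` (which follows PEP 263) and then try to decode each line with it. A sketch under those assumptions; the function name `check_source_file` and the sample source, which imitates the question with accented characters in a comment, are made up for illustration:

```python
import codecs
import io
import tokenize


def check_source_file(data: bytes):
    """Return (declared encoding, [(line, column, byte), ...]) for source bytes."""
    # detect_encoding() inspects only a BOM and the first two lines for a
    # PEP 263 declaration and falls back to UTF-8; it raises SyntaxError if,
    # for example, the declaration names an unknown codec.
    declared, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    problems = []
    for number, line in enumerate(data.splitlines(keepends=True), start=1):
        try:
            line.decode(declared)
        except UnicodeDecodeError as exc:
            # 1-based line number, 0-based byte column, value of the bad byte
            problems.append((number, exc.start, line[exc.start]))
    # Normalize spellings such as "utf8" to the codec's canonical name.
    return codecs.lookup(declared).name, problems


# Sample mimicking the question: UTF-8 declared, file saved as cp1252.
source = "#coding: utf8\n# commento: perché\nprint('hàllo')\n".encode("cp1252")
encoding, problems = check_source_file(source)
print(encoding)  # utf-8
for number, column, byte in problems:
    print(f"line {number}, column {column}: byte {byte:#04x}")
# line 2, column 17: byte 0xe9
# line 3, column 8: byte 0xe0
```

For a real file, pass `Path("script.py").read_bytes()` (with `pathlib.Path`) instead of the sample bytes.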


Mark Tolonen


Oh, now I see! What was strange to me was the position without any indication of the line... I didn't think it was referring to the n-th byte in the file. – Russell Teapot Dec 27 '15 at 15:12
So, if I get this right, the problem is related to how Wingware 101 encodes ".py" files. If Wingware uses an encoding other than UTF-8, I get this error... so that's why I had no problems using "à" in a script written in the IDLE bundled with Python... I guess. – Russell Teapot Dec 27 '15 at 15:18

I finally found the problem: it was indeed Wingware's fault. By default Wingware uses cp1252; I switched it to UTF-8 and now it works properly. – Russell Teapot Jan 05 '16 at 04:59

You could have used `#coding:cp1252` instead of changing the default, but UTF-8 supports all Unicode code points, so it is a better choice. Also, Python 3 assumes a file is UTF-8-encoded unless told otherwise with the `#coding` line, so IDLE must use UTF-8 when no encoding is declared. I don't normally use IDLE, but at a quick glance I didn't see any option to change the encoding. I use PythonWin (bundled with the pywin32 module), and it automatically saves in the declared encoding, which is handy. – Mark Tolonen Jan 05 '16 at 05:48
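Both options from the comment above can be tried with the built-in `compile()`, which, when given the source as bytes, decodes it according to its PEP 263 declaration, or as UTF-8 when there is none. A minimal sketch; `run_source` and the sample sources are made up for illustration:

```python
def run_source(data: bytes) -> dict:
    """Compile and run Python source given as bytes; return its global names."""
    namespace = {}
    # With bytes input, compile() decodes the source using its PEP 263
    # declaration, or UTF-8 when there is none.
    exec(compile(data, "<example>", "exec"), namespace)
    return namespace


# Saved as cp1252 and declared as cp1252: byte 0xe0 decodes to "à".
cp1252_src = "# coding: cp1252\ngreeting = 'hàllo'\n".encode("cp1252")
print(run_source(cp1252_src)["greeting"])  # hàllo

# No declaration, saved as UTF-8: Python 3's default applies.
utf8_src = "greeting = 'hàllo'\n".encode("utf-8")
print(run_source(utf8_src)["greeting"])  # hàllo
```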
