
I got a UnicodeDecodeError:

'utf8' codec can't decode byte 0xe5 in position 1923: invalid continuation byte

I have used the Danish letter "å" in my template. How can I solve this problem so that I can use non-English letters in my Django project and database?

hln
  • Are you sure your data is actually in UTF-8? – BrenBarn Apr 14 '13 at 06:28
  • Looks like ISO-8859-1 instead of UTF-8. – robertklep Apr 14 '13 at 06:38
  • I wonder what the underlying bytes of your template look like (e.g. from `hexdump`). Your character `å` is the Unicode **codepoint** `e5` (U+00E5), but in UTF-8 the actual bytes are `c3 a5`. See http://hexutf8.com/?q=#å. If Python encounters the byte `e5` on its own, it will error, since that's not a valid UTF-8 byte sequence: http://hexutf8.com/?q=e5 – jar Feb 01 '15 at 03:42
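To make jar's point concrete, here is a minimal Python 3 sketch that encodes `å` both ways and reports where a byte string stops being valid UTF-8, which is what "position 1923" in the error refers to. The helper name `find_bad_utf8` and the sample text "på dansk" are illustrative, not taken from the question:

```python
def find_bad_utf8(data: bytes):
    """Return (offset, offending bytes, reason) for the first invalid UTF-8
    sequence in data, or None if the whole byte string decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start:exc.end], exc.reason
    return None


# å is code point U+00E5; the two encodings store it very differently.
print(hex(ord("å")))          # 0xe5
print("å".encode("utf-8"))    # b'\xc3\xa5'
print("å".encode("latin-1"))  # b'\xe5'

# Hypothetical template text that was saved as Latin-1 instead of UTF-8.
latin1_bytes = "på dansk".encode("latin-1")
print(find_bad_utf8(latin1_bytes))                # (1, b'\xe5', 'invalid continuation byte')
print(find_bad_utf8("på dansk".encode("utf-8")))  # None
```

Running the helper over the raw bytes of the real template (`open(path, "rb").read()`) would point at the offending offset directly.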

2 Answers


I can get a similar error (mentioning the same byte value) doing this:

>>> 'å'.encode('latin-1')
b'\xe5'
>>> _.decode('utf-8')
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    _.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe5 in position 0: unexpected end of data

This suggests that your data is encoded in Latin-1 rather than UTF-8. In general, there are two solutions: if you have control over your input data, re-save it as UTF-8; otherwise, when you read the data in Python, set the encoding to Latin-1. For a Django template, you should be able to use the first: the editor you use should have an "encoding" option somewhere, so change it to UTF-8, re-save, and everything should work.
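A minimal Python 3 sketch of both options, assuming the non-UTF-8 file really is Latin-1 (the function names are hypothetical). Because every byte sequence decodes as Latin-1 without error, the re-save helper first checks whether the file is already valid UTF-8, so running it twice doesn't garble the file:

```python
from pathlib import Path


def read_template(path, encoding="latin-1"):
    """Option 2: leave the file alone and tell Python which encoding it uses."""
    return Path(path).read_text(encoding=encoding)


def resave_as_utf8(path, source_encoding="latin-1"):
    """Option 1: convert the file to UTF-8 in place.

    Returns True if the file was rewritten, False if it was already UTF-8.
    """
    p = Path(path)
    raw = p.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8: re-decoding it as Latin-1 would corrupt it
    except UnicodeDecodeError:
        pass
    # Work on bytes so the original line endings are preserved exactly.
    p.write_bytes(raw.decode(source_encoding).encode("utf-8"))
    return True
```

Converting once with `resave_as_utf8` is usually the simpler choice for templates, since Django reads template files as UTF-8 by default.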

lvc
  • Hi lvc, thanks for the answer. I have now changed the encoding to UTF-8 and the error disappeared, but the letters "æ, ø, å" don't show up in the browser; it shows empty squares instead. – hln Apr 14 '13 at 13:55
  • @hln Check that the generated HTML isn't overriding the `charset` setting for the client (HTML/HTTP use the term "charset" instead of "encoding", but it is the same thing). This causes the same issue in a different place. Note that the encoding used to send data to the client doesn't have to be the same as the encoding used for input to the server code (including the template file); all that's necessary is that the server tells the truth about what encoding it uses. If you remove any setting of it in your template code, Django should do the Right Thing. – lvc Apr 15 '13 at 06:20
  • If that doesn't help, you may have better luck asking a new question (since you're now getting a different error than the one this question is about). – lvc Apr 15 '13 at 06:23
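lvc's point that the server only has to tell the truth about its encoding can be illustrated without Django. The sketch below (plain Python 3; `make_response` and `client_decode` are made-up helpers, not Django APIs) encodes a page in one charset and lets a simulated client decode it with whatever charset the `Content-Type` header declares. When the header matches the bytes, the text survives; when it doesn't, the client shows replacement characters or mojibake instead of æ, ø, å:

```python
from email.message import Message


def make_response(html: str, body_charset: str, declared_charset: str):
    """Encode html with body_charset but advertise declared_charset in the header."""
    headers = {"Content-Type": f"text/html; charset={declared_charset}"}
    return headers, html.encode(body_charset)


def client_decode(headers, body: bytes) -> str:
    """Decode the body the way a browser would: trust the declared charset."""
    msg = Message()
    msg["Content-Type"] = headers["Content-Type"]
    charset = msg.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")


page = "æ, ø, å"
print(client_decode(*make_response(page, "utf-8", "utf-8")))    # æ, ø, å
print(client_decode(*make_response(page, "latin-1", "utf-8")))  # three U+FFFD replacement chars
print(client_decode(*make_response(page, "utf-8", "latin-1")))  # Ã¦, Ã¸, Ã¥
```

Only the first case, where the declared charset matches the bytes actually sent, round-trips the Danish letters.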

This helped me: https://stackoverflow.com/a/23278373/2571607

Basically, open `C:\Python27\Lib\mimetypes.py` and replace

`default_encoding = sys.getdefaultencoding()`

with

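# Python 2 only: site.py deletes sys.setdefaultencoding at startup,
# and reload(sys) (a builtin in Python 2) brings it back.
# 'gbk' is a Chinese codec; use the one that matches your system's data.
if sys.getdefaultencoding() != 'gbk':
    reload(sys)
    sys.setdefaultencoding('gbk')
default_encoding = sys.getdefaultencoding()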
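This patch only works on Python 2 (the path above is a Python 2.7 install). A quick way to see which situation you are in, sketched here with a hypothetical helper name: on Python 3 the default encoding is fixed at UTF-8 and `sys.setdefaultencoding` does not exist, so the edit above cannot be applied there, and passing an explicit `encoding=` when reading files is the usual alternative.

```python
import sys


def default_encoding_info():
    """Report the interpreter version and its default str/bytes encoding."""
    return {
        "python": tuple(sys.version_info[:2]),
        "default_encoding": sys.getdefaultencoding(),
        # False on Python 3; also False on Python 2 until reload(sys) restores it.
        "setdefaultencoding_available": hasattr(sys, "setdefaultencoding"),
    }


print(default_encoding_info())
```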
Yue Y