Python - 'ascii' codec can't decode byte

Question

I'm using Python 2.6 and Jinja2 to create HTML reports. I provide the template with many results, and the template loops through them and creates HTML tables.

When calling template.render, I've suddenly started getting this error.

<td>{{result.result_str}}</td>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

The strange thing is, even if I set result.result_str to a simple ASCII string like "abc" for every result, I still see this error. I'm new to Jinja2 and Python and would appreciate any ideas on how to investigate the problem and get to the root cause.

score 78 · Answer 1 · answered Feb 17 '13 at 08:41

Try to add this:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

It fixed my problem, good luck.

answered Feb 17 '13 at 08:41

Richard Huang

This is the best answer in the thread, saves a lot of hassle. – AlexLordThorsen Nov 12 '13 at 23:54
What is the trick with reload()? Is it sorcery? (Note: it does work, but I don't understand why.) – Jocelyn delalande Nov 20 '13 at 10:17
I don't really understand why this works, but it does. Thanks – bgusach Jan 28 '14 at 17:19
This is actually **terrible advice**. The OP should instead make sure they decode byte strings to Unicode values first. Setting the default encoding is akin to keeping walking on a broken leg by strapping a stick to it instead of getting yourself to a hospital to set the bone. – Martijn Pieters May 15 '14 at 11:34
I accidentally upvoted this answer because it seemed to provide a quick fix. @MartijnPieters is right though: You should fix your bugs instead. – Christian Pietsch Jun 04 '14 at 14:06
If I genuinely want everything to be UTF-8, why is this wrong? – David Chouinard Sep 06 '15 at 01:50

score 43 · Answer 2 · edited Jul 03 '14 at 01:34

From http://jinja.pocoo.org/docs/api/#unicode

Jinja2 is using Unicode internally which means that you have to pass Unicode objects to the render function or bytestrings that only consist of ASCII characters.

So wherever you set result.result_str, you need to make it unicode, e.g.

result.result_str = unicode(my_string_variable, "utf8")

(if your byte string is UTF-8 encoded)

or

result.result_str = u"my string"
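To apply this consistently, one option is a small conversion helper that is called wherever values are assigned. The sketch below is only an illustration: to_text is a hypothetical name (not part of Jinja2), and it assumes the incoming byte strings are UTF-8. It uses .decode() rather than unicode() so that it also runs on Python 3.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string (hypothetical helper, not a Jinja2 API)."""
    if isinstance(value, bytes):
        # On Python 2, bytes is an alias for str, so plain byte strings land here.
        return value.decode(encoding)
    # Already Unicode (or some other object): pass it through unchanged.
    return value


raw = b"\xc4\x80bc"   # UTF-8 bytes for U+0100 followed by "bc"
text = to_text(raw)   # Unicode string of length 3, safe to hand to the template
```

With this in place, the assignment would read result.result_str = to_text(my_string_variable).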

edited Jul 03 '14 at 01:34

cbednarski

answered Feb 18 '11 at 11:29

Martin Stone

This answer helped me more than the accepted. I can agree with the advice in the accepted answer -- it would be fantastic if I could take a month and fix my 100k SLOC webapp to correctly convert strings to unicode at its boundaries and only work with unicode internally! -- but I can't follow that advice because of dollars. Knowing that Jinja2 is using unicode internally helped me recognize at what point I was having encoding problems, and write a fix to solve the production bug. Thanks, guys! – Geoff Gerrietts Jul 31 '14 at 14:39
Is it possible to patch Jinja2 so that it tries to decode from `utf-8` instead of `ascii`? http://stackoverflow.com/questions/28642781/hack-jinja2-to-encode-from-utf-8-instead-of-ascii – anatoly techtonik Feb 21 '15 at 06:24

score 20 · Accepted Answer · answered Feb 18 '11 at 11:27

If you get an error with a string like "ABC", maybe the non-ASCII character is somewhere else. In the template source perhaps?
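One way to check is to scan the raw bytes of the template (or of any suspect value) for anything above 0x7F. This is a rough sketch, and find_non_ascii is a made-up name; the sample bytes stand in for a real template file read with open(path, 'rb').read().

```python
def find_non_ascii(data):
    """Yield (line_number, column, byte_value) for each non-ASCII byte in data."""
    for line_number, line in enumerate(data.split(b"\n"), 1):
        # bytearray yields integers on both Python 2 and Python 3.
        for column, byte in enumerate(bytearray(line), 1):
            if byte > 0x7F:
                yield line_number, column, byte


# Sample input standing in for the contents of a template file.
sample = b"<td>{{ result.result_str }}</td>\n<td>\xc4\x80</td>\n"
hits = list(find_non_ascii(sample))
```

Here hits contains two entries, both on line 2 (columns 5 and 6), which point straight at the offending bytes.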

In any case, use Unicode strings throughout your application to avoid this kind of problem. If your data source provides you with byte strings, convert them to Unicode strings with byte_string.decode('utf-8'), provided the data is encoded in UTF-8. If your source is a file, use the StreamReader class in the codecs module.
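As a minimal sketch of the file case, codecs.getreader returns the StreamReader class for an encoding, which can wrap any binary stream. The in-memory BytesIO object below is just a stand-in for a file opened in binary mode, and UTF-8 is an assumption about the data.

```python
import codecs
import io

# Stand-in for a file opened with mode 'rb'; the bytes are UTF-8 for U+0100 + "bc".
source = io.BytesIO(b"\xc4\x80bc\n")

# Wrap the byte stream so that reads return decoded Unicode strings.
reader = codecs.getreader("utf-8")(source)
line = reader.readline()
```

After this, line is a Unicode string, so nothing downstream has to guess at the encoding.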

If you're unsure about the difference between Unicode strings and regular strings, read this: http://www.joelonsoftware.com/articles/Unicode.html

answered Feb 18 '11 at 11:27

jd.

I checked the template for non-ASCII characters: in Vim I ran "set isprint=", but it didn't show anything non-ASCII. – shane Feb 18 '11 at 11:39
And there's no other variable, one that the template would try to render after the line you showed in your post, that could contain an encoded string? If not, can you reduce your template to the bare minimum that will reproduce the error? – jd. Feb 18 '11 at 11:44
Good idea. I'll try just displaying the result_str and nothing else to see if I still get it. – shane Feb 18 '11 at 11:58

score 11 · Answer 4 · answered Sep 25 '14 at 15:42

Just encountered the same problem in a piece of code which saves output from Jinja2 to HTML files:

with open(path, 'wb') as fh:
    fh.write(template.render(...))

It's easy to blame Jinja2, but the actual problem is Python 2's built-in open(): it has no encoding parameter (still true as of version 2.7), so writing a unicode string to the file implicitly encodes it as ASCII. The fix is as simple as:

import codecs
with codecs.open(path, 'wb', 'utf-8') as fh:
    fh.write(template.render(...))
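An alternative not mentioned above is io.open, which is available from Python 2.6 and is the same function as the built-in open() on Python 3. The sketch below assumes the rendered output is a Unicode string; the html value is a placeholder for whatever template.render returns, and the temporary file is only there to make the example self-contained.

```python
import io
import os
import tempfile

# Placeholder for the Unicode string returned by template.render(...).
html = u"<td>\u0100</td>"

# Create a scratch file so the example does not touch real paths.
fd, path = tempfile.mkstemp(suffix=".html")
os.close(fd)

# io.open takes an encoding and encodes Unicode text as it is written.
with io.open(path, "w", encoding="utf-8") as fh:
    fh.write(html)

# Read the raw bytes back to confirm what ended up on disk.
with open(path, "rb") as fh:
    written = fh.read()

os.remove(path)
```

The bytes on disk are the UTF-8 encoding of the rendered text, with no implicit ASCII step involved.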

score 5 · Answer 5 · answered Apr 17 '11 at 13:24

Plain str objects may contain UTF-8 encoded bytes, but they are not of type unicode. Calling decode() on them fixes this by converting str to unicode. Works in Python 2.5.5.

my_string_variable.decode("utf8")

answered Apr 17 '11 at 13:24

cat

score 0 · Answer 6 · answered Feb 18 '11 at 11:18

ASCII is a 7-bit code. The value 0xC4 cannot be stored in 7 bits. Therefore, you are using the wrong encoding for that data.
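The error is easy to reproduce in isolation, which may help when tracking down where it comes from. In this small sketch the byte string is an assumed example (UTF-8 for U+0100); decoding it as ASCII fails on the first byte, while decoding it as UTF-8 succeeds.

```python
raw = b"\xc4\x80"   # assumed sample data: UTF-8 encoding of U+0100

message = None
try:
    # This is what happens implicitly when a byte string meets Unicode text.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)

# Naming the real encoding explicitly avoids the failure.
decoded = raw.decode("utf-8")
```

The captured message names byte 0xc4 at position 0, just like the traceback in the question.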

answered Feb 18 '11 at 11:18

tchrist

I understand what the error means. I'm looking for some pointers on tracking down why I am getting it. – shane Feb 18 '11 at 11:20
@shane: Because you're using `0xC4`. Find this character. Remove it. – S.Lott Feb 18 '11 at 15:21
@shane: It's probably worth noting that 0xc4 is the first byte of [UTF8-encoded characters between U+0100 and U+013F](http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256&unicodeinhtml=hex). – Martin Stone Feb 18 '11 at 16:16
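That range can be checked directly by encoding the code points at and just past its edges. A quick sketch, using only str.encode:

```python
# Encode the first and last code points of the range, plus the neighbours outside it.
before = u"\u00ff".encode("utf-8")   # just below the range
first = u"\u0100".encode("utf-8")    # start of the range
last = u"\u013f".encode("utf-8")     # end of the range
after = u"\u0140".encode("utf-8")    # just above the range

# Slicing keeps the result a byte string on both Python 2 and Python 3.
lead_bytes = [value[:1] for value in (before, first, last, after)]
```

Only the two values inside the range start with the byte 0xc4.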

score -1 · Answer 7 · answered Feb 20 '13 at 12:19

Or you may do

export LANG='en_US.UTF-8'

in your console where you run the script.

answered Feb 20 '13 at 12:19

Zinovy Nis
