33

I'm trying to use UTF-8 characters when rendering a template with Jinja2. Here is what my template looks like:

<!DOCTYPE HTML>
<html manifest="" lang="en-US">
<head>
    <meta charset="UTF-8">
    <title>{{title}}</title>
...

The title variable is set something like this:

index_variables = {'title':''}
index_variables['title'] = myvar.encode("utf8")

template = env.get_template('index.html')
index_file = open(preview_root + "/" + "index.html", "w")

index_file.write(
    template.render(index_variables)
)
index_file.close()

The problem is that myvar is a message read from a message queue and can contain non-ASCII characters (e.g. "Séptimo Cine").

The rendered template looks something like:

...
    <title>S\u00e9ptimo Cine</title>
...

and I want it to be:

...
    <title>Séptimo Cine</title>
...

I have made several tests but I can't get this to work.

  • I have tried to set the title variable without .encode("utf8"), but it throws an exception (ValueError: Expected a bytes object, not a unicode object), so my guess is that the initial message is unicode

  • I have used chardet.detect to get the encoding of the message (it's "ascii"), then did the following: myvar.decode("ascii").encode("cp852"), but the title is still not rendered correctly.

  • I also made sure that my template is a UTF-8 file, but it didn't make a difference.

Any ideas on how to do this?

alex.ac

3 Answers

40

TL;DR:

  • Pass Unicode to template.render()
  • Encode the rendered unicode result to a bytestring before writing it to a file

This had me puzzled for a while. Because you do

index_file.write(
    template.render(index_variables)
)

in one statement, Python treats it as a single line, so the traceback you get is misleading: the exception I got when recreating your test case was raised not in template.render(index_variables) but in index_file.write(). So splitting the code up like this

output = template.render(index_variables)
index_file.write(output)

was the first step to diagnose where exactly the UnicodeEncodeError happens.

Jinja returns unicode when you let it render the template. Therefore you need to encode the result to a bytestring before you can write it to a file:

index_file.write(output.encode('utf-8'))
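
For readers on Python 3, here is a minimal sketch of the same file-writing step. It assumes `output` is the text returned by template.render(); a plain string stands in for it here so the snippet runs without Jinja2, and the temporary path is only for illustration. It shows two equivalent options: encoding explicitly and writing bytes, or letting open() handle the encoding in text mode.

```python
import os
import tempfile

# Stand-in for the text that template.render() returns (an assumption for
# this sketch, so that no Jinja2 import is needed to show the writing step).
output = "<title>Séptimo Cine</title>"

# Temporary location, used only so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "index.html")

# Option 1: encode explicitly and write bytes to a file opened in binary mode.
with open(path, "wb") as index_file:
    index_file.write(output.encode("utf-8"))

with open(path, "rb") as index_file:
    raw_bytes = index_file.read()

# Option 2 (Python 3): open in text mode and let open() do the encoding.
with open(path, "w", encoding="utf-8") as index_file:
    index_file.write(output)

with open(path, "rb") as index_file:
    raw_text_mode = index_file.read()
```

Both options should leave the same UTF-8 bytes on disk; what fails in Python 3 is mixing them, i.e. writing bytes to a text-mode file or a str to a binary-mode file.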

The second error is that you pass a UTF-8 encoded bytestring to template.render(), but Jinja wants unicode. So assuming your myvar contains UTF-8, you need to decode it to unicode first:

index_variables['title'] = myvar.decode('utf-8')
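
To illustrate why the codec passed to decode() matters, here is a small Python 3 sketch. The byte strings are hypothetical examples of what a queue might deliver, not values taken from the question. Decoding with the right codec recovers the text, decoding with the wrong one silently yields mojibake, and decoding non-UTF-8 bytes as UTF-8 raises an error.

```python
# UTF-8 bytes as they might arrive from a message queue (hypothetical value).
raw = b"S\xc3\xa9ptimo Cine"

# Correct codec: the two bytes 0xC3 0xA9 become the single character "é".
title = raw.decode("utf-8")

# Wrong codec: no exception, but each byte becomes its own character.
wrong = raw.decode("latin-1")

# Bytes that are not valid UTF-8 (here: Latin-1 encoded text) raise an error
# instead of silently producing garbage.
try:
    b"S\xe9ptimo Cine".decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

print(title)   # Séptimo Cine
print(wrong)   # SÃ©ptimo Cine
print(failed)  # True
```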

So, to put it all together, this works for me:

# -*- coding: utf-8 -*-

from jinja2 import Environment, PackageLoader
env = Environment(loader=PackageLoader('myproject', 'templates'))


# Make sure we start with a UTF-8 encoded bytestring (Python 2 str literal)
myvar = 'Séptimo Cine'

index_variables = {'title':''}

# Decode the UTF-8 string to get unicode
index_variables['title'] = myvar.decode('utf-8')

template = env.get_template('index.html')

with open("index_file.html", "wb") as index_file:
    output = template.render(index_variables)

    # jinja returns unicode - so `output` needs to be encoded to a bytestring
    # before writing it to a file
    index_file.write(output.encode('utf-8'))
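
The code above was written for Python 2. As a rough sketch of the same flow on Python 3, assuming Jinja2 is installed: the DictLoader, the inline template and the temporary output path are stand-ins chosen to keep the example self-contained, and the byte string is a hypothetical queue message.

```python
import os
import tempfile

from jinja2 import DictLoader, Environment

# DictLoader with an inline template keeps this sketch self-contained;
# a real project would use a loader that points at its templates directory.
env = Environment(
    loader=DictLoader({"index.html": "<title>{{ title }}</title>"})
)

# Hypothetical UTF-8 bytes, standing in for a message read from the queue.
raw = b"S\xc3\xa9ptimo Cine"

# Decode the bytes to str before handing them to Jinja.
index_variables = {"title": raw.decode("utf-8")}

# render() returns a str in Python 3.
output = env.get_template("index.html").render(index_variables)

# Text mode with an explicit encoding, so the result does not depend on
# the platform's default encoding.
path = os.path.join(tempfile.mkdtemp(), "index.html")
with open(path, "w", encoding="utf-8") as index_file:
    index_file.write(output)
```

The pattern is the same as in the Python 2 version: decode at the input boundary, work with text inside the program, and encode once when writing.
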
Lukas Graf
  • 1
    Thanks a lot. You saved me lot of hours. – Vor Dec 11 '14 at 18:11
  • In my case I was printing to STDOUT and my error was: `File ... print template.render(context=getContent ( .... )) UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1064-1088: ordinal not in range(256)` After 30mins trying to work out the problem in the 'getContent' call I found your answer which highlights that the problem is with the print! – Richard Corden Aug 08 '16 at 10:50
  • The write fails with `TypeError: write() argument must be str, not bytes`. You're writing binary to the output file so the write mode should be `"wb"` instead of `"w"`. – BjornO Jul 22 '21 at 05:52
  • @Bjorn thanks, updated! I wrote this answer years ago in the context of Python 2, where the distinction between file modes wasn't as strict yet as it is in Python 3. – Lukas Graf Jul 22 '21 at 09:11
5

Try changing your render command to this...

template.render(index_variables).encode( "utf-8" )

Jinja2's documentation says "This will return the rendered template as unicode string."

http://jinja.pocoo.org/docs/api/?highlight=render#jinja2.Template.render

Hope this helps!

Andrew Kloos
-6

Add the following lines to the beginning of your script and it will work fine without any further changes:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
reload(sys)
sys.setdefaultencoding("utf-8")
asmaier
  • 2
    Do **not** do this. Please don't propagate this [Cargo Cult](https://en.wikipedia.org/wiki/Cargo_cult). This setting was made unavailable on the `sys` module for a reason; it is a global setting and any code that relies on implicit encoding or decoding throwing an exception for non-ASCII text **will** break with this change. That includes code in third-party libraries. – Martijn Pieters Nov 09 '16 at 16:50
  • I'm very much aware of that post and strongly disagree. Did you even see that I have an answer posted there too? – Martijn Pieters Nov 09 '16 at 22:24
  • And this is still a Cargo Cult, rolled out whenever a UnicodeEncoding exception raises its head. It is not the solution here. – Martijn Pieters Nov 09 '16 at 22:25