After that __future__ statement, your literals are not str objects, but unicode objects. That's the whole point of the statement. That isn't described too well, either in the __future__ docs or in PEP 3112, which they refer to (and which spends most of its time talking about how to write Python 2-style bytes objects, given that string literals are now Unicode). But that's what it does.
You can test this in the interactive interpreter:
>>> 'abc'
'abc'
>>> from __future__ import unicode_literals
>>> 'abc'
u'abc'
So, in version 2, you're adding two str objects together, which is easy. But in version 1, you're adding a unicode and a str. This works by automatically converting the str to unicode using the default encoding, which is ASCII; as soon as the str contains non-ASCII bytes, that conversion fails with a UnicodeDecodeError.
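To see that failure in isolation, here's a minimal sketch (the variable names are made up for illustration); it reproduces the implicit ASCII conversion that happens when you add a unicode to a non-ASCII str in Python 2:
prefix = u"Project: "      # what a literal becomes under unicode_literals
name = '\xc3\xa9'          # a plain str holding the UTF-8 bytes for U+00E9
try:
    print prefix + name    # Python 2 implicitly tries name.decode('ascii')
except UnicodeDecodeError as e:
    print 'implicit conversion failed:', e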
The easiest way to fix this is to make project a unicode itself:
def print_project(self, project):
    project_prefix = "Project: "
    print(project_prefix + unicode(project))
This will, in fact, work with or without the __future__ statement: with it, project_prefix is already unicode; without it, it's a str and will be decoded from ASCII, but that's fine, because it is ASCII.
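If you want to see that for yourself, here's a quick sketch; the Project class and its name are stand-ins invented for this example:
class Project(object):
    def __unicode__(self):
        return u'caf\xe9'            # a non-ASCII project name

project_prefix = "Project: "         # str normally, unicode under the import
print repr(project_prefix + unicode(Project()))
# u'Project: caf\xe9' either way, because the prefix is pure ASCII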
If you want to use non-ASCII literals in project_prefix, and you want your code to work both with and without the __future__ statement, you will have to decode manually:
def print_project(self, project):
    project_prefix = "Project: ".decode('utf-8')
    print(project_prefix + unicode(project))
(Make sure to match the source file's coding declaration, of course.)
In a comment, you ask:
when using the __future__ import statement do I still have to define the coding at the beginning of the .py file? # -*- coding: utf-8 -*-
The short answer is yes.
I don't know if the documentation directly covers this anywhere, but if you think about it, there's no other way it could work.
In order to interpret literals in your 8-bit source code as Unicode, the Python compiler has to decode them. The only way it knows what to decode them from is your coding declaration.
Another way to look at this is that the __future__ statement makes Python 2 work like Python 3 as far as string literals are concerned, and Python 3 needs coding declarations.
If you want to test this for yourself, copy the following and paste it into a text file saved as UTF-8. (Note that you have to use an editor that doesn't understand coding declarations to do this: something like Emacs may convert your UTF-8 text to Latin-1 on saving!)
# -*- coding: latin-1 -*-
from __future__ import unicode_literals
print repr('é')
When you run this, it will print out u'\xc3\xa9', not u'\xe9'.
While Python 3 defaults to UTF-8 if you don't specify a coding, Python 2.5-2.7 defaults to ASCII, even with unicode_literals. So, you still need the coding declaration. (It's always safe to add, even in 3.x, and it also makes many programmers' text editors happy, so it may be a habit worth keeping until we get far enough into the future that nobody remembers Latin-1 and Shift-JIS and cp1250 and so on.)
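For example, a file that starts like this (a minimal sketch, assuming it really is saved as UTF-8) behaves the same under Python 2 with unicode_literals and under Python 3:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# The declaration tells the compiler the source bytes are UTF-8, so this
# literal is the single character U+00E9 on both Python 2 and Python 3.
print(repr('é'))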