There is a significant problem with some of the answers posted so far: unicode() decodes its byte-string argument using the default encoding, which is often ASCII. In other words, unicode() has to interpret the bytes it is given as characters, and without an explicit encoding it relies on that default. Thus, the following code, which is essentially what previous answers recommend, fails on my machine:
# -*- coding: utf-8 -*-
author = 'éric'
print '{0}'.format(unicode(author))
gives:
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    print '{0}'.format(unicode(author))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
The failure comes from the fact that author does not contain only ASCII bytes (i.e., bytes with values in [0, 127]): in this UTF-8 source file, é is stored as the two bytes 0xc3 0xa9, and unicode() decodes from ASCII by default (on many machines).
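The byte-level cause can be checked directly. The sketch below writes 'éric' as its explicit UTF-8 bytes (an assumption standing in for the literal from the UTF-8 source file) so that it runs unchanged under Python 2 or 3; bytes.decode('ascii') plays the role of unicode(author) with an ASCII default.

```python
# 'eric' with an accented e (U+00E9), as the UTF-8 bytes a UTF-8 source file stores.
author = b'\xc3\xa9ric'

# ASCII only covers byte values 0..127; list the bytes outside that range.
non_ascii = [hex(b) for b in bytearray(author) if b > 127]
print(non_ascii)  # ['0xc3', '0xa9']

error_message = None
try:
    # With an ASCII default encoding, unicode(author) amounts to this call.
    author.decode('ascii')
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Decoding with the encoding the bytes actually use succeeds.
decoded = author.decode('utf-8')
print(repr(decoded))
```

The first non-ASCII byte, 0xc3 at position 0, is exactly the one reported in the traceback.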
A robust solution is to specify explicitly the encoding used by your fields; taking UTF-8 as an example:
u'{0} in {1}'.format(unicode(self.author, 'utf-8'), unicode(self.publication, 'utf-8'))
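(If you want a byte string rather than a Unicode result, encode the formatted result, e.g. with .encode('utf-8'). Simply dropping the initial u does not work with non-ASCII data, because str.format() would then try to convert the decoded fields back to bytes with the default ASCII codec and raise a UnicodeEncodeError.)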
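In Python 2, unicode(s, 'utf-8') is equivalent to s.decode('utf-8') for a byte string s, and the method form also exists on Python 3 bytes, so this sketch uses it to stay runnable on both. The Book class, its describe() method and the sample field values are hypothetical stand-ins for the object holding self.author and self.publication.

```python
class Book(object):
    """Hypothetical holder of UTF-8 encoded byte-string fields."""

    def __init__(self, author, publication):
        self.author = author            # UTF-8 encoded bytes
        self.publication = publication  # UTF-8 encoded bytes

    def describe(self):
        # Decode each field with an explicit codec instead of the default one.
        return u'{0} in {1}'.format(self.author.decode('utf-8'),
                                    self.publication.decode('utf-8'))


book = Book(b'\xc3\xa9ric', b'Le Monde')
text = book.describe()       # a Unicode string
data = text.encode('utf-8')  # a UTF-8 byte string, if one is needed
```

Encoding the finished Unicode string, rather than formatting into a byte-string template, keeps the choice of output encoding explicit as well.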
At this point, you might consider storing the author and publication fields as Unicode strings in the first place, instead of decoding them every time they are formatted.
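One way to do that is to decode once, where the raw bytes enter the object. The sketch below assumes a hypothetical Book class with a hypothetical _to_text() helper that accepts either bytes or an already-decoded Unicode string; it runs under Python 2.6+ (where bytes is an alias for str) and Python 3.

```python
class Book(object):
    """Hypothetical class that keeps its text fields as Unicode strings."""

    def __init__(self, author, publication, encoding='utf-8'):
        # Decode once, at the boundary where raw bytes come in.
        self.author = self._to_text(author, encoding)
        self.publication = self._to_text(publication, encoding)

    @staticmethod
    def _to_text(value, encoding):
        if isinstance(value, bytes):
            return value.decode(encoding)
        return value  # already a Unicode string

    def describe(self):
        # No decoding is needed here any more.
        return u'{0} in {1}'.format(self.author, self.publication)


book = Book(b'\xc3\xa9ric', u'Le Monde')
print(repr(book.describe()))
```

With this design, formatting no longer depends on the default encoding at all; the byte encoding only matters in __init__.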