As a French user of Python 2.7, I'm trying to properly print strings containing accents such as "é", "è", "à", etc. in the Python console.

I already know the trick of prefixing a string literal with u, such as:

print(u'Université')

which properly prints the last character.

Now, my question is: how can I do the same for a string that is stored as a variable?

Indeed, I know that I could do the following:

mystring = u'Université'
print(mystring)

but the problem is that the value of mystring is bound to be passed into a SQL query (using psycopg2), and therefore I can't afford to store the u inside the value of mystring.

So how can I do something like "print the Unicode value of mystring"?

tripleee
R. Bourgeon
  • `u'...'` creates an object of type `unicode`. Your `mystring` object is *already* such an object; it's not the `print()` function that turns it into something else. – Martijn Pieters Oct 08 '18 at 09:45
  • Most SQL database adapters can give you Unicode string objects directly, no need to convert. For `str` objects (byte strings), you need to decode from bytes to Unicode. See https://nedbatchelder.com/text/unipain.html for a great article on the subject. – Martijn Pieters Oct 08 '18 at 09:46
  • Then also read https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/ and the [Python Unicode HOWTO](https://docs.python.org/2/howto/unicode.html). – Martijn Pieters Oct 08 '18 at 09:46
  • I don't really see why this is closed as too broad. We have [the same question for "raw strings"](https://stackoverflow.com/questions/21605526/how-to-create-raw-string-from-string-variable-in-python), and that one isn't closed. – Aran-Fey Oct 08 '18 at 09:48
  • does it mean that if I do mystring = u'Université' and then I send a query like "INSERT INTO mytable VALUES "+mystring+";" the value passed to SQL will be understood as 'Université'? – R. Bourgeon Oct 08 '18 at 09:49

1 Answer

The u prefix is not part of the value; it is just a type indicator in the literal syntax. To convert a byte string into a Unicode string, you need to know its encoding.

unicodestring = mystring.decode('utf-8')  # or 'latin-1' or ... whatever

and to print it you typically (in Python 2) need to convert back to whatever the system accepts on the output filehandle:

print(unicodestring.encode('utf-8'))  # or 'latin-1' or ... whatever

Python 3 clarifies (though not directly simplifies) the situation by keeping Unicode strings and (what is now called) bytes objects separate.
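As a rough illustration of the decode/encode round trip described above, here is a minimal sketch. The name `raw_bytes` is hypothetical, and UTF-8 is only an assumed encoding; substitute whatever your data actually uses. The snippet sticks to explicit byte escapes so that it should behave the same under Python 2.7 and Python 3.

```python
# Sketch only: `raw_bytes` is a hypothetical stand-in for a byte string
# (a Python 2 `str`) whose encoding is ASSUMED to be UTF-8.
raw_bytes = b'Universit\xc3\xa9'

# Decode: bytes -> Unicode text. The u prefix is only literal syntax,
# so the decoded value compares equal to the equivalent u'...' literal.
text = raw_bytes.decode('utf-8')
assert text == u'Universit\xe9'

# Encode: Unicode text -> bytes, e.g. for an output stream expecting UTF-8.
encoded = text.encode('utf-8')
assert encoded == raw_bytes

# The two types count differently: characters versus bytes.
char_count = len(text)       # the accented letter is one character
byte_count = len(raw_bytes)  # but it occupies two bytes in UTF-8
```

Decoding with the wrong codec (for example `'latin-1'` on UTF-8 data) will not necessarily raise an error; it can silently produce mojibake instead, which is why knowing the real encoding matters.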

tripleee
  • print(mystring.decode('utf-8')) works like a charm for displaying accented characters properly in the console. Thanks. – R. Bourgeon Oct 08 '18 at 09:52