172

I want to make a dictionary where English words point to Russian and French translations.

How do I print out unicode characters in Python? Also, how do you store unicode chars in a variable?

Eric Leschinski
NoobDev4iPhone
  • Does this help: http://docs.python.org/howto/unicode.html – paulsm4 May 13 '12 at 05:07
  • Have a look [here](http://docs.python.org/reference/lexical_analysis.html#string-literals). Prefixing your strings with `u` allows python to consider them as unicode string literals. – S.R.I May 13 '12 at 05:07

10 Answers

163

To include Unicode characters in your Python source code, you can use Unicode escape sequences of the form \u0123 in your string literals. In Python 2.x, you also need to prefix the string literal with 'u'.

Here's an example running in the Python 2.x interactive console:

>>> print u'\u0420\u043e\u0441\u0441\u0438\u044f'
Россия

In Python 2, prefixing a string literal with 'u' makes it a unicode object rather than a byte string, as described in the Python Unicode documentation.

In Python 3, the 'u' prefix is now optional:

>>> print('\u0420\u043e\u0441\u0441\u0438\u044f')
Россия

If running the above commands doesn't display the text correctly for you, perhaps your terminal isn't capable of displaying Unicode characters.

These examples use Unicode escapes (\u...), which allow you to print Unicode characters while keeping your source code as plain ASCII. This can help when working with the same source code on different systems. You can also use Unicode characters directly in your Python source code (e.g. print u'Россия' in Python 2), if you are confident all your systems handle Unicode files properly.
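Coming back to the dictionary from the question, here is a minimal Python 3 sketch. The words, the `translations` name, and the `translate` helper are made up for illustration; the point is only that escaped and literal strings are interchangeable and can be stored in variables or dict values like any other string.

```python
# Hypothetical example data: English words mapped to Russian and French.
# Escapes and literal characters can be mixed freely; both give the same str.
translations = {
    "cat": {"ru": "\u043a\u043e\u0448\u043a\u0430", "fr": "chat"},
    "Russia": {"ru": "\u0420\u043e\u0441\u0441\u0438\u044f", "fr": "Russie"},
    "summer": {"ru": "лето", "fr": "\u00e9t\u00e9"},
}


def translate(word, lang):
    """Return the translation of an English word, or None if it is unknown."""
    return translations.get(word, {}).get(lang)


# The escaped spelling and the literal spelling are the same string.
assert translate("Russia", "ru") == "Россия"

# Printing works like for any other string, provided the terminal can show it.
for word, entry in translations.items():
    print(word, "->", entry["ru"], "/", entry["fr"])
```

If the loop prints question marks or raises an encoding error, the data is fine and the terminal or output encoding is the thing to look at.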

For information about reading Unicode data from a file, see this answer:

Character reading from file in Python

Matt Ryall
  • You don't necessarily need to escape them. Define the proper file encoding via a comment on the top line and run `print u'Россия'` – Blender May 13 '12 at 05:14
  • 4
    Yeah, you _can_ write your code in Unicode-encoded text files, but a lot of editors and tools have trouble dealing with them. My experience with working with source code on lots of different platforms has been that it's best to keep source code in ASCII and use Unicode escapes. – Matt Ryall May 13 '12 at 05:37
  • 3
    @MattRyall, I agree, but a team of Russian developers may want to write comments and docstrings in Russian. For a language project it's a good option. – Johan Lundberg May 13 '12 at 05:57
  • 3
    Though note that this only works if you print just the string. If it's wrapped in some other object you'll see escape codes. Try "print [u'\u0420\u043e\u0441\u0441\u0438\u044f']" for example. – btubbs Mar 26 '14 at 23:32
  • 3
    What if I stored it into a string `mystr`? then how to print it? – ZK Zhao Jul 15 '15 at 13:29
  • What @cqcn1991 said; all answers just print string literals. My problem is how to print a string that is in a variable. It gives me the error "exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\u2014' in position 30: ordinal not in range(128)" – Carlo Wood Mar 29 '16 at 21:06
  • 1
    @CarloWood The [top answer](http://stackoverflow.com/questions/10569438/how-to-print-unicode-character-in-python/10569468#10569468) tells you exactly what you want. Just `print your_unicode_characters.encode('utf-8')` – Yuhao Zhang Sep 18 '16 at 01:57
  • The correct spelling is print(u'россия') – vhula Apr 03 '23 at 14:24
52

Print a unicode character in Python:

Print a unicode character directly from the Python interpreter:

el@apollo:~$ python
Python 2.7.3
>>> print u'\u2713'
✓

Unicode character u'\u2713' is a checkmark. The interpreter prints the checkmark on the screen.
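For Python 3 readers, a small sketch of the same checkmark stored in a variable. The names `check` and `label` are only illustrative; `chr()` and `ord()` are the built-ins that convert between a code point number and a one-character string.

```python
# The escape, and chr() with the code point, produce the same one-character string.
check = "\u2713"
assert check == chr(0x2713)

# ord() goes the other way: from the character back to its code point.
assert ord(check) == 0x2713
assert len(check) == 1

# Show the code point in the usual U+XXXX notation (ASCII-only output).
print("U+{:04X}".format(ord(check)))  # U+2713

# Once it is in a variable it concatenates like any other string.
label = "here is your checkmark: " + check
```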

Print a unicode character from a python script:

Put this in test.py:

#!/usr/bin/python
print("here is your checkmark: " + u'\u2713')

Run it like this:

el@apollo:~$ python test.py
here is your checkmark: ✓

If it doesn't show a checkmark for you, then the problem could be elsewhere, like the terminal settings or something you are doing with stream redirection.

Store unicode characters in a file:

Save this to a file named foo.py:

#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import codecs
import sys 
UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)
print(u'e with acute accent: é')

Run it and pipe output to file:

python foo.py > tmp.txt

Open tmp.txt and look inside; you will see this:

el@apollo:~$ cat tmp.txt 
e with acute accent: é

Thus you have saved a Unicode é (e with an acute accent) to a file.
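The script above is written for Python 2. In Python 3 the usual route is to pass an encoding to `open()` instead of wrapping `sys.stdout`. A rough sketch, assuming a throwaway temp directory and the made-up names `path`, `raw`, and `round_tripped`:

```python
import os
import tempfile

text = "caf\u00e9"  # same string as "café"

# A throwaway location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt")

# Name the encoding explicitly instead of relying on the platform default.
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text + "\n")

# Reading in binary mode shows the bytes that actually landed on disk.
with open(path, "rb") as handle:
    raw = handle.read()

# Reading with the same encoding gives back the original string.
with open(path, encoding="utf-8") as handle:
    round_tripped = handle.read().rstrip("\n")

assert round_tripped == text
```

The single character é becomes two bytes (0xC3 0xA9) in UTF-8, which is why `raw` is one byte longer than the text plus its line ending.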

Eric Leschinski
  • @ofer.sheffer bizarrely I'm here looking to solve the opposite problem, the point being it may take some fiddling. – Chris H May 29 '15 at 10:28
47

If you're trying to print() Unicode and getting ascii codec errors, check out this page; the TL;DR is to run export PYTHONIOENCODING=UTF-8 before firing up python (this variable controls the encoding Python uses for stdin, stdout, and stderr, that is, what sequence of bytes it sends to the console for your string data).

Python 3 strings are Unicode, and source files are read as UTF-8 by default (see the Unicode HOWTO), so that's not the problem; you can just put Unicode in strings, as seen in the other answers and comments. The problem happens when you try to get this data out to your console: Python thinks your console can only handle ASCII.

Some of the other answers say, "Write it to a file first," but note that they specify the encoding (UTF-8) for doing so (so Python doesn't change anything in writing), and then use a method for reading the file that just spits out the bytes without any regard for encoding, which is why that works.
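To see which side of that boundary a problem is on, something like the following Python 3 sketch can help. The helper names `stdout_encoding` and `encodable` are invented for this example; the character is the one from the error message quoted in the comments above.

```python
import sys


def stdout_encoding():
    """Report the encoding Python will use when printing to stdout."""
    return getattr(sys.stdout, "encoding", None)


def encodable(text, encoding):
    """Return True if `text` can be encoded with `encoding` without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


dash = "\u2014"  # the character from the UnicodeEncodeError quoted earlier

print("stdout encoding:", stdout_encoding())
print("fits in ascii:", encodable(dash, "ascii"))
print("fits in utf-8:", encodable(dash, "utf-8"))

# A lossy fallback for consoles that really are ASCII-only:
fallback = dash.encode("ascii", errors="backslashreplace").decode("ascii")
print(fallback)
```

If the first line reports an ASCII-like encoding, setting PYTHONIOENCODING as described above is the thing to try; the fallback line is only for cases where the console genuinely cannot show the character.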

Tom Hundt
21

In Python 2, you declare unicode strings with a u, as in u"猫" and use decode() and encode() to translate to and from unicode, respectively.

It's quite a bit easier in Python 3. A very good overview can be found here. That presentation clarified a lot of things for me.
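As a rough illustration of why it is easier in Python 3: every str is already Unicode, and encode() and decode() only come into play at the boundary with bytes. The variable names here are arbitrary.

```python
cat = "\u732b"  # same string as "猫"; no u prefix needed in Python 3

# str -> bytes: encode() turns text into a concrete byte sequence.
encoded = cat.encode("utf-8")
print(encoded)  # b'\xe7\x8c\xab'

# bytes -> str: decode() turns the bytes back into text.
decoded = encoded.decode("utf-8")
assert decoded == cat

# One character, three bytes in UTF-8.
print(len(cat), len(encoded))  # 1 3
```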

Michael Currie
Gort the Robot
  • 1
    Thx for the video link. It is very useful. – arun Jun 16 '15 at 22:25
  • 1
    This is also available as a non-video here: Pragmatic Unicode, or, How do I stop the pain? (Pycon2012) https://nedbatchelder.com/text/unipain.html – Tom Hundt May 15 '17 at 21:38
11

A \U escape takes exactly eight hex digits, so for a five-digit code point, replace the '+' with '000' and put a backslash in front. For example, 'U+1F600' becomes '\U0001F600'. Example:

>>> print("Learning : ", "\U0001F40D")
Learning :  🐍

This may also help: python unicode emoji
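If the code point arrives as text such as 'U+1F40D' rather than as something you type into the source, the conversion can be done at run time. A small Python 3 sketch; `char_from_notation` and `escape_for` are hypothetical helper names, not library functions.

```python
def char_from_notation(notation):
    """Turn 'U+1F40D' style notation into the one-character string."""
    if not notation.upper().startswith("U+"):
        raise ValueError("expected notation like 'U+1F40D'")
    # The part after 'U+' is the code point in hexadecimal.
    return chr(int(notation[2:], 16))


def escape_for(char):
    """Return the eight-digit backslash-U escape that spells `char` in source."""
    return "\\U{:08X}".format(ord(char))


snake = char_from_notation("U+1F40D")

# Same character as the escape used in the example above.
assert snake == "\U0001F40D"

# Zero-padding to eight digits is what the manual rule achieves by hand.
print(escape_for(snake))  # \U0001F40D
```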

bl3ssedc0de
9

Considering that this is the first Stack Overflow result when searching Google for this topic, it bears mentioning that prefixing u to unicode strings is optional in Python 3. (The Python 2 example was copied from the top answer.)

Python 3 (both work):

print('\u0420\u043e\u0441\u0441\u0438\u044f')
print(u'\u0420\u043e\u0441\u0441\u0438\u044f')

Python 2:

print u'\u0420\u043e\u0441\u0441\u0438\u044f'
Evan
  • Thank you! Exactly what I searched for: an universal way to print an unicode character inside a string both for python2 and python3. – JenyaKh Jul 14 '19 at 11:29
  • The parenthesized version should work in Python 2 as well; parentheses are optional there and thus allowed. – Alexander Stohr Nov 12 '19 at 10:38
7

Python supports \N{...} escapes for named Unicode characters, which can be handy if you want to make your code more readable. Here's an example:

assert '\N{snake}' == '🐍'
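The same name table is available at run time through the standard `unicodedata` module, which helps when the name is in a variable or you want to find out what a character is called. A short Python 3 sketch:

```python
import unicodedata

# lookup() is the run-time counterpart of the \N{...} escape.
snake = unicodedata.lookup("SNAKE")
assert snake == "\N{SNAKE}" == "\U0001F40D"

# name() goes the other way: from a character to its official name.
print(unicodedata.name(snake))  # SNAKE

# The checkmark from the other answers has a name too.
check = "\N{CHECK MARK}"
assert check == "\u2713"
```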
rg7
  • The names and their support by different Python versions are listed here: https://stackoverflow.com/questions/30302766/list-of-unicode-character-names – root Jun 26 '23 at 21:01
5

Just one more thing that hasn't been added yet

In Python 2, if you want to print a variable that holds Unicode text and use .format(), make both the base string being formatted and the variable Unicode strings, with u'':

>>> text = u"Université de Montréal"
>>> print(u"This is unicode: {}".format(text))
This is unicode: Université de Montréal
Sheshank S.
4

I use portable WinPython on Windows, which includes the IPython Qt console. There, I could do the following:

>>> print("結婚")
結婚

>>> print("おはよう")
おはよう

>>> text = "結婚"
>>> print(text)
結婚

Your console must support Unicode in order to display Unicode characters.

3

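This fixes UTF-8 printing in Python 2 (in Python 3, sys.stdout is already a text stream, so this wrapper does not apply as written):

import codecs
import sys
UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)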
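To show what that writer does without touching the real stdout, here is a Python 3 sketch that wraps an in-memory byte stream instead. The `io.BytesIO` object is a stand-in I chose for the demonstration; in a real Python 3 program the byte stream would be `sys.stdout.buffer`, not `sys.stdout` itself.

```python
import codecs
import io

# Stand-in for a byte stream such as sys.stdout.buffer.
byte_stream = io.BytesIO()

# getwriter() returns a class; calling it wraps a byte stream so that
# text written to the wrapper is encoded as UTF-8 on the way through.
writer = codecs.getwriter("utf-8")(byte_stream)
writer.write("\u2713 done\n")

# The checkmark went out as its three UTF-8 bytes.
assert byte_stream.getvalue() == b"\xe2\x9c\x93 done\n"

# On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is a
# shorter way to change the encoding of the real stdout.
```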
Nathan B