Unicode escape won't work with some characters

Question

I have a program in which I want to use some Unicode characters, such as µ and subscript p. When I do this,

print u"\xb5"

it works perfectly, but when I do this,

print u"\u209A"

I get this error message:

Traceback (most recent call last):
  File "C:/Users/tech/Desktop/Circuit Design Tool/Test 2.py", line 1, in <module>
    print u"\u209A"
  File "C:\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u209a' in position 0: character maps to <undefined>

Why is this happening? Are these the correct unicode escapes?

What are you using for a console? When I use the Windows command prompt I get `cp437` which fails similarly, and when I use Idle I get `utf-8` which doesn't generate an error but doesn't print the proper character either. — Mark Ransom, Jul 30 '15 at 21:01

score 1 · Answer 1 · answered Jul 31 '15 at 11:17

The Windows Console simply doesn't support Unicode for applications using the C standard library I/O functions (like Python does).

Whilst in principle you can, as the other comments suggest, change code page to 65001 (and set the PYTHONIOENCODING environment variable to utf-8 to match), in practice there are some long-standing bugs in the Console host's support for this code page such that you may get double-prints or hangs when trying to use it. This is typically unusable.

The reliable way to get Unicode out of the Windows Console (well, as reliable as you get—the user still has to have chosen a TTF font to stand any chance of seeing it) is to call the Win32 WriteConsoleW/ReadConsoleW functions directly instead of relying on the C stdlib. If you really need to do this, the win_unicode_console package will wrap it up for you.

(Typically a simpler option is to give up on the Windows Console and use some other environment like an IDE.)
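As a rough sketch of how the win_unicode_console package mentioned above might be wired into a script: the helper name `enable_unicode_console` is made up for illustration, and `win_unicode_console.enable()` is the entry point from that package's README as far as I recall, so treat it as an assumption and check it against the version you install. The import is guarded so the script still runs when the package is missing or the platform is not Windows.

```python
import sys


def enable_unicode_console():
    """Try to switch on win_unicode_console; return True only if it was enabled.

    Hypothetical helper: the name and the fallback behaviour are illustrative.
    """
    if sys.platform != "win32":
        # The package only targets the Windows Console, so do nothing elsewhere.
        return False
    try:
        import win_unicode_console  # third-party; assumed installed via pip
    except ImportError:
        return False
    # Assumed API: enable() replaces the standard streams with console-aware ones.
    win_unicode_console.enable()
    return True


if enable_unicode_console():
    print(u"Unicode console support enabled")
else:
    print(u"win_unicode_console not in use; output depends on the console encoding")
```

Call the helper once at program start, before anything is printed.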

To be clear, I eventually would like to use this in an IDE, not just make it work in the console. Would the package you linked help me there too? — jmcampbell, Jul 31 '15 at 15:40
win_unicode_console shouldn't do anything in environments other than the Windows Console. IDEs with their own REPLs should generally support Unicode naturally without anything special needing to be done (although some have had bugs in the past). — bobince, Jul 31 '15 at 16:47

score 0 · Answer 2 · edited May 23 '17 at 12:21

That's because the default encoding of your console is cp1252, which cannot encode that character. You need an encoding that can represent it, such as utf-8.

Since the default encoding of my terminal is utf-8, it prints correctly:

>>> print u"\u209A"
ₚ

But if I use the cp1252 encoding, it raises an error like the one you got:

>>> u"\u209A".encode('cp1252')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/encodings/cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character '\u209a' in position 0: character maps to <undefined>
>>>
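To see why `u"\xb5"` prints while `u"\u209A"` fails, it can help to test each character against each encoding directly. This is a minimal sketch, not part of the original answer: `can_encode` is a hypothetical helper, and the code is written to run under Python 3 as well as Python 2.7.

```python
def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


MICRO = u"\xb5"    # MICRO SIGN, which cp1252 maps to byte 0xB5
SUB_P = u"\u209a"  # LATIN SUBSCRIPT SMALL LETTER P, which cp1252 has no byte for

# Report, for each character, which encodings are able to represent it.
for name, ch in [("micro", MICRO), ("subscript p", SUB_P)]:
    for enc in ("cp1252", "cp437", "utf-8"):
        print("%-12s %-7s %s" % (name, enc, can_encode(ch, enc)))
```

The escape itself is correct; the failure comes from the target encoding lacking the character, not from the string literal.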

You can change your console encoding to utf-8 using the following command in Windows:

chcp 65001

Or you can change it graphically; check this question for more info: Unicode characters in Windows command line - how?

answered Jul 30 '15 at 18:04

Mazdak

That's not the character I wanted; I wanted a subscript p. Do I need to use a different unicode escape? – jmcampbell Jul 30 '15 at 18:20
@jmcampbell What do you mean by `unicode escape`? Do you mean another Unicode encoding? – Mazdak Jul 30 '15 at 18:24
I mean the unicode string. For example, u"\xb5" is the python unicode escape for the Greek letter mu. u"\u209A" should be the unicode escape for subscript p, but it doesn't give the right character. – jmcampbell Jul 30 '15 at 18:28
I tried to print u"\u209A".encode('utf-8'), and it didn't raise an error, but it printed this: â‚š – jmcampbell Jul 30 '15 at 18:30
@jmcampbell What if you change your default encoding? Run `import sys; reload(sys); sys.setdefaultencoding('UTF8')` and then just print it without any encoding. – Mazdak Jul 30 '15 at 18:36
That didn't change anything. – jmcampbell Jul 30 '15 at 18:38
@jmcampbell What about if you do `chcp 65001` then print it – Mazdak Jul 30 '15 at 18:43
@jmcampbell Read this question for more info http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how – Mazdak Jul 30 '15 at 18:44
I put that into the command prompt, and it said `Active code page: 65001`, but when I tried to print it, it said `LookupError: unknown encoding cp65001` – jmcampbell Jul 30 '15 at 18:52
@jmcampbell What's the result of `sys.stdin.encoding` in your console? – Mazdak Jul 30 '15 at 18:54
`'sys.stdin.encoding' is not recognized as an internal or external command, operable program or batch file.` – jmcampbell Jul 30 '15 at 18:58
@jmcampbell Where did you run it? – Mazdak Jul 30 '15 at 19:01
That's a Python command; you need to run it in a Python interpreter or IDE. Anyway, I think your problem is caused by your OS encoding, and I think you can solve it by changing the default encoding. Please check the question I linked in my answer. – Mazdak Jul 30 '15 at 19:07
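The `â‚š` output reported in the comments above is what you would expect if the UTF-8 bytes for subscript p were displayed by something that interprets them as cp1252. The following sketch reproduces that effect; that this is what happened in the asker's environment is an assumption, and the variable names are illustrative only.

```python
SUB_P = u"\u209a"  # LATIN SUBSCRIPT SMALL LETTER P

# UTF-8 encodes this character as the three bytes E2 82 9A.
utf8_bytes = SUB_P.encode("utf-8")

# A cp1252 display turns each of those bytes into a separate character.
misread = utf8_bytes.decode("cp1252")

print(repr(utf8_bytes))
print(repr(misread))
print(len(misread))
```

In other words, the bytes were right, but they were shown using the wrong encoding.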

score 0 · Answer 3 · answered Jul 30 '15 at 18:06

To make the Windows command prompt able to display utf-8 strings, use the chcp command (for utf-8, that is chcp 65001):

chcp 65001

For other encodings and their corresponding code pages (cp), check it out here.

Anand S Kumar