1

I want to print a set of Unicode characters to my command prompt terminal. Even when I enforce the encoding to be "UTF-8" the terminal prints some garbage.

$python -c "import sys; print sys.stdout.write(u'\u2044'.encode('UTF-8'))"
ΓüäNone

$python -c "import sys; print sys.stdout.encoding"
cp437

My default terminal encoding is cp437 and I am trying to override that. The expected output here is Fraction slash ( ⁄ )

http://www.fileformat.info/info/unicode/char/2044/index.htm

The same piece of code works flawlessly in my Mac terminal, which uses UTF-8 as its default encoding. Is there a way to display this on Windows as well? The font I use in the Windows command prompt is Consolas.

I want my code to work with any Unicode characters, not just this particular example since the input is a web query result and I have no control over it.

jsbueno
Benny
  • there is something in the back of my head telling me that UTF-8 and Windows Terminal won't work easily – Jonas Schäfer Sep 08 '12 at 11:27
  • I am already close to giving up after going through this bug http://bugs.python.org/issue1602 – Benny Sep 08 '12 at 12:01
  • You can find another terminal program to work from instead of Windows' cmd or whatever. I've heard that one can install MinGW and have a half-working terminal in there. Otherwise, just install a virtual machine and set up a proper Linux environment for your development stuff. – jsbueno Sep 08 '12 at 15:18

2 Answers

6

Python cannot control the encoding used by your terminal; you'll have to change that somewhere else.

In other words, just because you force Python to output UTF-8 encoded text to the terminal does not mean your terminal will magically start to interpret that output as UTF-8.
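To see where the garbage comes from, the mismatch can be reproduced in pure Python. This is only a sketch, assuming Python 3; the variable names are illustrative. It encodes the fraction slash as UTF-8 and then decodes those bytes as cp437, which is effectively what a cp437 console does with them:

```python
# Reproduce the mismatch: UTF-8 bytes interpreted as if they were cp437.
text = u'\u2044'                                  # FRACTION SLASH
utf8_bytes = text.encode('utf-8')                 # the three bytes E2 81 84
as_seen_by_console = utf8_bytes.decode('cp437')   # each byte becomes one cp437 character

print(repr(utf8_bytes))
print(ascii(as_seen_by_console))                  # escaped, so it prints safely on any console
```

The three bytes map to Γ, ü and ä in cp437, which matches the `Γüä` shown in the question.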

The Mac OS X terminal has already been configured to work with UTF-8.

On Windows, you can switch the console codepage with the chcp command:

chcp 65001

where 65001 is the Windows codepage for UTF-8. See Unicode characters in Windows command line - how?
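If the code page cannot be changed, a script can at least avoid raising `UnicodeEncodeError` by encoding for whatever the console reports. A minimal sketch, assuming Python 3; `console_safe` is a hypothetical helper, not part of any library, and it replaces unsupported characters with `?` rather than displaying them:

```python
import sys

def console_safe(text, encoding=None):
    """Replace characters that the target encoding cannot represent with '?'."""
    # Fall back to ASCII if stdout reports no encoding (e.g. when it is redirected).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, errors='replace').decode(encoding)

print(console_safe(u'1\u20442'))
```

This is a fallback, not a fix: the fraction slash is lost on a cp437 console, but arbitrary input no longer crashes the program.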

Martijn Pieters
  • I just tried this as well: `$chcp 65001 Active code page: 65001 $python -c "import sys; print sys.stdout.write(u'\u2044'.encode('UTF-8'))" ���None` – Benny Sep 08 '12 at 11:52
  • @Benny: Why not simply call `print(u'\u2044')`? And what does `sys.stdout.encoding` give you? `print` will encode automatically to that latter encoding for you. The linked Stack Overflow question tells you also to switch fonts for the console. – Martijn Pieters Sep 08 '12 at 11:59
  • Oh, that is where I actually started, and this time Python itself couldn't print, since it was trying to encode the Unicode character using cp437, which is an 8-bit code page: `>>> print(u'\u2044') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "C:\Python27\lib\encodings\cp437.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode character u'\u2044' in position 0: character maps to <undefined>` – Benny Sep 08 '12 at 12:04
  • @Benny: Right, and that's where you have to find a way to force your terminal to accept UTF-8. If `chcp 65001` plus switching the font doesn't work for you, I don't know what will. – Martijn Pieters Sep 08 '12 at 12:11
3

UTF-8 encoded text can only be expected to display correctly when the console uses a UTF-8 code page (cp65001).

Python 3.3 claims to support code page 65001 (UTF-8) on Windows.

C:\>chcp 65001
Active code page: 65001

C:\>python
Python 3.3.0rc1 (v3.3.0rc1:8bb5c7bc46ba, Aug 25 2012, 13:50:30) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print('\u2044')
⁄

Although it is buggy:

>>> print('\u2044')
⁄

>>> print('\u2044'*8)
⁄⁄⁄⁄⁄⁄⁄⁄
��⁄⁄⁄⁄
⁄⁄
��

>>> print('1\u20442 2\u20443 4\u20445')
1⁄2 2⁄3 4⁄5
⁄5
Mark Tolonen