5

I use a Python library that prints Unicode characters to the Windows console. When I call a library function that prints a Unicode character, it raises the exception 'charmap' codec can't encode characters.

To work around the error, I tried running the Windows console command "chcp 65001" from Python with os.system("chcp 65001") before calling the library function.

I know there are similar questions, which is why I tried this approach. The command runs successfully and the console reports that the code page was activated.

However, the exception showed up again.

If I run the program again without closing the console, it executes successfully without any exception, which means the console command only takes effect after the first run.

My question is: is there a way to launch the Windows console with Unicode support already activated, so that I don't have to run the program twice?

ash
  • You can't solve this problem with `chcp`, use the solutions listed here: [Python, Unicode, and the Windows console](http://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console) – roeland Mar 07 '17 at 03:45

2 Answers

7

Add /k chcp 65001 to the shortcut launching the cmd window. Alternatively, use Python 3.6 which uses Windows Unicode APIs to write to the console and ignores the code page. You do still need font support for what you are printing, however.
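To illustrate why the code page matters on older Python versions, here is a minimal sketch that imitates a console stream with a fixed encoding. It is not how the console itself works: `try_print` is a hypothetical helper, the in-memory stream stands in for `sys.stdout`, and cp437 is only assumed as an example of a legacy console code page.

```python
import io


def try_print(text, encoding):
    """Print text to an in-memory stream that uses the given encoding.

    Returns the encoded bytes, or None if the codec cannot represent the text.
    (Hypothetical helper; the stream stands in for a console's sys.stdout.)
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding, newline="\n")
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        # With a legacy code page this is the "'charmap' codec can't encode" error.
        print("failed with", encoding, "->", exc)
        return None
    return stream.buffer.getvalue()


sample = "ěščřžýáíé"
legacy_result = try_print(sample, "cp437")  # assumed legacy code page
utf8_result = try_print(sample, "utf-8")    # what code page 65001 corresponds to
```

With the legacy encoding the call fails and returns None, while the UTF-8 stream accepts the same text.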

Mark Tolonen
  • 1
    The alternative is to use the `win_unicode_console` module, not to use codepage 65001 -- at least not until Microsoft fixes the console (conhost.exe) to work properly with UTF-8. Even in Windows 10, only ASCII (0-127) can be read from the console when using codepage 65001, and in Windows 7 (fixed in 8+) all programs that rely on `WriteFile` returning the number of bytes written, instead of the number of decoded UTF-16 codes written (in the BMP, 1-3 UTF-8 bytes map to 1 UTF-16 code), end up making several writes, which leaves a stream of garbage after every write that has non-ASCII characters. – Eryk Sun Mar 06 '17 at 21:45
  • 2
    A common misconception in Windows users is calling the console a "cmd window". The cmd *shell* has nothing inherently to do with the console window. It can be run detached without a console. If it is attached to a console, then it's just a regular client process that's no different from python.exe, powershell.exe, doskey.exe, chcp.com, or mode.com (".COM" is a legacy extension; they're 64-bit PE binaries). – Eryk Sun Mar 06 '17 at 21:50
  • 1
    @Mark like you suggested, I used Python 3.6 and it worked like a charm. Thanks – ash Mar 07 '17 at 07:45
  • >"You do still need font support for what you are printing" How do I determine which font to use? Some of my emojis print out as the box replacement character. – JDOaktown Mar 15 '19 at 19:31
1

The following settings work on Windows 8.1:

==> set "PYTHONIOENCODING=UTF-8"

==> chcp 65001
Active code page: 65001

==> type "%APPDATA%\Python\Python35\site-packages\usercustomize.py"
import win_unicode_console
win_unicode_console.enable()
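A slightly more defensive variant of that `usercustomize.py` could look like the sketch below. The function name `enable_unicode_console` is hypothetical, and the assumption is that the file should do nothing on other platforms, on Python 3.6+ (which, per the other answer, already writes through the Windows Unicode APIs), or when `win_unicode_console` is not installed.

```python
import importlib
import sys


def enable_unicode_console():
    """Enable win_unicode_console where it is useful; return True if enabled."""
    # Only relevant on Windows.
    if sys.platform != "win32":
        return False
    # Assumption: Python 3.6+ does not need the module.
    if sys.version_info >= (3, 6):
        return False
    try:
        module = importlib.import_module("win_unicode_console")
    except ImportError:
        # Module not installed: leave the console streams untouched.
        return False
    module.enable()
    return True


enabled = enable_unicode_console()
```

This keeps interpreter startup from failing if the module is removed later.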

Test:

==> python
Python 3.5.1 (v3.5.1:37a07cee5969, Dec  6 2015, 01:54:25) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print (u'ěščřžýáíé ;ςερτυ яшертю ğüşi')
ěščřžýáíé ;ςερτυ яшертю ğüşi
>>>

Strings in the test (meaningless, just for demonstration):

  • ěščřžýáíé Latin, Central European
  • ;ςερτυ    Greek
  • яшертю    Cyrillic
  • ğüşi      Latin, Turkish
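As a rough check of why a single legacy code page is not enough for a mixed string like this, the sketch below tests each sample against a few encodings. The list of code pages is an assumption chosen for illustration, and `encodable_by` is a hypothetical helper.

```python
# Sample strings from the test above, keyed by script.
SAMPLES = {
    "Latin, Central European": "ěščřžýáíé",
    "Greek": ";ςερτυ",
    "Cyrillic": "яшертю",
    "Latin, Turkish": "ğüşi",
}

# Assumed set of encodings to compare: several legacy code pages plus UTF-8.
CODE_PAGES = ["cp437", "cp852", "cp737", "cp866", "cp857", "utf-8"]


def encodable_by(text, encodings=CODE_PAGES):
    """Return the encodings from the list that can represent the whole text."""
    supported = []
    for name in encodings:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue
        supported.append(name)
    return supported


for label, text in SAMPLES.items():
    print(label, "->", encodable_by(text))
```

UTF-8 appears in every result, whereas each legacy code page covers only some of the scripts.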
JosefZ