In the Windows terminal I ran Python and executed:

[ord(i) for i in u'Йож'] # three_national_symbols

The output was:

[63, 63, 63]

Could anyone explain what replaced my characters with question marks, and why? (Windows 10, Python 2.7.12)

Martijn Pieters
emonkey
  • @Martijn, it's partially a CPython issue. Historically it hasn't been well-integrated with the Windows console. For reading it used C `fread` and POSIX `read`, which aren't compatible with Unicode in the Windows console. They're implemented via `ReadConsoleA` and `ReadFile`, for which the console has to lossily encode (thus the question marks) its Unicode input buffer to the current input codepage, which defaults to the system locale OEM codepage. Starting with version 3.6, Python directly calls `ReadConsoleW` to support Unicode. Old versions can use `win_unicode_console`. – Eryk Sun Jun 19 '17 at 12:04
  • @eryksun: I'm quite aware of the 3.6 improvements here. :-) But in Python 2 that's not applicable, so we need to see the terminal configuration, and the Python code to reproduce this. – Martijn Pieters Jun 19 '17 at 12:05
  • @MartijnPieters, there is no terminal configuration that can fix this in general because the console's implementation of codepage 65001 (UTF-8) is horribly buggy (it can't read non-ASCII characters because conhost.exe assumes 1 byte per character when calling `WideCharToMultiByte`). Also, it has nothing to do with fonts. The OP is simply reading that line from `stdin`. The only simple fix is to install and enable `win_unicode_console`, which will call `ReadConsoleW`. – Eryk Sun Jun 19 '17 at 12:09
  • @eryksun: sure, so we can dupe this to the canonical if and when we have the full reproducible case. – Martijn Pieters Jun 19 '17 at 12:12
  • @MartijnPieters, it's already a reproducible case. If someone enters `[ord(i) for i in u'Йож']` in the Python 2 REPL on Windows and gets the result `[63, 63, 63]`, it necessarily means the input codepage doesn't map those characters. – Eryk Sun Jun 19 '17 at 12:17
  • @eryksun: Ah, right, I misread that part. It's the **input** that's the issue, not printing. – Martijn Pieters Jun 19 '17 at 12:19
  • @eryksun: I'm more than fine with that option. – Martijn Pieters Jun 19 '17 at 12:25
  • @eryksun, Martijn, thanks a lot – emonkey Jun 19 '17 at 12:45
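
The lossy encoding described in the comments can be imitated in plain Python, without a Windows console. This is only a sketch: `cp437` is an assumed stand-in for an OEM codepage with no Cyrillic (the asker's actual input codepage is not stated), `lossy_code_points` is a hypothetical helper name, and Python's `'replace'` error handler only approximates what the console does when it converts its input buffer.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The three characters from the question, written as escapes so the example
# does not depend on how the source file or the console is encoded.
text = u'\u0419\u043e\u0436'  # u'Йож'


def lossy_code_points(s, codepage):
    """Encode s to the given codepage, replacing unmappable characters
    with '?', then decode it back and return the resulting code points."""
    encoded = s.encode(codepage, 'replace')
    return [ord(ch) for ch in encoded.decode(codepage)]


# cp437 (assumed here) has no Cyrillic letters, so each one becomes '?'.
print(lossy_code_points(text, 'cp437'))  # [63, 63, 63]

# cp866 does map Cyrillic, so the characters survive the round trip.
print(lossy_code_points(text, 'cp866'))
```

With a codepage that covers the characters, the round trip returns the original code points, which matches the point made above: the question marks come from the input codepage, not from printing or fonts.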

0 Answers