
To display a subscript in my output I used a Unicode escape in C++ code, e.g. `\u2080` for subscript 0. However, the Windows console shows strange characters on screen, while the Ubuntu terminal shows the subscript 0 exactly.

Why is the Unicode character not showing up in the Windows console?

  • Because the Windows console by default uses a "codepage" other than Unicode. You could configure it to use UTF-8 ([related](https://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how)). –  Oct 27 '17 at 07:33
  • @FelixPalmen - Windows and the console use *UTF-16* natively. When we output *UTF-16* (*WriteConsoleW*), it does not matter which codepage is used - the text is processed without any translation. The codepage is used only when we use a non-*W* API, to translate multi-byte text to UTF-16. – RbMm Oct 27 '17 at 08:13
  • @RbMm I don't think this is true. Windows uses UTF-16, correct (it started out as UCS-2). But the console is a different beast, probably for backwards compatibility. On my machine, the default codepage is `850` (which is *almost* the same as ISO-8859-15). –  Oct 27 '17 at 08:15
  • @RbMm regarding your edit: this might be true when using the native Win32 console API. Programming in C or C++, you often want to stay portable, so you use `stdio` / `iostreams`. –  Oct 27 '17 at 08:18
  • @FelixPalmen - you are mistaken; what I said is 100% true. The console of course uses UTF-16 natively - I know this exactly. When we use a *W* API, the "codepage" plays no role. The codepage is used when and only when we use an *A* API or `WriteFile` for output. – RbMm Oct 27 '17 at 08:18
  • @RbMm see my second comment. If you're specifically coding for Windows, using the console API and UTF-16 might be an option. But the rest of the world uses UTF-8 for Unicode. Windows would probably use it as well if Microsoft hadn't decided to introduce Unicode so early. –  Oct 27 '17 at 08:21
  • @FelixPalmen - this is not *might be true when using the native Win32 console API* but **true**. As for the *C/C++* runtime API - yes, when programmers use it, mistakes are frequent, but that is not related to the console, only to the very bad design of the C runtime here. Frequently, when we use the *w* version of a *C/C++* runtime API, it actually **translates** the UTF-16 data to multi-byte and uses an *A* API for output; the console then has to translate the multi-byte back. But `WriteConsoleW` works perfectly with UTF-16 - no translation at all. – RbMm Oct 27 '17 at 08:23
  • @RbMm "*when we use the w version of a C/C++ runtime API*" - there's no such thing. Those are Windows-specific APIs. The design problem is in Windows, not in the C or C++ runtime libraries. The question compares the result with *Ubuntu*, a GNU/Linux distribution, so it's safe to assume the OP isn't interested in writing platform-specific code. –  Oct 27 '17 at 08:25
  • @FelixPalmen - for correct use of the *C/C++* runtime you need `_setmode(..., _O_U16TEXT)`. And yes - I program for Windows only, and the question is tagged *windows*. Windows uses UTF-16 natively; all window text is rendered as UTF-16. If we use another encoding (say *UTF-8*), it will first be translated to *UTF-16* anyway. Almost every `A` API is a shell over the `W` API, which translates multi-byte to *UTF-16*, calls the `W` version, and then translates the result back to multi-byte. So for Windows the best choice is to use *UTF-16*. – RbMm Oct 27 '17 at 08:28
  • @FelixPalmen - no, this is a *C/C++* runtime problem (`wprintf`), not a console or Windows problem. The behavior of `wprintf` depends on `_setmode`. With `WriteConsoleW`, everything works perfectly with Unicode. – RbMm Oct 27 '17 at 08:31
  • @RbMm you don't get it and I don't care. –  Oct 27 '17 at 08:33
  • @FelixPalmen - the *C/C++* runtime implementation is of course platform-dependent, and when programmers try to use the *C/C++* runtime for UTF-16 console output they frequently have problems. The problem is exactly in the *C/C++* runtime implementation; with the native Windows API there are no problems at all. – RbMm Oct 27 '17 at 08:36
  • @FelixPalmen, codepage 65001 is still badly broken, even in the Windows 10 console. Non-ASCII characters can't be read as UTF-8 because of a bad design in the host process (conhost.exe), which assumes 1 byte per character when it calls `WideCharToMultiByte`. Prior to Windows 8, the results are much worse for both input and output. The only really working option is to use the wide-character API by setting the low I/O file descriptors to UTF-16 text mode via `_setmode`. It's not good for portability, I know. – Eryk Sun Oct 29 '17 at 08:58

1 Answer


You need to select the correct code page for the console window.

Something along the lines of the code below; details on the function can be found here.

#include <windows.h>

// Set the console output code page to UTF-8
SetConsoleOutputCP(CP_UTF8);
S. Whittaker
  • Note that prior to Windows 8, setting the output codepage to UTF-8 will misreport the number of bytes written to the console via `_write`, etc. Instead of returning the bytes written like it's supposed to, the underlying `WriteFile` or `WriteConsoleA` returns the number of decoded UTF-16 codes written. This causes buffered streams to retry a partial write, which creates a trailing sequence of gibberish in proportion to the number of non-ASCII characters written, which need 2-4 UTF-8 bytes per character. It's the worst for codes that need 3 UTF-8 bytes. – Eryk Sun Oct 29 '17 at 09:02