I have done some research on getting UTF-8/16 to work properly in cmd.exe. I've found these articles:
https://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
https://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/
http://www.siao2.com/2008/03/18/8306597.aspx
and also this SO question: Output unicode strings in Windows console app
The life-saving function is _setmode, which causes cmd.exe to Just Work™. But what does it actually do? The first article states that
The Visual C++ runtime library can convert automatically between internal UTF-16 and external UTF-8, if you just ask it to do so by calling the _setmode function with the appropriate file descriptor number and mode flag. E.g., mode _O_U8TEXT causes conversion to/from UTF-8.
That's all nice, but the following (to me) sort of contradicts it. Let's take this simple program:
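#include <fcntl.h>   // _O_U16TEXT, _O_U8TEXT
#include <io.h>      // _setmode
#include <cstdio>    // _fileno, stdout, getchar
#include <iostream>

int main()
{
    // Switch stdout to a Unicode translation mode before any output is written.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"привет śążź Ειρήνη";
    // yes, wcout; I can use either wprintf or wcout, they both seem to have the same effect
    getchar();  // keep the console window open
    return 0;
}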
This prints to the console properly (provided we select the right font, of course); without the _setmode call I get garbage. But what is actually being translated here? What does the function really do? Does it convert from UTF-16 to whatever codepage the console is using? Windows uses UTF-16 internally, so why is a conversion needed in the first place?
Furthermore, if I change the second parameter to _O_U8TEXT, the program works just as well as with _O_U16TEXT, which confuses me further; the UTF-16 representation of и is very different from the UTF-8 one, so how come this still works?
I should mention that I'm using Visual Studio 2015 (MSVC 14.0) and the source file is encoded as UTF-8 with BOM.
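To make the point about и concrete, here is a minimal portable sketch that encodes a single code point both ways and dumps the bytes. The helper names to_utf8, to_utf16le and hex are my own (hypothetical, not part of the CRT), and the code assumes the input is a valid code point that is not a surrogate. It says nothing about what _setmode does internally; it only shows how different the two byte sequences are.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Encode one code point as UTF-8 (assumes cp is valid and not a surrogate).
std::vector<std::uint8_t> to_utf8(std::uint32_t cp)
{
    std::vector<std::uint8_t> out;
    if (cp < 0x80) {
        out.push_back(static_cast<std::uint8_t>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Encode one code point as UTF-16, little-endian byte order, no BOM.
std::vector<std::uint8_t> to_utf16le(std::uint32_t cp)
{
    std::vector<std::uint16_t> units;
    if (cp < 0x10000) {
        units.push_back(static_cast<std::uint16_t>(cp));
    } else {
        // Code points above the BMP become a surrogate pair.
        cp -= 0x10000;
        units.push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
    }
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        out.push_back(static_cast<std::uint8_t>(units[i] & 0xFF)); // low byte first
        out.push_back(static_cast<std::uint8_t>(units[i] >> 8));
    }
    return out;
}

// Format bytes as space-separated uppercase hex, e.g. "D0 B8".
std::string hex(const std::vector<std::uint8_t>& bytes)
{
    std::string s;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0) s += ' ';
        s += buf;
    }
    return s;
}
```

For и (U+0438), hex(to_utf8(0x0438)) gives D0 B8 while hex(to_utf16le(0x0438)) gives 38 04, so the two modes cannot simply be passing the same bytes through unchanged.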