
I have done some research on getting UTF-8/16 to work properly in cmd.exe. I've found these articles:

https://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/
https://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/
http://www.siao2.com/2008/03/18/8306597.aspx

and also this SO question: Output unicode strings in Windows console app

The life-saving function is _setmode which causes cmd.exe to Just Work™. But what does it actually do? The first article states that

The Visual C++ runtime library can convert automatically between internal UTF-16 and external UTF-8, if you just ask it to do so by calling the _setmode function with the appropriate file descriptor number and mode flag. E.g., mode _O_U8TEXT causes conversion to/from UTF-8.

That's all nice, but the following (to me) sort of contradicts it. Let's take this simple program:

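#include <fcntl.h>   // _O_U16TEXT
#include <io.h>      // _setmode
#include <stdio.h>   // _fileno, stdout, getchar
#include <iostream>

int main()
{
    // Switch stdout to UTF-16 text mode before any output is written.
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"привет śążź Ειρήνη";
    // yes, wcout; I can use both wprintf and wcout, they both seem to have the same effect

    getchar();  // keep the console window open until a key is pressed
    return 0;
}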

This prints to the console properly (provided we select the right font, of course); without the _setmode call I get garbage. But what is actually being translated here? What does the function really do? Does it convert from UTF-16 to whatever codepage the console is using? Windows uses UTF-16 internally, so why is a conversion needed in the first place?

Furthermore, if I change the second parameter to _O_U8TEXT, the program works just as well as with _O_U16TEXT, which confuses me further; the UTF-16 representation of и is very different from the UTF-8 one, so how come this still works?

I should mention that I'm using Visual Studio 2015 (MSVC 14.0) and the source file is encoded as UTF-8 with BOM.

user4520
    You give it the go-ahead to completely ignore the need to support I/O redirection. So no longer any need to convert to an 8-bit codepage encoding and trying to stay compatible with whatever program you redirect to/from, it can call the native console api functions directly. – Hans Passant Nov 07 '15 at 22:39
  • So how do you "select the right font" actually? –  Jun 30 '18 at 15:04