Windows is a Unicode (UTF-16) system, and so is the console. If you want to print Unicode text, the most effective way is to use WriteConsoleW:
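    #include <windows.h>
    #include <wchar.h>

    // writes UTF-16 text directly to the console; no code page conversion is involved
    BOOL PrintString(PCWSTR psz)
    {
        DWORD n; // receives the number of characters actually written
        return WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), psz, (DWORD)wcslen(psz), &n, NULL);
    }

    PrintString(L"—"); // usage (call it from main or any other function)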
In this case your binary contains the wide character U+2014 as is (2 bytes, 0x2014), and the console prints it without any conversion.
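Here is a small portable sketch of what "stored as is" means. Assumptions: it uses C11 char16_t and u"..." literals as a stand-in for the 16-bit wchar_t on Windows, so it compiles without windows.h, and utf16le_bytes is a hypothetical helper. The em dash is a single UTF-16 code unit, 0x2014, which a little-endian x86/x64 binary stores as the bytes 0x14 0x20:

```c
#include <uchar.h>

/* Hypothetical helper: splits one UTF-16 code unit into the two bytes
   stored for it on a little-endian machine (low byte first), which is
   how an L"..." literal ends up in a Windows x86/x64 binary. */
void utf16le_bytes(char16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit & 0xFF);
    out[1] = (unsigned char)(unit >> 8);
}

/* The em dash U+2014 as a UTF-16 string literal: one code unit plus the
   terminating zero, just like L"\u2014" with a 16-bit wchar_t. */
static const char16_t em_dash[] = u"\u2014";
```

No code page appears anywhere in this path, which is why WriteConsoleW can hand these units to the console unchanged.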
If an ANSI (multi-byte) function such as WriteConsoleA or WriteFile is used to write to the console, the console first translates the multi-byte string to Unicode with MultiByteToWideChar, passing the value returned by GetConsoleOutputCP as the CodePage. This translation is where problems appear if you use characters outside ASCII (0x80 and above).
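The decoding step can be modeled with a few entries from two real code pages: 1252 (the usual ANSI code page on US-English Windows) and 437 (the usual OEM code page there). This is a portable sketch, not the real MultiByteToWideChar: the tables are tiny excerpts, decode_byte is a hypothetical name, and bytes outside the excerpt return U+FFFD even though both real code pages define them. It shows that the same byte 0x97 means an em dash in 1252 but U+00F9 ('ù') in 437, while bytes below 0x80 are plain ASCII in both:

```c
#include <stddef.h>

/* Tiny excerpts of two real code pages (NOT complete tables). */
struct cp_entry { unsigned char byte; unsigned int code_point; };

static const struct cp_entry cp1252_part[] = {
    {0x97, 0x2014}, /* EM DASH */
    {0xF9, 0x00F9}, /* LATIN SMALL LETTER U WITH GRAVE */
};
static const struct cp_entry cp437_part[] = {
    {0x97, 0x00F9}, /* LATIN SMALL LETTER U WITH GRAVE */
    {0xF9, 0x2219}, /* BULLET OPERATOR */
};

/* Hypothetical stand-in for MultiByteToWideChar on a single byte, using
   the console output code page cp. Returns U+FFFD for bytes outside the
   excerpt or for code pages the sketch does not know. */
unsigned int decode_byte(unsigned int cp, unsigned char byte)
{
    const struct cp_entry *t;
    size_t n, i;

    if (byte < 0x80)
        return byte; /* ASCII is identical in both code pages */
    if (cp == 1252) {
        t = cp1252_part;
        n = sizeof cp1252_part / sizeof cp1252_part[0];
    } else if (cp == 437) {
        t = cp437_part;
        n = sizeof cp437_part / sizeof cp437_part[0];
    } else {
        return 0xFFFD;
    }
    for (i = 0; i < n; i++)
        if (t[i].byte == byte)
            return t[i].code_point;
    return 0xFFFD;
}
```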
First of all, the compiler may warn you: "The file contains a character that cannot be represented in the current code page (number). Save the file in Unicode format to prevent data loss." (C4819). But even after you save the source file in Unicode format, the following can happen:
    wprintf(L"ù"); // no warning
    printf("ù");   // warning C4566 (when 'ù' is not in the current ANSI code page)
The reason is that L"ù" is saved in the binary as a wide-character string, as is, so there is no problem and no warning. But "ù" is saved as a char (narrow) string. The compiler has to convert the character from the Unicode source file into a multi-byte string in the binary (the .obj file from which the linker later creates the PE file), and for this it uses WideCharToMultiByte with CP_ACP (the current system default Windows ANSI code page).
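The compile-time half can be sketched the same way. Assumptions: the ANSI code page is 1252, the table is a tiny excerpt, encode_cp1252 is a hypothetical name, and an unrepresentable character becomes '?', which is typically what ends up in the binary when C4566 is reported. The real WideCharToMultiByte can also apply best-fit mappings, which this sketch ignores:

```c
#include <stddef.h>

/* Tiny excerpt of code page 1252 (NOT a complete table). */
struct cp_entry { unsigned char byte; unsigned int code_point; };

static const struct cp_entry cp1252_part[] = {
    {0x97, 0x2014}, /* EM DASH */
    {0xF9, 0x00F9}, /* LATIN SMALL LETTER U WITH GRAVE */
};

/* Hypothetical model of the compiler turning one character of a "..."
   literal into a byte of the ANSI code page (assumed to be 1252).
   Non-ASCII characters missing from the excerpt become '?'; for a real
   1252 character that is only a limitation of the excerpt. */
unsigned char encode_cp1252(unsigned int code_point)
{
    size_t i;

    if (code_point < 0x80)
        return (unsigned char)code_point; /* ASCII passes through */
    for (i = 0; i < sizeof cp1252_part / sizeof cp1252_part[0]; i++)
        if (cp1252_part[i].code_point == code_point)
            return cp1252_part[i].byte;
    return '?'; /* not representable: the C4566 case */
}
```

For example, Cyrillic U+0410 has no byte in code page 1252, so the sketch yields '?' for it, while 'ù' becomes the byte 0xF9.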
So what happens if you call printf("ù")?

- At compile time, the Unicode string "ù" is converted to multi-byte with WideCharToMultiByte(CP_ACP, ...), and the resulting multi-byte string is saved in the binary.
- At run time, the console converts your multi-byte string back to wide characters with MultiByteToWideChar(GetConsoleOutputCP(), ...) and prints the result.

So you get two conversions:

    Unicode -> (CP_ACP) -> multi-byte -> (GetConsoleOutputCP()) -> Unicode
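By default, GetConsoleOutputCP() returns the OEM code page (the one CP_OEMCP stands for, e.g. 437), which differs from the ANSI code page (CP_ACP, e.g. 1252). The mismatch exists even if you run the program on the computer where you compiled it, and it is worse on another computer with a different OEM code page.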
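The problem is that the two conversions use different, incompatible code pages. But even if you change the console code page to your CP_ACP code page (for example with SetConsoleOutputCP), the conversion can still translate some characters wrongly: any character the ANSI code page cannot represent was already lost at compile time.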
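Putting both halves together gives a concrete picture of the mismatch, under the same assumptions as a standalone sketch: hypothetical helper names, tiny excerpts of code pages 1252 (ANSI) and 437 (console), '?' for unrepresentable characters, and no best-fit mappings. With ANSI 1252 and console 437, printf("ù") comes out as U+2219 (BULLET OPERATOR) and printf("—") comes out as U+00F9 ('ù'). With matching code pages 'ù' survives, but a character missing from the ANSI page, such as Cyrillic U+0410, is already '?' in the binary:

```c
#include <stddef.h>

/* Tiny excerpts of two real code pages (NOT complete tables). */
struct cp_entry { unsigned char byte; unsigned int code_point; };

static const struct cp_entry cp1252_part[] = {
    {0x97, 0x2014}, {0xF9, 0x00F9} /* EM DASH, u WITH GRAVE */
};
static const struct cp_entry cp437_part[] = {
    {0x97, 0x00F9}, {0xF9, 0x2219} /* u WITH GRAVE, BULLET OPERATOR */
};

/* Returns the excerpt for a code page (NULL and *n == 0 if unknown). */
static const struct cp_entry *table_for(unsigned int cp, size_t *n)
{
    if (cp == 1252) { *n = sizeof cp1252_part / sizeof cp1252_part[0]; return cp1252_part; }
    if (cp == 437)  { *n = sizeof cp437_part / sizeof cp437_part[0]; return cp437_part; }
    *n = 0;
    return NULL;
}

/* Unicode -> byte in code page cp ('?' if not in the excerpt). */
static unsigned char to_byte(unsigned int cp, unsigned int code_point)
{
    size_t n, i;
    const struct cp_entry *t = table_for(cp, &n);
    if (code_point < 0x80)
        return (unsigned char)code_point;
    for (i = 0; i < n; i++)
        if (t[i].code_point == code_point)
            return t[i].byte;
    return '?';
}

/* Byte in code page cp -> Unicode (U+FFFD if not in the excerpt). */
static unsigned int from_byte(unsigned int cp, unsigned char byte)
{
    size_t n, i;
    const struct cp_entry *t = table_for(cp, &n);
    if (byte < 0x80)
        return byte;
    for (i = 0; i < n; i++)
        if (t[i].byte == byte)
            return t[i].code_point;
    return 0xFFFD;
}

/* printf("c"): compile-time encoding with the ANSI code page (acp), then
   run-time decoding by the console with its output code page.
   Returns the character the console ends up drawing. */
unsigned int printf_round_trip(unsigned int code_point, unsigned int acp,
                               unsigned int console_cp)
{
    return from_byte(console_cp, to_byte(acp, code_point));
}
```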
As for the CRT API wprintf, the situation is as follows: wprintf first converts the given string from Unicode to multi-byte using its internal current locale (note that the CRT locale is independent of, and can differ from, the console code page), and then calls WriteFile with the multi-byte string. The console converts this multi-byte string back to Unicode:

    Unicode -> (current CRT locale) -> multi-byte -> (GetConsoleOutputCP()) -> Unicode
So to use wprintf, we first need to set the current CRT locale to the GetConsoleOutputCP() code page:
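    #include <windows.h>
    #include <stdio.h>
    #include <locale.h>
    #include <wchar.h>
    char sz[16];
    snprintf(sz, sizeof sz, ".%u", GetConsoleOutputCP()); // locale name ".<code page>", e.g. ".437"
    setlocale(LC_ALL, sz);
    wprintf(L"—");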
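But even so, on my computer I see "-" on screen instead of "—", so calling PrintString(L"—") (which uses WriteConsoleW) right after this prints "-—". Presumably the console code page has no em dash, so the conversion substitutes a similar-looking (best-fit) character.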
So the only reliable way to print any Unicode character (that Windows supports) is the WriteConsoleW API.