Note: I'm asking about implementation-defined behavior in Microsoft Visual C++ 2008 (possibly the same in 2005 and later). OS: Simplified Chinese installation of Windows 7.
I was surprised by what happens when I perform non-ASCII I/O with printf. For example:
// This won't be necessary as it's the system default code page.
//system("chcp 936");
// NULL to show current locale, which is "C"
printf ("%s\n", setlocale(LC_ALL, NULL));
printf ("中\n");
printf ("%s\n", setlocale(LC_ALL, "English"));
printf ("中\n");
Output:
Active code page: 936
C
中
English_United States.1252
?D
The memory view in the debugger shows that "中" is encoded in two bytes, 0xD6 0xD0, which is the encoding of that character in code page 936 (Simplified Chinese). These bytes should fall outside the range of the "C" locale, which is most likely 0x00 to 0x7F.
Question:
Why can it still display the character correctly in the "C" locale? My first guess was that the locale has no bearing on printf. But then why can it no longer display the character after switching to the "English" locale, which also differs from code page 936? Interesting, isn't it?
Edit:
I redirected standard output to a file and ran some tests. Whatever locale is set, the correct character "中" is saved in the file. This suggests that setlocale() affects how the console displays the character, which contradicts my understanding of how it works: printf puts the bytes into the console's input buffer, and the console interprets those bytes using its own code page (the one chcp reports).