Well, the entire standard I/O library is dodgy with that code page. Here's another test program (\xe2\x86\x92 is the arrow → in UTF-8):
    #include <stdio.h>

    int main(void)
    {
        char s[] = "\xe2\x86\x92 a \xe2\x86\x92 b\n";
        int l = (int) sizeof(s) - 1;               /* byte length, without the trailing NUL */
        int wr = (int) fwrite(s, 1, l, stdout);    /* element size 1, so this should equal l */
        printf("%d/%d written\n", wr, l);
        return 0;
    }
And its output:
��� a → b
10/12 written
Note that the first character is again replaced by ��� (it's 3 bytes in UTF-8), and the fwrite call returns the number of characters written to the console rather than the number of bytes. This violates the C standard (fwrite should return the number of elements written, which is the number of bytes here because the element size is 1), and it will break every program that uses fwrite or related functions correctly (for instance, try to print "☺☺☺☺☺☺☺☺☺☺☺☺" with Python 3.4).
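To see why correct programs break, here is a minimal sketch of the usual way to check an fwrite call. The helper name write_checked is made up for this illustration and is not part of any library. With a conforming runtime it returns 0 whenever every byte was written. With the behavior shown above (10 reported for 12 bytes), it would presumably report a failure even though all the text reached the console.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper for illustration: write len bytes and verify the count.
   Returns 0 on success, -1 if fwrite reports fewer bytes than requested. */
static int write_checked(FILE *fp, const char *buf, size_t len)
{
    assert(fp != NULL && buf != NULL);

    /* Element size is 1, so per the C standard the return value is the
       number of bytes written; anything less than len means a write error. */
    size_t n = fwrite(buf, 1, len, fp);

    return n == len ? 0 : -1;
}
```

A caller would use it as write_checked(stdout, s, sizeof(s) - 1) and treat -1 as an I/O error. Code that retries after a short count instead of failing has the opposite problem: it would resend bytes that were already printed.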
So your only options to reliably output Unicode text are Windows-specific (unless these issues are fixed in the latest version of MSVC):