The Windows console has been Unicode aware for at least a decade and perhaps as far back as Windows NT. However for some reason the major cross-platform scripting languages including Perl and Python only ever output various 8-bit encodings, requiring much trouble to work around. Perl gives a "wide character in print" warning, Python gives a charmap error and quits. Why on earth after all these years do they not just simply call the Win32 -W APIs that output UTF-16 Unicode instead of forcing everything through the ANSI/codepage bottleneck?
Is it just that cross-platform performance is low priority? Is it that the languages use UTF-8 internally and find it too much bother to output UTF-16? Or are the -W APIs inherently broken to such a degree that they can't be used as-is?
UPDATE
It seems that the blame may need to be shared by all parties. I imagined that the scripting languages could just call wprintf
on Windows and let the OS/runtime worry about things such as redirection. But it turns out that even wprintf on Windows converts wide characters to ANSI and back before printing to the console!
Please let me know if this has been fixed since the bug report link seems broken but my Visual C test code still fails for wprintf and succeeds for WriteConsoleW.
UPDATE 2
Actually you can print UTF-16 to the console from C using wprintf
but only if you first do _setmode(_fileno(stdout), _O_U16TEXT)
.
From C you can print UTF-8 to a console whose codepage is set to codepage 65001, however Perl, Python, PHP and Ruby all have bugs which prevent this. Perl and PHP corrupt the output by adding additional blank lines following lines which contain at least one wide character. Ruby has slightly different corrupt output. Python crashes.
UPDATE 3
Node.js is the first scripting language that shipped without this problem straight out of the box.
The Python dev team slowly came to realize this was a real problem since it was first reported back at the end of 2007 and has seen a huge flurry of activity to fully understand and fully fix the bug in 2016.