
I'm working on a C++ console program that prints some Unicode text. On Linux it just works, but on Windows it behaves strangely: Unicode characters are displayed correctly only as long as they are not at the beginning of a std::string. If they are, the program just stops.

Here's a reduced example:

#include <iostream>
#include <string>

using std::cout;

int main() {
    std::string letters = "àèéìòùäöüß";
    std::string is_nice = "è bello";    // In Italian this means "is nice"

    cout << "Concatenating the strings using '+':\n";
    cout << "Unicode " + letters << "\n";
    cout << "Unicode " + is_nice << "\n";

    cout << "\n";

    cout << "Using 'cout' and 'operator<<' to print the strings:\n";
    cout << "Unicode " << letters << "\n";
    cout << "Unicode " << is_nice << "\n";
}

The source file is encoded as UTF-8. On Linux I compile it (using g++ 5.4.0) with

g++ -std=c++14 -Wall -Wextra Unicode.cpp -o Unicode

and on Windows (using MinGW.org GCC-6.3.0-1) with

g++ -std=c++14 -Wall -Wextra Unicode.cpp -o Unicode.exe

If I compile it and run it from Linux (in this case I'm using Windows Subsystem for Linux, the Ubuntu version that runs on Windows 10), there's no problem, everything works.

If I compile and run it on Windows (in both cmd and PowerShell), the result depends on the console setup. At first the program printed garbage. Then I followed the instructions given in another answer: I ran chcp 65001 to switch the console code page to UTF-8, and I changed the font to Lucida Console. Now, when I send cout a string that starts with a regular ASCII character (as in the first two output lines), everything works, but when the string starts with a character like à or è (as in the last two), the program stops. For reference, this is the output on Linux:

Concatenating the strings using '+':
Unicode àèéìòùäöüß
Unicode è bello

Using 'cout' and 'operator<<' to print the strings:
Unicode àèéìòùäöüß
Unicode è bello

And this is what I get on Windows:

Concatenating the strings using '+':
Unicode àèéìòùäöüß
Unicode è bello

Using 'cout' and 'operator<<' to print the strings:
Unicode

And it ends here. Apparently, Unicode characters are dealt with properly if they are in the middle of the string, but the program just stops if the Unicode character is at the beginning. As a workaround, I can remove the space after "Unicode" and put it at the beginning of those 2 strings, and it works. But I'm not happy with this.

Why does it matter which position the Unicode character is at? And how can I solve it?

  • use Consolas instead of Lucida console – phuclv Feb 06 '18 at 10:16
  • @LưuVĩnhPhúc It doesn't work, I get the same result. But I doubt it has to do with the font: the first 2 lines do show those Unicode characters, which means that the font can display them. – Fabio says Reinstate Monica Feb 06 '18 at 10:20
  • It's probably fine if you're using only Western scripts. However Consolas has more Unicode glyphs, better character differentiation and better Cleartype support – phuclv Feb 06 '18 at 10:25
  • Codepage 65001 doesn't work in the console. It's broken in many ways across various versions of Windows. The only way to reliably use Unicode (at least UCS-2) is via the wide-character console functions such as `ReadConsoleW` and `WriteConsoleW`, or C/C++ `wprintf`, `wcout`, etc after `_setmode(_fileno(stdout), _O_U16TEXT)`. – Eryk Sun Feb 06 '18 at 12:22
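To make the last comment concrete, here is a rough sketch of the `WriteConsoleW` route, keeping the strings as UTF-8 in the program and converting only at the point of output. The names `utf8_to_utf16` and `print_utf8` are hypothetical helpers made up for this illustration. The hand-written decoder assumes well-formed UTF-8 and does not validate continuation bytes. Also note that `WriteConsoleW` only succeeds when standard output is a real console, so this would fail if the output is redirected to a file or a pipe.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: decodes UTF-8 into UTF-16 code units.
// Assumption: the input is well-formed UTF-8; invalid lead bytes and
// truncated sequences are replaced with U+FFFD instead of being diagnosed.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else                          { cp = 0xFFFD;      len = 1; }  // invalid lead byte
        if (i + len > in.size()) {    // sequence cut off at the end of the string
            out.push_back(0xFFFD);
            break;
        }
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                      // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: prints a UTF-8 string. On Windows it bypasses the
// code page and hands UTF-16 to the console; elsewhere it writes the bytes as-is.
bool print_utf8(const std::string& text) {
#ifdef _WIN32
    std::u16string wide = utf8_to_utf16(text);
    HANDLE console = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;
    // Fails (returns 0) if stdout is not attached to a console.
    return WriteConsoleW(console, wide.data(),
                         static_cast<DWORD>(wide.size()), &written, nullptr) != 0;
#else
    std::cout << text;
    return std::cout.good();
#endif
}
```

With this in place, `print_utf8(is_nice)` would take the place of `cout << is_nice`. Mixing it with ordinary `cout` output on Windows would likely need a `cout.flush()` before each call to keep the ordering intact, since `WriteConsoleW` does not go through the stream buffer.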
