I'm working on a C++ console program that prints some Unicode text. On Linux it just works, but on Windows it behaves strangely: Unicode characters are displayed correctly only as long as they are not at the beginning of a std::string. If they are, the program just stops.
Here's the reduction:
#include <iostream>
#include <string>

using std::cout;

int main() {
    std::string letters = "àèéìòùäöüß";
    std::string is_nice = "è bello"; // In Italian this means "is nice"

    cout << "Concatenating the strings using '+':\n";
    cout << "Unicode " + letters << "\n";
    cout << "Unicode " + is_nice << "\n";
    cout << "\n";

    cout << "Using 'cout' and 'operator<<' to print the strings:\n";
    cout << "Unicode " << letters << "\n";
    cout << "Unicode " << is_nice << "\n";
}
The source file is encoded as UTF-8. On Linux I compile it (using g++ 5.4.0) with
g++ -std=c++14 -Wall -Wextra Unicode.cpp -o Unicode
and on Windows (using MinGW.org GCC-6.3.0-1) with
g++ -std=c++14 -Wall -Wextra Unicode.cpp -o Unicode.exe
If I compile it and run it from Linux (in this case I'm using Windows Subsystem for Linux, the Ubuntu version that runs on Windows 10), there's no problem, everything works.
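To rule out an encoding problem in the source file itself, the raw bytes of each string can be dumped as hex. The following is only a minimal sketch; `hex_bytes` is a hypothetical helper that is not part of the original program. If the file really is saved as UTF-8 and the compiler passes the bytes through unchanged, each accented letter should show up as two bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical diagnostic helper (not part of the original program):
// returns the bytes of `s` as lowercase two-digit hex values,
// separated by single spaces.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[byte >> 4];   // high nibble
        out += digits[byte & 0x0F]; // low nibble
    }
    return out;
}
```

Calling `hex_bytes(is_nice)` and printing the result would give `c3 a8 20 62 65 6c 6c 6f` when "è bello" is stored as UTF-8 (`c3 a8` being the two-byte encoding of è). Since the output is pure ASCII, it should display the same way regardless of the console code page.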
If I compile it and run it from Windows (both cmd and PowerShell), it depends. At first the program was printing garbage. Then I followed the instructions given in another answer: I used the command chcp 65001 to set my code page to UTF-8, and I changed the font to Lucida Console. Now, when I cout a string that starts with a regular ASCII character (like the first two), everything works, but if a string starts with a character like à or è (like the last two), the program stops. For reference, this is the output on Linux:
Concatenating the strings using '+':
Unicode àèéìòùäöüß
Unicode è bello

Using 'cout' and 'operator<<' to print the strings:
Unicode àèéìòùäöüß
Unicode è bello
And this is what I get on Windows:
Concatenating the strings using '+':
Unicode àèéìòùäöüß
Unicode è bello

Using 'cout' and 'operator<<' to print the strings:
Unicode
And it ends here. Apparently, Unicode characters are dealt with properly if they are in the middle of the string, but the program just stops if the Unicode character is at the beginning. As a workaround, I can remove the space after "Unicode" and put it at the beginning of those 2 strings, and it works. But I'm not happy with this.
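One thing that might help narrow this down is checking whether the program really terminates or whether cout merely enters a failed state. Once failbit or badbit is set on a stream, every later write is silently discarded until clear() is called, which would look exactly like the program "stopping". Whether that is what happens here is an assumption, not something confirmed. A minimal sketch, with `write_and_check` as a hypothetical helper:

```cpp
#include <iostream>
#include <string>

// Hypothetical diagnostic helper (not part of the original program):
// writes `text` to `out`, flushes, and reports whether the stream is
// still in a good state afterwards.
bool write_and_check(std::ostream& out, const std::string& text) {
    out << text;
    out.flush();
    // good() is false if failbit, badbit, or eofbit is set.
    return out.good();
}
```

In main, each output could then be wrapped as `if (!write_and_check(cout, letters)) { std::cerr << "cout is in a failed state\n"; cout.clear(); }`. If the message appears on stderr, the program is still running and only the stream has failed.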
Why does it matter which position the Unicode character is at? And how can I solve it?