Problem is that there are a lot of places where this could be broken.
Here is answer I've posted some time ago (covers Korean). Answear is for MSVC, but same applies to MinGW (compiler switches are different, locale name may be different).
Here are 5 traps which makes this hard:
- Source code encoding. Source has to use encoding which supports all required characters. Nowadays UTF-8 is recommended. It is best to make sure your editor (IDE) is properly configure to enforce source encoding.
- You have to inform compiler what is encoding of source file. For gcc it is:
-finput-charset=utf-8
(it is default)
- Encoding used by executable. You have to define what kind of encoding string literals should be encode in final executable. This encoding should also cover required characters. Here
UTF-8
is also the best. Gcc option is -fexec-charset=utf-8
- When you run application you have to inform standard library what kind of encoding your string literals are define in or what encoding in program logic is used. So somewhere in your code at beginning of execution you need something like this (here UTF-8 is enforced):
std::locale::global(std::locale{".utf-8"});
- and finally you have to instruct stream what kind of encoding it should use. So for
std::cout
and std::cin
you should set locale which is default for the system:
auto streamLocale = std::locale{""};
// this impacts date/time/floating point formats, so you may want tweak it just to use sepecyfic encoding and use C-loclae for formating
std::cout.imbue(streamLocale);
std::cin.imbue(streamLocale);
After this everything should work as desired without code which explicitly does conversions.
Since there are 5 places to make mistake, this is reason people have trouble with it and internet is full of "hack" solutions.
Note that if system is not configured for support all needed characters (for example wrong code page is set) then with thsi configuration characters which could not be converted will be replaced with question mark.