When reading input from std::cin on Windows, the input apparently always arrives encoded as Windows-1252 (the default code page of my machine), despite all the configuration I have done, which apparently only affects the output. Is there a proper way to capture input on Windows in UTF-8?
For instance, let's check out this program:
#include <iostream>
#include <locale>
#include <string>

int main(int argc, char* argv[])
{
    // Imbue both streams with a UTF-8 locale
    std::cin.imbue(std::locale("es_ES.UTF-8"));
    std::cout.imbue(std::locale("es_ES.UTF-8"));
    std::cout << "ñeñeñe> ";
    std::string in;
    std::getline(std::cin, in);
    std::cout << in;
}
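As a side check (hypothetical, not part of the test above), the locale name itself can be validated, since std::locale throws std::runtime_error when the runtime does not recognize the name:

#include <iostream>
#include <locale>
#include <stdexcept>

int main()
{
    try {
        // If this throws, the "es_ES.UTF-8" name is not supported by the CRT
        std::locale loc("es_ES.UTF-8");
        std::cout << "locale accepted: " << loc.name() << "\n";
    } catch (const std::runtime_error& e) {
        std::cout << "locale rejected: " << e.what() << "\n";
    }
}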
I've compiled it with Visual Studio 2022 on a Windows machine with a Spanish locale. The source code is in UTF-8. When executing the resulting program (in a Windows PowerShell session, after running chcp 65001
to set the console code page to UTF-8), I see the following:
PS C:\> .\test_program.exe
ñeñeñe> ñeñeñe
e e e
The first "ñeñeñe" is correct: the "ñ" character is displayed correctly on the output console. So far, so good. The user input is echoed back to the console correctly: another good point. But when the program writes the captured string back to the output, the "ñ" character is replaced by an empty space.
When debugging this program, I see that the variable "in" has captured the input in an encoding that is not UTF-8: the "ñ" occupies only one byte, whereas in UTF-8 that character must take two. The conclusion is that the input is not affected by the chcp
command. Am I doing something wrong?
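For reference, a small diagnostic like this (hypothetical, not part of the original program) makes the encoding visible without a debugger: a UTF-8 "ñ" should appear as the two bytes C3 B1, while Windows-1252 encodes it as the single byte F1:

#include <cstdio>
#include <iostream>
#include <string>

int main()
{
    std::string in;
    std::getline(std::cin, in);
    // Dump the raw bytes of the captured line in hex
    for (unsigned char c : in)
        std::printf("%02X ", c);
    std::printf("\n");
}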
UPDATE
Somebody asked me to check what happens when switching to wcout/wcin:
std::wcout << u"ñeñeñe> ";
std::wstring in;
std::getline(std::wcin, in);
std::wcout << in;
Behaviour:
PS C:\> .\test.exe
0,000,7FF,6D1,B76,E30ñeñeñe
e e e
Another try (setting the string literal to L"ñeñeñe"):
ñeñeñe> ñeñeñe
e e e
Leaving it as is:
std::wcout << "ñeñeñe> ";
Result is:
eee>
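In case it is relevant, here is a quick check of the console code pages (a hypothetical sketch using the Win32 GetConsoleCP/GetConsoleOutputCP calls; I have not confirmed this is the cause). It would show whether chcp 65001 changed the input code page as well as the output one:

#include <iostream>
#include <windows.h>

int main()
{
    // The console keeps separate code pages for input and output
    std::cout << "input  code page: " << GetConsoleCP() << "\n";
    std::cout << "output code page: " << GetConsoleOutputCP() << "\n";
    // 65001 is CP_UTF8; 1252 would be the Windows-1252 ANSI code page
}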