I have a Unicode Windows application written in C++. I am trying to convert argv[1] from wchar_t* to char* using the standard codecvt library:
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int wmain(int argc, wchar_t **argv) {
    using namespace std;
    wstring_convert<codecvt_utf8<wchar_t>, wchar_t> converter;
    string str = converter.to_bytes(argv[1]);
    cout << str << endl;
    return 0;
}
It leads to encoding artifacts. I am executing my program with a non-ASCII argument, like so (in PowerShell or cmd):
myprogram.exe "é"
It outputs é instead of é. However, if I hardcode the string in my program by replacing argv[1] with L"é", it works. I should mention that my source file is encoded in UTF-8.
What is causing the problem?
EDIT: The reason I'm doing the conversion is not to print the argument passed to the program, but to pass it to a function from a third-party library that expects a std::string argument. Outputting argv[1] directly through std::wcout already works. I analyzed the byte content of both strings, and here it is:
argv[1]: e9 00 00 00
L"é": c3 00 a9 00 00 00