I have a Unicode Windows application written in C++. I am trying to convert argv[1] from wchar_t* to char* using the standard codecvt library:
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int wmain(int argc, wchar_t **argv) {
    using namespace std;
    wstring_convert<codecvt_utf8<wchar_t>, wchar_t> converter;
    string str = converter.to_bytes(argv[1]);
    cout << str << endl;
    return 0;
}
It leads to encoding artifacts. I am executing my program with a non-ASCII argument, like so (in PowerShell or cmd):
myprogram.exe "é"
It outputs é instead of é. However, if I hardcode the string in my program by replacing argv[1] with L"é", it works. I should mention that my source file is encoded in UTF-8.
What is causing the problem?
EDIT: The reason I'm doing the conversion is not to print the argument passed to the program, but to pass it to a function from a third-party library that expects a std::string argument. Outputting argv[1] directly through std::wcout already works. I analyzed the byte content of both strings, and here it is:
argv[1]: e9 00 00 00
L"é": c3 00 a9 00 00 00