I'm working on a cross-platform project using Qt. On Windows, I want to pass Unicode characters (for instance, a file path that contains Chinese characters) as arguments when launching the application from the command line, and then use these arguments to create a QCoreApplication.
For various reasons, I need to use CommandLineToArgvW to get the argument list, like this:
LPWSTR * argvW = CommandLineToArgvW( GetCommandLineW(), &argc );
I understand that on modern Windows, LPWSTR is actually wchar_t*, which is 16 bits wide and uses UTF-16 encoding.
However, the QCoreApplication constructor only takes char*, not wchar_t*.
So the question is: how can I safely convert the LPWSTR returned by CommandLineToArgvW() to char* without losing the Unicode content (for example, so that Chinese characters are still Chinese characters)?
I've tried many different ways without success:
1:
std::string const argvString = boost::locale::conv::utf_to_utf<char>( argvW[0] );
2:
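// pcs is the wide string to convert, e.g. argvW[0]
int res;
char buf[0x400];
char* pbuf = buf;                       // not used in this excerpt
boost::shared_ptr<char[]> shared_pbuf;  // not used in this excerpt
// With -1 as the input length, the result counts the terminating NUL; 0 means failure
res = WideCharToMultiByte(CP_UTF8, 0, pcs, -1, buf, sizeof(buf), NULL, NULL);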
3: Convert to QString first, then convert to UTF-8.
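All three attempts above aim at the same transformation: decode UTF-16 code units (including surrogate pairs) into code points and re-encode them as UTF-8 bytes. The following is only a portable sketch of that transformation, not what Boost, Qt, or WideCharToMultiByte do internally. The name utf16ToUtf8 is hypothetical, it takes std::u16string rather than wchar_t* so that it also compiles where wchar_t is not 16 bits wide, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode a UTF-16 string as UTF-8.
// Assumption: unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with the following low surrogate.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }
        assert(cp <= 0x10FFFF);
        if (cp < 0x80) {  // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {  // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes (most CJK characters land here)
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {  // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the two UTF-16 code units U+4E2D U+6587 ("中文") become the six bytes E4 B8 AD E6 96 87. If a debugger shows those bytes as garbage, the conversion may still be correct and only the display encoding may be wrong.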
EDIT:
Problem solved. The UTF-16 wide-character to UTF-8 char conversion actually works fine with all three approaches. In Visual Studio, to view a UTF-8 string correctly in the debugger, you need to append the s8 format specifier to the watched variable name (see: https://msdn.microsoft.com/en-us/library/75w45ekt.aspx). This is the part I overlooked, and it made me think that my string conversion was wrong.
The real issue is that the QString values returned by QCoreApplication::arguments() are constructed with QString::fromLocal8Bit(), which causes encoding issues on Windows when the command-line arguments contain Unicode characters. The workaround: whenever you need to retrieve the command-line arguments on Windows, always call the Windows API CommandLineToArgvW() and convert the 16-bit UTF-16 wchar_t* (or LPWSTR) to 8-bit UTF-8 char* (by one of the three ways mentioned above).
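Once the arguments have been converted, they still need to be handed to QCoreApplication in argc/argv form. The constructor takes argc by reference, and Qt's documentation requires argc and argv to stay valid for the whole lifetime of the application object. Here is a minimal sketch of a holder that keeps the converted strings alive. The class name Utf8Argv is hypothetical and not part of Qt or the Windows API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: owns UTF-8 argument strings and exposes them
// as an argc/argv pair whose pointers stay valid while it is alive.
class Utf8Argv {
public:
    explicit Utf8Argv(std::vector<std::string> args) : storage_(std::move(args)) {
        assert(!storage_.empty());  // argv[0] is expected to exist
        for (std::string& s : storage_) {
            pointers_.push_back(&s[0]);  // std::string data is NUL-terminated since C++11
        }
        pointers_.push_back(nullptr);  // conventional argv[argc] terminator
        argc_ = static_cast<int>(storage_.size());
    }

    // Copying would leave pointers_ referring to the other object's strings.
    Utf8Argv(const Utf8Argv&) = delete;
    Utf8Argv& operator=(const Utf8Argv&) = delete;

    int& argc() { return argc_; }
    char** argv() { return pointers_.data(); }

private:
    std::vector<std::string> storage_;
    std::vector<char*> pointers_;
    int argc_ = 0;
};

// Usage sketch on Windows, assuming utf8Args was filled by converting
// each argvW[i] with one of the approaches above:
//   Utf8Argv args(utf8Args);
//   QCoreApplication app(args.argc(), args.argv());
```

The holder has to be declared before the QCoreApplication object and in the same or an enclosing scope, so that it is destroyed after the application object.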