
I'm working on a cross-platform project using Qt. On Windows, I want to pass Unicode characters (for instance, a file path that contains Chinese characters) as arguments when launching the application from the command line, and then use these arguments to create a QCoreApplication.

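For certain reasons, I need to use CommandLineToArgvW to get the argument list, like this:

    LPWSTR* argvW = CommandLineToArgvW(GetCommandLineW(), &argc); // argc receives the argument count
    // ... use argvW[0] .. argvW[argc - 1]; free the array with LocalFree(argvW) when done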

I understand that on modern Windows, LPWSTR is actually wchar_t*, where wchar_t is 16 bits wide and strings are encoded as UTF-16.
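A small portable illustration of that point (not part of the original question): it uses `char16_t`/`std::u16string` as a stand-in for Windows' 16-bit `wchar_t`, which is an assumption so the sketch compiles anywhere, and `count_code_points` is a hypothetical name. It shows that a UTF-16 string's length counts 16-bit units, not characters: 我 (U+6211) fits in one unit, while a character above U+FFFF needs a surrogate pair.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in UTF-16 text. char16_t stands in for Windows'
// 16-bit wchar_t. A code point above U+FFFF occupies two units (a surrogate
// pair), so the unit count and the character count can differ.
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s)
        if (u < 0xDC00 || u > 0xDFFF)   // skip low surrogates: each pair counts once
            ++n;
    return n;
}
```

For example, `u"\u6211"` has size 1 and one code point, while `u"\U0001F600"` has size 2 but still one code point.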

However, the QCoreApplication constructor only takes char* strings (through char** argv), not wchar_t*.

So the question is: how can I safely convert the LPWSTR returned by CommandLineToArgvW() to char* without losing the Unicode content (i.e., so that Chinese characters are still Chinese characters after the conversion)?

I've tried many different ways without success:

1:

    std::string const argvString = boost::locale::conv::utf_to_utf<char>( argvW[0] );

2:

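    const wchar_t* pcs = argvW[0];  // the argument to convert
    char buf[0x400];
    // Returns the number of bytes written (including the terminating NUL),
    // or 0 on failure, e.g. if the result does not fit in buf.
    int res = WideCharToMultiByte(CP_UTF8, 0, pcs, -1, buf, sizeof(buf), NULL, NULL);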

3: Convert to QString first, then convert to UTF-8.

EDIT: Problem solved. The UTF-16 to UTF-8 conversion actually works fine with all three of these approaches. In Visual Studio, to view a UTF-8 string correctly while debugging, you need to append the s8 format specifier to the watched variable name (see: https://msdn.microsoft.com/en-us/library/75w45ekt.aspx). I overlooked this, which made me think my string conversion was wrong.
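Since the debugger display was the confusing part, one way to check a conversion independently of any watch-window format is to look at the raw bytes. The helper below is a hypothetical sketch, not from the original post: it renders each byte of a `std::string` in hex. For 我 (U+6211), a correct UTF-8 conversion should produce exactly the three bytes E6 88 91; a watch window that decodes those bytes in the ANSI code page shows three unrelated characters instead, which is why the s8 specifier matters.

```cpp
#include <cstdio>
#include <string>

// Render each byte of a std::string as two uppercase hex digits, space-separated.
// Useful for checking a UTF-8 conversion without relying on how a debugger
// or console chooses to decode the bytes.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];                                   // "XX " plus terminating NUL
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(c));
        out += buf;
    }
    if (!out.empty())
        out.pop_back();                            // drop the trailing space
    return out;
}
```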

The real issue is that when calling QCoreApplication::arguments(), the returned QStrings are constructed with QString::fromLocal8Bit(), which causes encoding problems on Windows when the command-line arguments contain Unicode characters. The workaround: whenever you need the command-line arguments on Windows, call the Windows API CommandLineToArgvW() and convert the 16-bit UTF-16 wchar_t* (LPWSTR) strings to 8-bit UTF-8 char* strings using one of the three approaches above.
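One practical detail of that workaround: the `QCoreApplication(int &argc, char **argv)` constructor keeps referring to `argc` and `argv`, and Qt's documentation asks that they stay valid for the lifetime of the application object. Below is a minimal sketch of an owner for the converted strings, assuming the UTF-16 to UTF-8 conversion has already been done by one of the three approaches above. The class `Utf8Argv` is hypothetical, not a Qt or Windows API.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: owns UTF-8 copies of the arguments and exposes them as
// the (int& argc, char** argv) pair that QCoreApplication's constructor takes.
// An instance must outlive the QCoreApplication it is passed to.
class Utf8Argv {
public:
    explicit Utf8Argv(std::vector<std::string> args)
        : storage_(std::move(args)) {
        for (std::string& s : storage_)
            pointers_.push_back(&s[0]);   // writable, NUL-terminated (C++11)
        pointers_.push_back(nullptr);     // argv[argc] == nullptr, as in main()
        argc_ = static_cast<int>(storage_.size());
    }

    // Non-copyable: the stored pointers refer into storage_.
    Utf8Argv(const Utf8Argv&) = delete;
    Utf8Argv& operator=(const Utf8Argv&) = delete;

    int& argc() { return argc_; }
    char** argv() { return pointers_.data(); }

private:
    std::vector<std::string> storage_;
    std::vector<char*> pointers_;
    int argc_ = 0;
};
```

Usage would look roughly like `Utf8Argv args(convertedArgs); QCoreApplication app(args.argc(), args.argv());`, with `args` declared before `app` so it is destroyed after it. Keep in mind the caveat from the comments and the Qt docs: when modified argc/argv are passed, Qt builds arguments() from them, so encoding problems may still show up there.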

Wayee
  • How do you later determine if your call to `QCoreApplication` is successful? That is, you say that you want "the Chinese characters are still Chinese characters". So how do you tell that they no longer are. Show us the code that, given an appropriate conversion function, you would expect to work. – Nicol Bolas Jul 11 '16 at 21:35
    According to the documentation Qt will automatically use `CommandLineToArgvW` for you, *unless* you pass modified arguments to the `QCoreApplication` constructor. It does not state what exactly "modified" means, but presumably the intent is to just work for ordinary code that just blindly forwards the `main` arguments, but honor the client code's wish if there is any difference. See http://doc.qt.io/qt-5/qcoreapplication.html#arguments – Cheers and hth. - Alf Jul 11 '16 at 21:38
  • Possible duplicate of [Windows unicode commandline argv](http://stackoverflow.com/questions/4101864/windows-unicode-commandline-argv) – Dan Korn Jul 11 '16 at 21:46
    `WideCharToMultiByte(CP_UTF8, ...` *is* the canonical way under Windows. You say it "*fails*". What's the return value, and what's the `GetLastError()` after that? – dxiv Jul 11 '16 at 21:51
  • See also http://stackoverflow.com/questions/148403/utf8-to-from-wide-char-conversion-in-stl. I'm not voting to close this because Qt might make a difference in the answer. – Mark Ransom Jul 12 '16 at 01:18
  • @NicolBolas, I can confirm the characters are the ones I want by loading files using the passed arguments as file paths. If the files load correctly, the strings were converted and passed successfully. – Wayee Jul 12 '16 at 07:27
  • *Sigh*. Good catch about the qtmain_win.cpp not using `CommandLineToArgvW` for Win32. After a more thorough re-read, it looks like Qt converts the arguments to the local 8-bit character set for `QCoreApplication`, no matter what. You can pass modified arguments to the constructor, but they have to be in the system local encoding. I'm slightly flabbergasted why this is, although I suspect it's for legacy reasons. `QGuiApplication` may modify this behavior and read Unicode strings, but navigating the morass of QPA is proving impregnable ATM. I've deleted my answer, since it's not helpful. – jonspaceharper Jul 12 '16 at 15:37

2 Answers


You should be able to use QString's functions. For example:

    QString str = QString::fromUtf16((const ushort*)argvW[0]);
    ::MessageBoxW(0, (const wchar_t*)str.utf16(), 0, 0);

When using WideCharToMultiByte, first pass zero for the output buffer and the output buffer's length. The return value then tells you how many bytes the output buffer needs. For example:

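    const wchar_t* wbuf = argvW[0];
    // First call: no output buffer, so the return value is the required size in
    // bytes (including the terminating NUL, because the input length is -1).
    int len = WideCharToMultiByte(CP_UTF8, 0, wbuf, -1, NULL, 0, NULL, NULL);
    if (len > 0)
    {
        std::string buf(len, '\0');
        // Second call: convert into the correctly sized buffer.
        WideCharToMultiByte(CP_UTF8, 0, wbuf, -1, &buf[0], len, NULL, NULL);
        buf.resize(len - 1); // drop the trailing NUL so buf.size() is the byte count

        QString utf8;
        utf8 = QString::fromUtf8(buf.c_str());
        ::MessageBoxW(0, (const wchar_t*)utf8.utf16(), 0, 0);
    }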
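The same two-pass pattern (measure first, then allocate once and convert) can be sketched portably. This is an illustration only, not what WideCharToMultiByte does internally: `utf16_to_utf8` and its helpers are hypothetical names, and `char16_t` stands in for Windows' `wchar_t`. Unlike the `-1` input-length calls above, the measured length here excludes any terminating NUL, because `std::string` manages its own.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

namespace {

// Read one code point starting at in[i]; advances i past the units consumed.
char32_t next_code_point(const std::u16string& in, std::size_t& i) {
    char32_t u = in[i++];
    if (u >= 0xD800 && u <= 0xDBFF) {                 // high surrogate
        if (i >= in.size() || in[i] < 0xDC00 || in[i] > 0xDFFF)
            throw std::runtime_error("invalid surrogate pair");
        char32_t lo = in[i++];
        return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
    }
    if (u >= 0xDC00 && u <= 0xDFFF)
        throw std::runtime_error("unpaired low surrogate");
    return u;
}

// Number of UTF-8 bytes needed for one code point.
std::size_t utf8_width(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

} // namespace

std::string utf16_to_utf8(const std::u16string& in) {
    // Pass 1: measure (analogous to calling WideCharToMultiByte with no buffer).
    std::size_t len = 0;
    for (std::size_t i = 0; i < in.size();)
        len += utf8_width(next_code_point(in, i));

    // Pass 2: allocate exactly once, then encode.
    std::string out(len, '\0');
    std::size_t o = 0;
    for (std::size_t i = 0; i < in.size();) {
        char32_t cp = next_code_point(in, i);
        switch (utf8_width(cp)) {
        case 1:
            out[o++] = static_cast<char>(cp);
            break;
        case 2:
            out[o++] = static_cast<char>(0xC0 | (cp >> 6));
            out[o++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = static_cast<char>(0xE0 | (cp >> 12));
            out[o++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = static_cast<char>(0xF0 | (cp >> 18));
            out[o++] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = static_cast<char>(0x80 | (cp & 0x3F));
            break;
        }
    }
    return out;
}
```

Passing `u"\u6211"` should yield the three bytes E6 88 91, and a character above U+FFFF (stored as a surrogate pair) yields four bytes.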

The same information should be available through QCoreApplication::arguments(). For example, run this code with a Unicode argument and look at the output:

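    #include <QCoreApplication>
    #include <QFile>
    #include <QString>
    #include <QStringList>
    #include <QTextStream>

    int main(int argc, char *argv[])
    {
        QCoreApplication a(argc, argv);
        // Non-ASCII file name; assumes this source file is saved as UTF-8.
        QString filename = QString::fromUtf8("ελληνική.txt");
        QFile fout(filename);
        if (fout.open(QIODevice::WriteOnly | QIODevice::Text))
        {
            QTextStream oss(&fout);
            oss.setCodec("UTF-8"); // Qt 5 API; Qt 6 uses setEncoding() instead
            oss << filename << "\n";
            // Write every argument Qt received, one per line.
            QStringList list = a.arguments();
            for (int i = 0; i < list.count(); i++)
                oss << list[i] << "\n";
        }
        fout.close();
        return 0; // no event loop needed; a.exec() would wait indefinitely
    }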

Note that in the above example Qt converts the filename to UTF-16 internally; the WinAPI uses UTF-16, not UTF-8.
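Going the other way, from UTF-8 back to the UTF-16 that Win32 "W" functions take, is the step Qt performs for you in that example. Here is a rough, portable sketch of that step; `utf8_to_utf16` is a hypothetical name, `char16_t` again stands in for `wchar_t`, and validation is deliberately minimal.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Sketch: decode UTF-8 and re-encode as UTF-16 (char16_t stands in for wchar_t).
// Validation is minimal: overlong forms and encoded surrogates are not rejected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t extra;   // continuation bytes following the lead byte
        char32_t cp;
        if (b < 0x80)              { cp = b;        extra = 0; }
        else if ((b >> 5) == 0x6)  { cp = b & 0x1F; extra = 1; }
        else if ((b >> 4) == 0xE)  { cp = b & 0x0F; extra = 2; }
        else if ((b >> 3) == 0x1E) { cp = b & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::runtime_error("code point out of range");
        if (cp >= 0x10000) {             // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

With this sketch, the UTF-8 bytes CE B5 decode to U+03B5 (ε), and F0 9F 98 80 decode to the surrogate pair D83D DE00.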

Barmak Shemirani
  • `QCoreApplication`'s constructor does not accept `QString` for command line arguments. – jonspaceharper Jul 12 '16 at 01:42
  • I suggested using `QString` for UTF-16/UTF-8 conversion. `CommandLineToArgvW` is WinAPI, it will have the right content and it can be converted and passed as UTF-8 `char`. – Barmak Shemirani Jul 12 '16 at 02:53
  • I should have been clearer: `QCoreApplication` takes ANSI character strings only. See my answer for an explanation. – jonspaceharper Jul 12 '16 at 03:39
  • Part of the code in `"qtmain_win.cpp"` is for Windows CE. The other part converts the argument to ANSI and passes it to `main`. But the OP is overriding that in `main` and fixing it. UTF-8 is always stored in `char` and it can be passed as `char` (its variable length is not an issue in this case). That's all fine. By the way, as far as I can tell `QCoreApplication::arguments()` is Unicode compatible and doesn't need fixing. Console programs usually have trouble with Chinese input in the first place; the OP needs to clarify... – Barmak Shemirani Jul 12 '16 at 04:31
  • It's the constructor that does not take Unicode, because ANSI strings are passed to `main()`. One missing word (constructor) makes a lot of difference in my comment. :-/ – jonspaceharper Jul 12 '16 at 11:30
  • @JonHarper Then how should I properly initialize `QCoreApplication` to take the unicode arguments? I've tried the solution mentioned in one of the comments ([link](http://stackoverflow.com/questions/4101864/windows-unicode-commandline-argv)), by giving NULL to construct `QApplication`. But the arguments were not taken into account at all. – Wayee Jul 12 '16 at 11:38
  • @BarmakShemirani After the UTF-16 to UTF-8 conversion in your example (`utf8 = QString::fromUtf8(buf.c_str());`), `buf` cannot display the correct characters anymore. – Wayee Jul 12 '16 at 11:44
  • Conversion to UTF-8 is fairly simple; I don't know what you are having problems with. My guess is that you are headed in the wrong direction, trying to put UTF-8 where it doesn't belong. See the updated answer for a simple example of writing to a file in Unicode. – Barmak Shemirani Jul 13 '16 at 00:08
  • @BarmakShemirani How about adding one line at the end of your second code block: `::MessageBox(0, utf8.toUtf8().data(), 0, 0);`, which simply shows the QString with UTF-8 encoding? For a concrete example, initialize `wbuf` as a single Chinese character: `const wchar_t* wbuf = L"我"`. The message box we added shows three garbled characters (the three UTF-8 bytes) instead of the character `我`. The same result appears in the `std::string` `buf` if you inspect the variable while debugging. I don't understand this. – Wayee Jul 13 '16 at 11:58
    So you have been looking in the wrong direction the whole time. Again, **Windows APIs do not understand UTF-8**. Windows APIs use UTF-16. Wide string functions such as `CommandLineToArgvW` return UTF-16 string. You can display that string with `MessageBoxW(0, argvW[0], 0, 0)`, you should not convert it to UTF-8. Qt's `QString` tries to solve this incompatibility between Windows and many Unix based systems which use UTF-8. In Windows programming you usually convert to UTF-8 only when importing/exporting data, for example from text file, or HTML input file. – Barmak Shemirani Jul 13 '16 at 16:06

Qt internally wraps int main(), extracting and parsing the Unicode command-line arguments (via CommandLineToArgvW) before any of your code runs. The parsed arguments are then converted to the local 8-bit encoding, via the equivalent of QString::toLocal8Bit(), and passed to main as char **argv.

Use QCoreApplication::arguments() to retrieve the Unicode args. Also, a helpful note from the docs:

On Windows, the list is built from the argc and argv parameters only if modified argv/argc parameters are passed to the constructor. In that case, encoding problems might occur.
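To see why a round trip through the local 8-bit encoding can lose a Chinese file path, here is a deliberately simplified sketch. It is an illustration under stated assumptions, not Qt's or Windows' actual conversion: Latin-1 stands in for a single-byte ANSI code page such as 1252, and every UTF-16 unit that does not fit is replaced with '?', similar to the substitution Windows typically applies to unmappable characters by default. `to_latin1_lossy` is a hypothetical name.

```cpp
#include <string>

// Illustration only: Latin-1 (ISO-8859-1) stands in for a single-byte Windows
// "ANSI" code page. Real code pages map differently, but the effect is the
// same: characters outside the code page cannot be represented.
// Each UTF-16 unit above U+00FF is replaced with '?'.
std::string to_latin1_lossy(const std::u16string& in) {
    std::string out;
    out.reserve(in.size());
    for (char16_t u : in)
        out.push_back(u <= 0xFF ? static_cast<char>(u) : '?');
    return out;
}
```

With this sketch, `u"\u6211.txt"` becomes `"?.txt"`, so a file opened with the resulting path would not be found. (A surrogate pair turns into two '?' characters.)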

jonspaceharper
  • So are you suggesting that whenever Unicode is involved, one should never use the Windows API `CommandLineToArgvW()` to retrieve the `argv` arguments and pass them to QCoreApplication? Then what would be the correct way? – Wayee Jul 12 '16 at 07:37
  • @Wayee See my update. Just call `QCoreApplication::arguments()` to retrieve the data you need. – jonspaceharper Jul 12 '16 at 11:37
  • The QStringList returned by `QCoreApplication::arguments()` is actually built from the arguments you passed to `QApplication` when you constructed it. My problem is passing correct Unicode strings to construct `QApplication`. To be clear, on macOS, calling `QApplication(0,nullptr)` is enough to pick up the console arguments. But on Windows, doing so simply passes an empty argument list to `QApplication`. – Wayee Jul 12 '16 at 12:39
  • Maybe I wasn't clear enough before. What I'm looking for is a proper way to retrieve Unicode arguments on Windows and construct the `QApplication` with them. A concrete example is loading a file by dragging it onto my application and releasing the mouse: the file path is then passed as a command-line argument when launching my application. It becomes problematic once the file path contains non-ANSI characters. @Jon – Wayee Jul 12 '16 at 13:25
  • Since the `arguments()` function turned out to be what you needed, I've undeleted this answer for the info about Qt wrapping `main()` – jonspaceharper Jul 14 '16 at 10:17