I'm looking to move from ASCII to UTF-8 everywhere in my Windows Desktop (Win32/MFC) application, as opposed to the usual move to UTF-16. The idea is that fewer changes will be needed, and interfacing with external systems that speak UTF-8 will take less work.
The problem is that the static control and button in a dialog box loaded from the resource file only ever display the first character of their kanji text. Should resource files work correctly with UTF-8?
[Image: dialog illustrating the problem]
UTF-8 strings from the String Table in the resource file appear to be read and displayed correctly, but text placed directly on the dialogs themselves is not. I am testing with kanji characters.
[Image: how the dialog appears in the resource editor]
I have:
- Used a font in the dialog box that should support the kanji characters I'm trying to use.
- Ensured the .rc file is in UTF-8 using Notepad++'s conversion function.
- Added #pragma code_page(65001) at the top of the .rc file, as per "The Resource Compiler defaults to CP_ACP, even in the face of subtle hints that the file is UTF-8".
- Added the manifest for UTF-8 support in "Manifest Tool/Input and Output/Additional Manifest Files", as per "Use UTF-8 code pages in Windows apps".
- Added /utf-8 to "C/C++/Command Line/Additional Options", as per "Set source and execution character sets to UTF-8".
Using UTF-8 everywhere means using std::string, CStringA, and the -A Win32 functions, which are selected implicitly by setting "Advanced/Character Set" to "Not Set". The resource file is also in UTF-8, including the dialogs with their text, the String Tables, and so on. If I set it to "Use Unicode Character Set" instead, my understanding is that UTF-16 and the -W functions become the default everywhere, which is historically the standard Windows way of supporting Unicode.
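To make the std::string side of this concrete, here is a minimal portable sketch (no Win32 calls) of how byte length and character count diverge once a std::string holds UTF-8. The helper names utf8_code_points and utf8_length_demo are hypothetical and used only for illustration, and the code assumes the input is valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count Unicode code points in a UTF-8 string by
// skipping continuation bytes (those of the form 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t utf8_code_points(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s)
    {
        if ((c & 0xC0) != 0x80)  // not a continuation byte: a new code point starts here
            ++count;
    }
    return count;
}

// Hypothetical self-check using U+6F22 U+5B57 spelled out as UTF-8 bytes.
void utf8_length_demo()
{
    const std::string kanji = "\xE6\xBC\xA2\xE5\xAD\x97";
    assert(kanji.size() == 6);             // six bytes of storage
    assert(utf8_code_points(kanji) == 2);  // but only two characters
}
```

Each of these kanji takes three bytes in UTF-8 but a single code unit in UTF-16, so buffer sizes passed to the -A functions count bytes rather than characters.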
The pragma appears to work, as the Resource Editor in Visual Studio does not clobber the .rc file into UTF-16LE. Also, the manifest appears to work as the MessageBox() (MessageBoxA) function displays text from the String Table correctly. Without the manifest, the MessageBox() displays question marks.
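TCHAR buffer[512];
// The last argument is the buffer size in characters; LoadString null-terminates the result.
LoadString(hInst, IDS_TESTKANJI, buffer, _countof(buffer));
MessageBox(hWnd, buffer, _T("Caption"), MB_OK);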
If I set the Character Set to "Use Unicode Character Set", everything appears to work as expected and all characters are displayed.
[Image: dialog successfully showing kanji]
My suspicion is that the text is being converted UTF-8 (.rc file) -> UTF-16 (internal representation) -> ASCII (when the dialog text is loaded?), and that the last step meets a null byte from the UTF-16 representation and stops after reading the first character.
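The mechanism I have in mind can be sketched in portable C++. This only illustrates what happens when UTF-16LE bytes are read as a narrow, null-terminated string; it is an assumption on my part, not a description of what the dialog manager actually does. The helper names utf16le_bytes and narrow_length are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: lay out UTF-16 code units as little-endian bytes,
// followed by a two-byte null terminator.
std::vector<char> utf16le_bytes(const std::u16string& s)
{
    std::vector<char> bytes;
    for (char16_t unit : s)
    {
        bytes.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    }
    bytes.push_back('\0');
    bytes.push_back('\0');
    return bytes;
}

// Hypothetical helper: number of bytes a narrow-string reader would see
// before hitting the first zero byte in the UTF-16LE data.
std::size_t narrow_length(const std::u16string& s)
{
    const std::vector<char> bytes = utf16le_bytes(s);
    return std::strlen(bytes.data());
}
```

With this sketch, narrow_length(u"AB") is 1, because the high byte of 'A' is zero, whereas narrow_length(u"\u6F22\u5B57") is 4, because neither of those two kanji contains a zero byte. Whether a misread like this stops after the first character therefore depends on the characters involved, so the sketch neither confirms nor rules out the suspicion for the dialog case.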
If I call SetDlgItemText() on my static control using text from the String Table, the static control will show all the characters correctly:
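case WM_COMMAND:
    if (LOWORD(wParam) == IDOK)
    {
        TCHAR buffer[512];
        // Load the kanji test string from the String Table, then push it into the static control.
        LoadString(hInst, IDS_TESTKANJI, buffer, _countof(buffer));
        SetDlgItemText(hDlg, IDC_STATIC, buffer);
        ...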
- Windows OS Build: 19044.2130
- Visual Studio 2022 17.4.2
- Windows SDK Version: 10.0.22621.0
- Platform Toolset: Visual Studio 2022 (v143)