As of January 2023, the Microsoft documentation for WM_CHAR, https://learn.microsoft.com/en-us/windows/win32/inputdev/wm-char , says:
... Otherwise [that is, when the window class was registered with the ANSI version of RegisterClass], the system provides characters in the current process code page, which can be set to UTF-8 in Windows Version 1903 (May 2019 Update) and newer.
But I just can NOT get WM_CHAR to deliver a Unicode character as a UTF-8 sequence. Am I doing something wrong, or is the documentation wrong or misleading?
I ran the experiments on Windows 10 21H2, using Keyview2A.exe v1.8, which is based on Charles Petzold's Keyview2 demo program from his famous book Programming Windows, 5th edition (1998).
First, the non-UTF8ACP case, to show that KeyviewA works OK.
I type the Chinese character 电, which is U+7535, with GBK encoding B5 E7.
Second, in the UTF8ACP case, KeyviewA does NOT receive a UTF-8 sequence.
I just get 0x3F ('?'), sigh!
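For reference, the UTF-8 sequence I would expect for U+7535 is E7 94 B5. A minimal, portable C sketch that computes it is shown here; utf8_encode is a helper name made up for illustration, not a Windows API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): encode one Unicode scalar
   value as UTF-8. Writes 1..4 bytes to out and returns the count, or
   returns 0 if cp is a surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x7535, buf) fills buf with E7 94 B5, so if WM_CHAR really delivered UTF-8 here, I would expect wParam values from that sequence rather than 0x3F.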
Third, what about characters from an SBCS?
SBCS = single-byte character set. DBCS = double-byte character set. MBCS = multi-byte character set (the generic name covering SBCS, DBCS, and character sets with three or more bytes per character).
Most European countries use an SBCS.
Type in some Russian letters:
Type in some Greek letters:
[20230121.c1] So far, I seem to have worked out the rule behind "enabling UTF8ACP" for an ANSI (narrow-char) program. It is summarized below:
The IME produces a Unicode value for each character the user types. When Windows needs to send that character to KeyviewA, it does the following:
- Check the HKL value for the target HWND (let's call it curhkl). Memo: KeyviewA itself can query this HKL value with GetKeyboardLayout(0).
- Get the ANSI code page associated with curhkl. This can be acquired with curcodepage=GetLocaleInfo(LOWORD(curhkl), LOCALE_IDEFAULTANSICODEPAGE, ...);
- Call WideCharToMultiByte(curcodepage, ...) to convert the Unicode value to an MBCS sequence.
  - If the MBCS sequence is a single byte (e.g. 0xE1), Windows sends one WM_CHAR message to Keyview2A with wParam=0xE1.
  - If the MBCS sequence is two bytes (e.g. 0xB5 0xE7), Windows sends two WM_CHAR messages to Keyview2A, both with wParam=0x3F.
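The two outcomes at the end of that list can be modeled in a few lines of portable C. This only mirrors what I observed in KeyviewA; it is not how Windows is actually implemented. observed_wm_char_params is a hypothetical helper name, and sequences longer than two bytes are not covered by my experiments, so the sketch returns 0 for them:

```c
#include <stddef.h>

/* Hypothetical helper: a model of the observed WM_CHAR delivery,
   not Windows internals.
   mbcs/len: the bytes produced by the WideCharToMultiByte step for
             one typed character.
   wparams:  receives the wParam of each WM_CHAR message (room for 2).
   Returns the number of WM_CHAR messages, or 0 for cases that the
   experiments above do not cover. */
size_t observed_wm_char_params(const unsigned char *mbcs, size_t len,
                               unsigned int wparams[2])
{
    if (len == 1) {
        /* Single byte (e.g. 0xE1): delivered unchanged in one WM_CHAR. */
        wparams[0] = mbcs[0];
        return 1;
    }
    if (len == 2) {
        /* Two bytes (e.g. 0xB5 0xE7): two WM_CHAR messages, both '?'. */
        wparams[0] = 0x3F;
        wparams[1] = 0x3F;
        return 2;
    }
    return 0; /* not observed, so not modeled */
}
```

Feeding it {0xE1} yields one message with wParam=0xE1, while {0xB5, 0xE7} yields two messages with wParam=0x3F each, matching the Greek/Russian and Chinese observations respectively.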