MessageBox does not print UNICODE characters

Question

I'm using the following to print a message in a Win32 API MessageBox:

MessageBox(hWnd, TEXT("Já existe um controlador em execução"), TEXT("Erro"), 0);

MessageBox is a macro and is expanding to MessageBoxW. The trouble is that it doesn't print the Unicode characters correctly, whereas the window that calls it prints Unicode without any issue, so it seems that this is a problem with MessageBox itself.

Does anyone know how to solve this?

FYI, I also tried:

MessageBoxEx(hWnd, TEXT("Já existe um controlador em execução"), TEXT("Erro"), 0, MAKELANGID(LANG_PORTUGUESE, SUBLANG_PORTUGUESE));

But it's the same, as expected.

Here is a picture of the call with the expansion:

And it prints:

Note that the main window menu has unicode characters that are printed correctly.

@anastaciu Richard was just making sure you are really compiling for Unicode and not for ANSI. If `L"..."` failed to compile, that would mean `MessageBox` resolved to `MessageBoxA` instead, despite your claim. In any case, you say the dialog is not printing the Unicode, so what IS it printing exactly? Can you provide a screenshot? — Remy Lebeau, Jun 03 '21 at 16:06
I copy/pasted the first code snippet into my toy win32 project and it worked fine. — Retired Ninja, Jun 03 '21 at 16:07
@anastaciu my doubt is that the code as pasted should work. Therefore there is/are some unknown factor(s) in play. Just trying to eliminate some of the obvious ones. — Richard Critten, Jun 03 '21 at 16:09
When you do `File > Save As...` and click on the little down-arrow next to the `Save` button, what file encoding do you see? — Eljay, Jun 03 '21 at 16:15
It's an encoding clash, on MS-Windows `L" .... "` expects UTF-16. Try saving the file as UTF-16 and checking both the contents of the file after the save and what happens when the program is run. _"...The type of a L"..." string literal is const wchar_t[N], where N is the size of the string in code units of the execution wide encoding, including the null terminator...."_ https://en.cppreference.com/w/cpp/language/string_literal — Richard Critten, Jun 03 '21 at 16:21
@Scheff'sCat saved it as `UTF-16 LE` and it's working fine now, well spotted, I'll be happy to accept your answer, this may be relevant as I didn't find similar posts anywhere. — anastaciu, Jun 03 '21 at 16:26
After having read the whole conversation, I wasn't even sure I could add something useful... I'm used to the fact that Windows supports UTF-16 (the ANSI stuff I don't consider as alternative) while I'm a fan of UTF-8 everywhere. Deeply mistrusting the Windows API trickery under the hood, I always use explicitly `W`-suffixed Windows functions (in case) and provide my UTF-8 texts explicitly converted to UTF-16 (with a resp. function and on-the-fly). That might appear over-engineered or old-fashioned but issues like yours are exactly what I tried to prevent. ;-) — Scheff's Cat, Jun 03 '21 at 16:29
@RichardCritten thanks, it worked, I missed your comment at first, I don't know if it was before Scheff's, anyway, you're spot on. I still don't understand why it works on the menu but not on the box. — anastaciu, Jun 03 '21 at 16:38
@anastaciu how are you defining your menus? In code, or in a resource? Resources use UTF-16 strings. — Remy Lebeau, Jun 03 '21 at 16:46
@RemyLebeau, I was wrong, the cpp file where I append the menus is in ANSI, which is even more puzzling. — anastaciu, Jun 03 '21 at 16:51
@anastaciu not really, if the menu strings are in ANSI and the `.cpp` file is being saved using the same (or compatible) ANSI encoding as the user who is running your program. Then everything will likely match up. But that is not a guarantee, which is why ANSI should be avoided. — Remy Lebeau, Jun 03 '21 at 16:55
@RemyLebeau, I see, I'll be sure to save them all as UTF-16. — anastaciu, Jun 03 '21 at 16:56
@anastaciu if possible move all literal (UI) strings into the resource file and check its encoding is UTF-16. Then you only have one place to check encodings. — Richard Critten, Jun 03 '21 at 16:57
Don't trust IntelliSense when it comes to macros. Either use your debugger to verify, or don't have anyone guess by naming the functions you call and types you use. Generic-text mappings serve a single purpose: Obfuscating code. — IInspectable, Jun 03 '21 at 17:27
@sch I understand that the UTF-8 manifesto is very popular. And very wrongly footed. It fails to acknowledge, that **everyone** is using UTF-16. Windows does, NTFS does (sort of), .NET does, Java does. That's a **lot** of UTF-16. UTF-8 is great, for *data exchange*. Using UTF-8 everywhere, like the manifesto suggests, isn't going to be useful. It's dogmatic, really, and doesn't provide rationale for why I shouldn't regard it as dogmatic either. — IInspectable, Jun 04 '21 at 08:21
@IInspectable I admit that dogmas have a smell. I consider it as a rule of thumb. I'm working on an application which became huge over the years. To keep things simple, I prefer to assume that any string (with text) provides UTF-8 or plain ASCII. Parser tools, I have collected over the years, are working on `char` which might be ASCII or UTF-8. (In the latter case, everything beyond 7 bit ASCII is just kept as is so that UTF-8 sequences don't break.) I might have decided to use UTF-16 instead. Whatever I had chosen - somewhere I have to pay for it. So, making a choice at all was important. — Scheff's Cat, Jun 04 '21 at 08:52
@sch If you have to make a choice, `wchar_t` is the best option on Windows. It's not just Windows' native encoding, it's also unambiguous. The major point being: "UTF-8 Everywhere" is no better a choice than any other dogmatic (as opposed to judicious) one. — IInspectable, Jun 04 '21 at 11:03

score 4 · Accepted Answer · answered Jun 03 '21 at 16:59

To avoid problems caused by the source file's encoding in the future, you can write non-ASCII characters as \uXXXX escapes (universal character names):

MessageBoxW(nullptr, L"J\u00E1 existe um controlador em execu\u00E7\u00E3o", L"Erro", MB_OK);

Aykhan Hagverdili


Hum, that can be a good solution, though it's kind of a hassle. Do you think that UTF-16 won't work in all cases? – anastaciu Jun 03 '21 at 17:05
@anastaciu it will work if you save your files as UTF-16. This is kind of a safe alternative. You could write a Python script to do this to your source code fwiw. – Aykhan Hagverdili Jun 03 '21 at 17:07

score 0 · Answer 2 · answered Jun 03 '21 at 20:19

If you do not want to encode your Unicode characters as escape sequences, make sure your source code editor uses the same encoding as your compiler.

What you are seeing is Unicode characters, but the wrong ones. This is called "mojibake" and happens whenever you (or your compiler!) interpret a file using a different encoding (likely some ISO-8859-? encoding) than the one your editor saved it in (UTF-8).

You could configure your compiler to use UTF-8 as well (GCC has the `-finput-charset=UTF-8` option, MSVC has `/utf-8`). If you are using GCC, you might want to read the accepted answer in this SO post.
