My program needs to display some Unicode strings. If I save the source code as UTF-16LE, everything works fine, but then I am not able to cross-compile on Linux using MinGW. If I save the source as UTF-8, it compiles without problems, but none of the constant Unicode strings are displayed correctly, since they are encoded as UTF-8. How can I display Unicode strings properly when the source code is saved as UTF-8?

Example Code:

#include <Windows.h>

int main(int argc, const char *argv[])
{
    MessageBoxW(NULL, L"你好", L"你好", MB_OK);
}

[Image: result when compiled with UTF-16LE source file]

[Image: result when compiled with UTF-8 source file]

  • Please try to create a [mcve] to show us, including the strings themselves and how you use them. – Some programmer dude Dec 26 '19 at 09:03
  • I imagine that `gcc` expects the file's encoding to match the terminal's encoding. In Windows, you can change the encoding of the terminal (console) using `chcp`. Specifically, `chcp 65001` is UTF-8. – ikegami Dec 26 '19 at 09:41
  • Even if you use wide-character strings in the source, the editor might save the contents of the literals in UTF-8 or any other encoding possible. You might want to check your editor settings. – Some programmer dude Dec 26 '19 at 09:53
  • Use `u8"xxxxx"` for a UTF-8 encoded string literal. – Jonathan Potter Dec 26 '19 at 11:46
  • @Someprogrammerdude Yes, the problem exists when I encode the source file to UTF-8. –  Dec 26 '19 at 13:16
  • Related, possibly even a duplicate of [What are the different character sets used for?](https://stackoverflow.com/q/27872517/1889329). – IInspectable Dec 26 '19 at 14:28

1 Answer

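After some testing, it turns out that Visual Studio does not encode the string literals correctly if the source file is saved as UTF-8 without a signature (BOM). Without the BOM, the compiler assumes the file is in the system's current code page rather than UTF-8. Everything works fine after changing the file encoding to UTF-8 with signature.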
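
A way to sidestep the file-encoding question altogether, sketched here as an alternative rather than as what was actually done above, is to write the non-ASCII characters as universal character names, so the source file contains only ASCII bytes. The names `kHello` and `check_hello_literal` are made up for this illustration; U+4F60 and U+597D are the code points of the two characters in the question's string.

```c
#include <assert.h>
#include <wchar.h>

/* The question's two-character greeting, written with universal character
   names (U+4F60, U+597D). The source bytes are plain ASCII, so the result
   does not depend on how the editor saves the file or on a BOM.
   kHello is a hypothetical name used only in this sketch. */
static const wchar_t kHello[] = L"\u4F60\u597D";

/* Returns 1 if the compiler produced the two expected code units.
   Both characters are in the Basic Multilingual Plane, so each one is a
   single wchar_t with 16-bit (Windows) and 32-bit (Linux) wchar_t alike. */
int check_hello_literal(void)
{
    return wcslen(kHello) == 2
        && kHello[0] == 0x4F60
        && kHello[1] == 0x597D;
}

/* Debug-build guard: aborts if the literal was not encoded as expected. */
void assert_hello_literal(void)
{
    assert(check_hello_literal());
}
```

On Windows, `kHello` could then be passed to `MessageBoxW` in place of the literal from the question. The trade-off is readability: the escapes are harder to read than the characters themselves, so a comment next to the literal helps.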

  • This does not solve the issue. It merely makes the undesired effects go away. The *real* issue is that your *source code* fails to subscribe to any particular character encoding. – IInspectable Dec 27 '19 at 13:35
  • If this solution works for you, please feel free to mark your own answer as accepted. – Drake Wu Jan 03 '20 at 07:14