
I have trouble with wide string literals when using the MinGW GCC compiler on Windows.

When I read user input with wscanf, wprintf outputs the national characters correctly. However, output of a wide string literal stops at the first national character:

wprintf (L"China - Čína"); // outputs "China - "

Assuming wchar_t is encoded as UTF-16 by default (is it LE or BE?), how does this work when the source is a UTF-8 file? I tried saving the source as UTF-16, but then I get an "illegal byte sequence" error.

Jan Turoň
  • The encoding of your source code is not related to the encoding used by the program when it runs. Your source code can be in any encoding you want, as long as the compiler knows what it is so that it can translate your string literals into the runtime character set. – Wyzard Oct 04 '14 at 22:02
  • Including `` and calling `setlocale(LC_ALL, "sk");` (or `"cz"`, or whichever language "Čína" is in) before that `wprintf` line should display the text properly. However, it doesn't: it fails to print "Č" with its caron (the upside-down circumflex). Sorry... – Utkan Gezer Oct 04 '14 at 22:27
  • Check the memory area that holds the string using your debugger's memory view. If the string is there in memory, it isn't a compiler-related problem; it may instead be some fancy runtime-library problem. Another possibility is that your output device (console, console emulation, or whatever) doesn't support every Unicode character you want to output and behaves unexpectedly with unusual characters. – pasztorpisti Oct 04 '14 at 23:48
  • @pasztorpisti thanks a lot, I figured it out, see my answer – Jan Turoň Oct 05 '14 at 11:10
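Following up on Wyzard's point that source encoding and runtime encoding are separate: one way to make a wide literal independent of how the .c file is saved is to spell the non-ASCII characters as universal character names (`\uXXXX`, available since C99). The compiler converts them to the wide execution character set (UTF-16 on Windows, typically UTF-32 on Linux), so the file itself can stay pure ASCII. This is a minimal sketch; the name `CHINA_CZ` is hypothetical. GCC also lets you state the source encoding explicitly with `-finput-charset=`.

```c
#include <wchar.h>

/* "China - Čína" written with universal character names:
   \u010C is Č (C with caron), \u00ED is í.
   The literal no longer depends on the encoding of this source file. */
const wchar_t CHINA_CZ[] = L"China - \u010C\u00EDna";
```

Because the code points are spelled out, `CHINA_CZ[8]` is `0x010C` on both 16-bit and 32-bit `wchar_t` platforms; whether it then *displays* correctly is still up to the runtime and the console.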

1 Answer


As @pasztorpisti suggested, I checked with a memory viewer: the substring Čína is stored as 0C 01 ED 00 6E 00 61 00, which is correct UTF-16LE. So the literal itself was compiled correctly.
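To see where those bytes come from without a debugger, here is a small portable sketch that formats a string's UTF-16LE byte sequence as hex. It uses C11 `char16_t` and `u""` literals rather than `wchar_t` (whose size is 2 bytes on Windows but usually 4 on Linux), and assumes the compiler encodes `u""` literals as UTF-16, as GCC does. The function name `format_utf16le` is hypothetical. Note that it computes the little-endian bytes arithmetically (low byte first), so it shows what correct UTF-16LE looks like on any host rather than reading raw memory.

```c
#include <stdio.h>
#include <string.h>
#include <uchar.h>

/* Write the UTF-16LE bytes of s into out as space-separated hex pairs.
   Each 16-bit code unit is emitted low byte first ("little-endian").
   Assumes outsize >= 1; stops early if out runs out of space. */
void format_utf16le(const char16_t *s, char *out, size_t outsize) {
    size_t pos = 0;
    out[0] = '\0';
    for (; *s != 0; ++s) {
        unsigned lo = (unsigned)(*s & 0xFF);        /* low byte  */
        unsigned hi = (unsigned)((*s >> 8) & 0xFF); /* high byte */
        int n = snprintf(out + pos, outsize - pos,
                         pos ? " %02X %02X" : "%02X %02X", lo, hi);
        if (n < 0 || (size_t)n >= outsize - pos)
            break; /* buffer full */
        pos += (size_t)n;
    }
}
```

For example, `format_utf16le(u"\u010C\u00EDna", buf, sizeof buf)` with `char buf[64]` fills `buf` with `0C 01 ED 00 6E 00 61 00`, matching the dump above.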

My console uses CP852 as its default code page, so I tried chcp 1200 (UTF-16LE), but the code page is not set! MSDN says code page 1200 is available only to managed applications. Microsoft really knows how to create encoding hell.

It was very useful to read this answer carefully: I used WriteConsoleW to write the UTF-16LE output directly to the crippled console:

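#include <windows.h>
#include <wchar.h>

/* Write a wide string to the console as UTF-16, bypassing the console code page */
void putws(const wchar_t* str) {
  WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), str, (DWORD)wcslen(str), NULL, NULL);
}

putws(L"China - Čína"); // outputs "China - Čína"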
Jan Turoň