
My code is basically this:

    wstring japan = L"日本";
    wstring message = L"Welcome! Japan is ";

    message += japan;

    wprintf(message.c_str());

I want to use wide strings, but I don't know how they are supposed to be output, so I used `wprintf`. When I run something like:

    ./widestr | hexdump

The hexadecimal output looks like this:

    65 57 63 6c 6d 6f 21 65 4a 20 70 61 6e 61 69 20 20 73 3f 3f
    e  W  c  l  m  o  !  e  J     p  a  n  a  i        s  ?  ?

Why are they all jumbled? Even if `wprintf` is the wrong function, I still don't understand why it would output them in such a specific jumbled order!

Edit: endianness or something? Every two characters seem to be swapped. Huh.
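
The pairwise swap is consistent with `hexdump`'s default display format, which reads the input as two-byte words and prints each word in the machine's byte order. On a little-endian machine such as x86, the bytes `W` `e` (0x57 0x65) are shown as the word `6557`, so the program's output itself is in order; `hexdump -C` prints one byte at a time in file order. The helper `as_le_words` below is a hypothetical sketch, assuming a little-endian host, that mimics that default word display:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Mimics hexdump's default two-byte-word display on a little-endian host:
// each pair of bytes becomes one 16-bit word (first byte = low half), printed
// as four hex digits. This sketch pads an odd trailing byte with a zero high half.
std::string as_le_words(const std::string& bytes) {
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        unsigned lo = static_cast<unsigned char>(bytes[i]);
        unsigned hi = (i + 1 < bytes.size())
                          ? static_cast<unsigned char>(bytes[i + 1])
                          : 0u;
        char buf[5];  // four hex digits plus the terminating null
        std::snprintf(buf, sizeof buf, "%04x", (hi << 8) | lo);
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

For example, `as_le_words("Welc")` gives `6557 636c`, which matches the first two pairs in the dump above even though `W`, `e`, `l`, `c` were written in order.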

Edit 2: I tried using `wcout`, but it outputs exactly the same bytes. Weird!

Omnifarious
John D.

1 Answer


You need to set the locale:

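    #include <cwchar>    // wprintf is declared here, not in <stdio.h>
    #include <iostream>  // wcout
    #include <locale>    // std::locale
    #include <string>

    using namespace std;

    int main()
    {
            // Switch from the default "C" locale to the user's environment
            // locale (e.g. en_US.UTF-8). Because locale("") has a name, this
            // also calls setlocale(), so C functions like wprintf use it too.
            std::locale::global(std::locale(""));

            wstring japan = L"日本";
            wstring message = L"Welcome! Japan is ";

            message += japan;

            // Pass the string as an argument, not as the format string.
            wprintf(L"%ls\n", message.c_str());
            wcout << message << endl;
    }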

This works as expected: the wide string is converted to narrow UTF-8 and printed.
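
To make "convert wide string to narrow UTF-8" concrete at the byte level, here is a hand-rolled encoder. It is only an illustration: in the program above the runtime performs the conversion according to the locale, not code like this. The name `encode_utf8` is hypothetical, and the sketch assumes each `wchar_t` holds one Unicode code point (true where `wchar_t` is 32 bits, as on Linux and macOS, but not on Windows, where it holds UTF-16 code units):

```cpp
#include <string>

// Encodes a wide string as UTF-8, assuming one code point per wchar_t.
// Surrogates and out-of-range values are not validated in this sketch.
std::string encode_utf8(const std::wstring& ws) {
    std::string out;
    for (wchar_t wc : ws) {
        unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: 11110xxx then three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For "日本" (U+65E5 U+672C) this produces the six bytes E6 97 A5 E6 9C AC, which is what a byte-order dump such as `hexdump -C` should show in place of the two `3f` (`?`) bytes once a UTF-8 locale is in effect.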

Setting the global locale to `""` selects the locale configured in the user's environment (for example via `LANG`) instead of the default "C" locale. If that locale uses UTF-8, the `wstring` is converted to UTF-8 when it is printed.
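
The locale-dependent step can be tried in isolation with the C library's `std::wcsrtombs`, which converts wide characters to the multibyte encoding of the current `LC_CTYPE` locale. The sketch below uses a hypothetical helper, `narrow_in_locale`, that switches `LC_CTYPE` temporarily; it assumes the named locale is installed, and because `setlocale` changes process-wide state it is not thread-safe:

```cpp
#include <clocale>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Converts ws to the multibyte encoding of the C locale named locale_name
// (e.g. "C" or "en_US.UTF-8") and stores the result in out.
// Returns false if the locale is unavailable or a character cannot be
// represented in that locale's encoding. Not thread-safe (uses setlocale).
bool narrow_in_locale(const std::wstring& ws, const char* locale_name,
                      std::string& out) {
    // Copy the current LC_CTYPE name: the returned pointer may be
    // overwritten by the next setlocale call.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";
    if (std::setlocale(LC_CTYPE, locale_name) == nullptr) {
        return false;
    }

    bool ok = false;
    std::mbstate_t state{};
    const wchar_t* src = ws.c_str();
    // With a null destination, wcsrtombs only measures the output length.
    const std::size_t len = std::wcsrtombs(nullptr, &src, 0, &state);
    if (len != static_cast<std::size_t>(-1)) {
        std::vector<char> buf(len + 1);  // room for the terminating null
        state = std::mbstate_t{};
        src = ws.c_str();
        std::wcsrtombs(buf.data(), &src, buf.size(), &state);
        out.assign(buf.data(), len);
        ok = true;
    }

    std::setlocale(LC_CTYPE, saved.c_str());  // restore the previous locale
    return ok;
}
```

Under the "C" locale, plain ASCII converts but "日本" fails, which is consistent with the question's output; with a UTF-8 locale such as `en_US.UTF-8` (if installed), the same call should succeed.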

Edit: ignore what I said earlier about `sync_with_stdio`; that was wrong. C and C++ streams are synchronized by default, so the call is not needed.

Artyom
  • You make it sound like `sync_with_stdio` and `wcout` are alternatives; they do completely different things. `sync_with_stdio` is required if you want to interleave C stream functions (like `wprintf`) with C++ stream usage (`wcout`); `imbue` is needed if you want to change the locale used by `wcout`. – CB Bailey Jun 28 '10 at 07:40
  • I can't test it, but `wcout` should work without codepage settings on Windows, because `wchar_t` is a UTF-16 code unit on Windows and UTF-16 is Windows's only native encoding. So `std::wcout` should use `WriteConsoleW` without any locale conversion. If it doesn't, it's a library bug. – Philipp Jun 28 '10 at 07:42
  • @Philipp That is not how the standard defines it. The standard says that wide characters should be converted to the narrow encoding according to the locale's codepage, and that is what is done. The issue with Windows is that it does not support UTF-8. So on Windows you probably need to use `locale::global(locale("Japan"))`, and it would use Shift-JIS encoding for output; otherwise it would fail to convert the characters. – Artyom Jun 28 '10 at 07:54
  • Microsoft's standard library implementation of `wcout` uses the global C locale internally, so imbuing a locale is practically useless. You have to set the desired locale as the global locale. – smerlin Jun 28 '10 at 08:00
  • @Artyom: Thanks for the comment. This means that `std::wcout` is essentially useless on Windows. I'd consider this a mistake in the C++ standard, which is unnecessarily biased towards Unix. BTW, Windows consoles do support UTF-8 (via `SetConsoleOutputCP`), but all code pages are obsolete and only kept for compatibility reasons. Shift-JIS is even more obsolete than UTF-8 because it's not a Unicode encoding. So it seems that one really has to call `WriteConsoleW` directly. – Philipp Jun 28 '10 at 08:11
  • Regarding my comment: this is only true for the ctype facet; imbuing a locale works for all other facets, AFAIK. – smerlin Jun 28 '10 at 08:22
  • @Artyom, @others: thanks, it helped me learn an annoying part of the language. Works fine now. – John D. Jun 28 '10 at 08:36
  • The output of `wprintf` is as expected, but `wcout` doesn't show any output (all blank). Why? – parasrish Dec 15 '15 at 18:10
  • What about Japanese characters written like `L"{\"type\":\"string\",\"value\":\"\\u9CE5\"},\n"`? They don't seem to print with `wprintf` or `wcout` as shown above. – Michele Apr 09 '17 at 21:45
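
A likely explanation for the last comment: inside a C++ literal, `\\u9CE5` is an escaped backslash followed by the text `u9CE5`, so the program prints those six characters rather than the character U+9CE5 (鳥). Only a single-backslash `\u9CE5` is a universal character name that the compiler turns into the character. A small sketch of the difference (the variable names are illustrative); turning JSON escapes into characters at run time would need a JSON parser, which this snippet does not attempt:

```cpp
#include <string>

// Double backslash: the string holds a literal '\' followed by "u9CE5",
// i.e. six characters of plain text; no character U+9CE5 is ever created.
const std::wstring json_escape_text = L"\\u9CE5";

// Single backslash: a universal character name, so the compiler stores the
// one character U+9CE5 in the string.
const std::wstring actual_character = L"\u9CE5";
```

Only the second string contains a character that the locale conversion discussed above would need to encode.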