
How can I print a std::wstring using std::wcout?

I tried the following, which was recommended here, but it only prints ¡Hola! and not 日本:

#include <iostream>
#include <clocale>

int main(int argc, char* argv[])
{
  char* locale = setlocale(LC_ALL, ""); 
  std::cout << "locale: " << locale << std::endl; // "C" for me
  std::locale lollocale(locale);
  setlocale(LC_ALL, locale); 
  std::wcout.imbue(lollocale);
  std::wcout << L"¡Hola!" << std::endl; // ok
  std::wcout << L"日本" << std::endl;    // empty :(
  return 0;
}

Also the following (which was recommended here) does not print the Japanese characters at all:

#include <stdio.h>
#include <string>
#include <locale>
#include <iostream>

using namespace std;

int main()
{

        std::locale::global(std::locale(""));
        wstring japan = L"日本";
        wstring message = L"Welcome! Japan is ";

        message += japan;

        wprintf(message.c_str());
        wcout << message << endl;
}

All this is on Mac OS X 10.6.8, using g++ 4.2.1 and Terminal 2.1.2.

The terminal can display the characters just fine in general, e.g., when I cat the source code. Also, `cout << "日本" << std::endl;` works fine, but I do need to print a wstring.

My $LANG is this:

$ echo $LANG 
en_US.UTF-8
Frank
  • This won't be that helpful, but here's the source code for Mac OSX' cat: http://www.freebsd.org/cgi/cvsweb.cgi/src/bin/cat/cat.c?rev=1.33.2.1.8.1;content-type=text%2Fx-cvsweb-markup – Wug Jul 16 '12 at 21:19
  • A `wstring` won't be UTF-8. Hopefully your compiler is converting UTF-8 source to wide-character constants. – Mark Ransom Jul 16 '12 at 21:22
  • Maybe this question is useful http://stackoverflow.com/questions/148403/utf8-to-from-wide-char-conversion-in-stl – Mattias Wadman Jul 16 '12 at 21:46
  • I'd suggest converting the string from UTF-32 to UTF-8 yourself (note that `wchar_t` is 32 bits by default on Mac OS X and Linux) and then just printing it normally using `std::cout << myUTF8StringAsCharStar`. Maybe use a helper class to do the conversion for you and handle the memory management. [libiconv](http://www.gnu.org/software/libiconv/) is useful. – Adam Rosenfield Jul 16 '12 at 21:57
  • +1 to @AdamRosenfield. But most conversion APIs let you just deal with wchar_t without worrying about whether it's UTF-16 or UTF-32, which is nice, because then your code is portable to 16-bit-wchar platforms. (However, it's not actually guaranteed that wchar_t is UTF-16 or -32 rather than some other 16- or 32-bit charset, so it's still not really portable.) – abarnert Jul 17 '12 at 00:16
  • Also, it looks like in Lion, `setlocale(LC_ALL, "")` returns "en_US.UTF-8" instead of "C", and just calling setlocale(LC_ALL, locale) without imbuing wcout makes everything work. But this doesn't help with Snow Leopard. – abarnert Jul 17 '12 at 00:18

4 Answers

The way to print a wstring is to convert it to a UTF-8, char-based string. Seriously, wchar_t is pointless outside of Windows and the various other platform libraries that unfortunately adopted wchar_t before it became clear what a bad idea it is.

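// Move to clang and libc++, then:
// (std::wstring_convert and std::codecvt_utf8 are deprecated since C++17.)
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main(){
    std::wstring_convert<std::codecvt_utf8<wchar_t>,wchar_t> convert; // converts between UTF-8 and UCS-4 (given sizeof(wchar_t)==4)
    std::wstring s = L"日本";
    std::cout << convert.to_bytes(s);
}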
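
Where <codecvt> is unavailable, or its deprecation warnings are unwanted, the same conversion can be written by hand. The following is only a minimal sketch: it assumes the wide string holds UTF-32 (or UTF-16 where wchar_t is 16 bits), and `append_utf8` and `wide_to_utf8` are hypothetical helper names, not library functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` as UTF-8.
// Invalid values (lone surrogates, anything above U+10FFFF) become U+FFFD.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = 0xFFFD;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to a UTF-8 std::string.
inline std::string wide_to_utf8(const std::wstring& w)
{
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(w[i]);
        // With a 16-bit wch_t-style encoding, combine a surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With this in place, `std::cout << wide_to_utf8(s);` would take the place of `convert.to_bytes(s)` above, still assuming the terminal expects UTF-8.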

And just to explain what's going wrong in the code you show:

char* locale = setlocale(LC_ALL, ""); 
std::cout << "locale: " << locale << std::endl; // "C" for me

The locale string here is the locale name after applying changes. Since you say you get "C" it means you're using the "C" locale. Normally one would get a name like "en_US.UTF-8" but for whatever reason your environment isn't set up correctly for that. You show that $LANG is set correctly but perhaps one of the other locale environment variables is set differently.

In any case you're using the "C" locale, which is only required to support the basic character set. I believe on OS X the behavior you'll get is that any char will directly convert to the same wchar_t value, and only wchar_t values in the range supported by char will convert back. That's effectively the same as using an ISO 8859-1 based locale, so Japanese characters will not work.
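
To check which environment variable is deciding the locale, it can help to mirror the POSIX lookup order for the character-type category: LC_ALL first, then LC_CTYPE, then LANG, with empty values treated as unset. This is a rough diagnostic sketch, not what setlocale does internally; the function names are made up for illustration, and the "C" fallback is an assumption about the implementation default.

```cpp
#include <cstdlib>
#include <string>

// Pure helper so the lookup order is easy to test. Each argument may be
// nullptr (variable unset) or "" (treated the same as unset).
inline std::string resolve_ctype_locale(const char* lc_all,
                                        const char* lc_ctype,
                                        const char* lang)
{
    const char* candidates[] = { lc_all, lc_ctype, lang };
    for (const char* c : candidates) {
        if (c != nullptr && *c != '\0')
            return c; // first non-empty variable wins
    }
    return "C"; // assumption: the implementation default is "C"
}

// Applies the same lookup to the real environment.
inline std::string ctype_locale_from_environment()
{
    return resolve_ctype_locale(std::getenv("LC_ALL"),
                                std::getenv("LC_CTYPE"),
                                std::getenv("LANG"));
}
```

If `ctype_locale_from_environment()` reports "en_US.UTF-8" while `setlocale(LC_ALL, "")` still yields "C", the environment variables are probably fine and the problem lies elsewhere, for instance in the locale data or the standard library.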


If you really insist on getting this locale based stuff to work then you need to get an appropriate locale, one that uses UTF-8. You can either figure out what's wrong with your environment or you can use a non-portable, explicit locale name.

std::wcout.imbue(std::locale("en_US.UTF-8"));
std::wcout << L"¡Hola!\n";
std::wcout << L"日本\n";

Also, if you're using libstdc++ you should know that it doesn't support locales properly on OS X. You'll have to use libc++ in order for OS X's locale names (e.g., "en_US.UTF-8") to work.
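
Because `std::locale(name)` throws `std::runtime_error` when the standard library does not recognize the name, an explicit locale name can be tried defensively. A small sketch, where `first_available_locale` is a hypothetical helper and the candidate names are whatever the caller supplies:

```cpp
#include <initializer_list>
#include <locale>
#include <stdexcept>

// Returns the first locale in `names` that the standard library can
// construct, or the classic "C" locale if none of them is available.
inline std::locale first_available_locale(std::initializer_list<const char*> names)
{
    for (const char* name : names) {
        try {
            return std::locale(name);
        } catch (const std::runtime_error&) {
            // This name is not supported here; try the next candidate.
        }
    }
    return std::locale::classic();
}
```

For example, `std::wcout.imbue(first_available_locale({"en_US.UTF-8"}));` falls back to the "C" locale instead of terminating the program when the name is rejected, although Japanese characters will still not print in that case.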

bames53

According to multiple bug reports on libstdc++ (such as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35353), there are nasty interactions between the C runtime and libstdc++, and nobody seems eager to try to fix it, probably because utf-8 "just works" for most cases.

The bug report mentions two workarounds, using either ios_base::sync_with_stdio(false) or locale::global(...).
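
As a rough illustration of those two workarounds, the sketch below applies both before any output is written (the bug report presents them as alternatives, so combining them is a choice made here). `prepare_wcout` is a hypothetical helper name, and whether this is enough depends on the libstdc++ version and platform.

```cpp
#include <iostream>
#include <locale>
#include <stdexcept>

// Call before the program writes anything to the standard streams.
// Returns false, leaving the global locale unchanged, when `locale_name`
// is not supported by the standard library.
inline bool prepare_wcout(const char* locale_name)
{
    std::ios_base::sync_with_stdio(false); // workaround 1: detach from C stdio
    try {
        std::locale loc(locale_name);      // throws if the name is unknown
        std::locale::global(loc);          // workaround 2: set the global locale
        std::wcout.imbue(loc);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

A caller would typically try `prepare_wcout("")` or an explicit UTF-8 locale name first and fall back to converting to UTF-8 manually if it returns false.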

DanielKO

Default encoding by platform:

  • Windows: UTF-16.
  • Linux: UTF-8.
  • macOS: UTF-8.

My solution, in steps. It preserves embedded null characters (\0) so that strings are not truncated. The Windows branch calls conversion and console functions from Windows.h; the other platforms use only standard library functions:

  1. Add macros to detect the platform (Windows, Linux, and others).
  2. Create functions to convert std::wstring to std::string and std::string to std::wstring.
  3. Create a function for printing.
  4. Print the std::string/std::wstring.

See also: raw string literals and the string literal suffix (s).
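
The reason the `s` suffix matters for embedded null characters can be shown in a few lines. This is an illustrative sketch with made-up function names; it also shows an escape-sequence pitfall that is easy to hit when digits follow a `\0`.

```cpp
#include <string>

using namespace std::string_literals;

// Constructing from a C string stops at the first \0: the result is "a".
inline std::string from_c_string() { return std::string("a\0b"); }

// The s suffix passes the literal's full length, so the \0 is kept.
inline std::string from_s_literal() { return "a\0b"s; }

// Pitfall: \0 followed by octal digits is read as ONE octal escape.
// "\0123" is '\012' (a newline) followed by '3', not \0 followed by "123".
inline std::string octal_pitfall() { return "\0123"s; }

// Splitting the literal ends the escape sequence where intended.
inline std::string octal_fixed() { return "\0" "123"s; }
```

Here `from_c_string()` has length 1 while `from_s_literal()` has length 3, which is why the conversion functions have to walk the string segment by segment instead of relying on null-terminated C APIs alone.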

On Linux, print the std::string directly with std::cout. The default encoding on Linux is UTF-8, so no extra functions are needed.

On Windows, to print Unicode, use WriteConsole to print the characters of a std::wstring.

Finally, on Windows you need a console with robust, complete support for Unicode characters. I recommend Windows Terminal.

QA

  • Tested on Microsoft Visual Studio 2019 with VC++; std=c++17. (Windows Project)
  • Tested on repl.it using Clang compiler; std=c++17.

Q. Why don't you use the <codecvt> header's functions and classes?
A. They are deprecated. Removed or deprecated features fail to build on VC++, although g++ has no problem with them. I prefer zero warnings and no headaches.

Q. Is std::wstring cross-platform?
A. No. std::wstring uses wchar_t elements. On Windows, wchar_t is 2 bytes and text is stored as UTF-16 code units; a character above U+FFFF is represented by two UTF-16 units (2 wchar_t elements) called a surrogate pair. On Linux, wchar_t is 4 bytes and each character is stored in one wchar_t element, so no surrogate pairs are needed. See "Standard data types on UNIX, Linux, and Windows".

Q. Is std::string cross-platform?
A. Yes. std::string uses char elements, and sizeof(char) is 1 on every conforming compiler. See "Standard data types on UNIX, Linux, and Windows".
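
The surrogate-pair rule mentioned for 16-bit wchar_t can be made concrete. The sketch below uses char16_t so that it behaves the same on every platform; `encode_utf16` is a hypothetical name, and the function assumes its input is a valid code point (at most U+10FFFF and not itself a surrogate).

```cpp
#include <string>

// Encodes one code point as UTF-16 code units, i.e. what a 16-bit wchar_t
// would store. Code points below U+10000 fit in one unit; larger ones are
// split into a high and a low surrogate.
inline std::u16string encode_utf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        const char32_t v = cp - 0x10000;                     // 20 significant bits
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate: top 10 bits
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate: bottom 10 bits
    }
    return out;
}
```

For instance, 日 (U+65E5) needs a single unit, whereas U+1F600 becomes the pair 0xD83D 0xDE00. This is why the length of a std::wstring can differ between Windows and Linux for the same text.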

Full example code


#include <cstdio>   // getchar
#include <cstdlib>  // std::system, wcstombs, mbstowcs
#include <iostream>
#include <set>
#include <string>
#include <locale>

// WINDOWS
#if (_WIN32)
#define NOMINMAX // must come before <Windows.h> to suppress its min/max macros
#include <Windows.h>
#include <conio.h>
#define WINDOWS_PLATFORM 1
#define DLLCALL STDCALL
#define DLLIMPORT _declspec(dllimport)
#define DLLEXPORT _declspec(dllexport)
#define DLLPRIVATE

//EMSCRIPTEN
#elif defined(__EMSCRIPTEN__)
#include <emscripten/emscripten.h>
#include <emscripten/bind.h>
#include <unistd.h>
#include <termios.h>
#define EMSCRIPTEN_PLATFORM 1
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

// LINUX - Ubuntu, Fedora, CentOS, Debian, Red Hat
#elif (__LINUX__ || __gnu_linux__ || __linux__ || __linux || linux)
#define LINUX_PLATFORM 1
#include <unistd.h>
#include <termios.h>
#define DLLCALL CDECL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#define CoTaskMemAlloc(p) malloc(p)
#define CoTaskMemFree(p) free(p)

//ANDROID
#elif (__ANDROID__ || ANDROID)
#define ANDROID_PLATFORM 1
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))

//MACOS
#elif defined(__APPLE__)
#include <unistd.h>
#include <termios.h>
#define DLLCALL
#define DLLIMPORT
#define DLLEXPORT __attribute__((visibility("default")))
#define DLLPRIVATE __attribute__((visibility("hidden")))
#include "TargetConditionals.h"
#if TARGET_OS_IPHONE && TARGET_IPHONE_SIMULATOR
#define IOS_SIMULATOR_PLATFORM 1
#elif TARGET_OS_IPHONE
#define IOS_PLATFORM 1
#elif TARGET_OS_MAC
#define MACOS_PLATFORM 1
#else

#endif

#endif



typedef std::string String;
typedef std::wstring WString;

#define EMPTY_STRING u8""s
#define EMPTY_WSTRING L""s

using namespace std::literals::string_literals;

class Strings
{
public:
    static String WideStringToString(const WString& wstr)
    {
        if (wstr.empty())
        {
            return String();
        }
        size_t pos;
        size_t begin = 0;
        String ret;

#if WINDOWS_PLATFORM
        int size;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != WString::npos && begin < wstr.length())
        {
            WString segment = WString(&wstr[begin], pos - begin);
            size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
            String converted = String(size, 0);
            WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            WString segment = WString(&wstr[begin], wstr.length() - begin);
            size = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), NULL, 0, NULL, NULL);
            String converted = String(size, 0);
            WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.size(), NULL, NULL);
            ret.append(converted);
        }
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        size_t size;
        pos = wstr.find(static_cast<wchar_t>(0), begin);
        while (pos != WString::npos && begin < wstr.length())
        {
            WString segment = WString(&wstr[begin], pos - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            String converted = String(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = wstr.find(static_cast<wchar_t>(0), begin);
        }
        if (begin <= wstr.length())
        {
            WString segment = WString(&wstr[begin], wstr.length() - begin);
            size = wcstombs(nullptr, segment.c_str(), 0);
            String converted = String(size, 0);
            wcstombs(&converted[0], segment.c_str(), converted.size());
            ret.append(converted);
        }
#else
        static_assert(false, "Unknown Platform");
#endif
        return ret;
    }

    static WString StringToWideString(const String& str)
    {
        if (str.empty())
        {
            return WString();
        }

        size_t pos;
        size_t begin = 0;
        WString ret;
#ifdef WINDOWS_PLATFORM
        int size = 0;
        pos = str.find(static_cast<char>(0), begin);
        while (pos != std::string::npos) {
            std::string segment = std::string(&str[begin], pos - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, &segment[0], segment.size(), &converted[0], converted.length());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length()) {
            std::string segment = std::string(&str[begin], str.length() - begin);
            std::wstring converted = std::wstring(segment.size() + 1, 0);
            size = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, segment.c_str(), segment.size(), &converted[0], converted.length());
            converted.resize(size);
            ret.append(converted);
        }

#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        size_t size;
        pos = str.find(static_cast<char>(0), begin);
        while (pos != String::npos)
        {
            String segment = String(&str[begin], pos - begin);
            WString converted = WString(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
            ret.append({ 0 });
            begin = pos + 1;
            pos = str.find(static_cast<char>(0), begin);
        }
        if (begin < str.length())
        {
            String segment = String(&str[begin], str.length() - begin);
            WString converted = WString(segment.size(), 0);
            size = mbstowcs(&converted[0], &segment[0], converted.size());
            converted.resize(size);
            ret.append(converted);
        }
#else
        static_assert(false, "Unknown Platform");
#endif
        return ret;
    }
};

enum class ConsoleTextStyle
{
    DEFAULT = 0,
    BOLD = 1,
    FAINT = 2,
    ITALIC = 3,
    UNDERLINE = 4,
    SLOW_BLINK = 5,
    RAPID_BLINK = 6,
    REVERSE = 7,
};

enum class ConsoleForeground
{
    DEFAULT = 39,
    BLACK = 30,
    DARK_RED = 31,
    DARK_GREEN = 32,
    DARK_YELLOW = 33,
    DARK_BLUE = 34,
    DARK_MAGENTA = 35,
    DARK_CYAN = 36,
    GRAY = 37,
    DARK_GRAY = 90,
    RED = 91,
    GREEN = 92,
    YELLOW = 93,
    BLUE = 94,
    MAGENTA = 95,
    CYAN = 96,
    WHITE = 97
};

enum class ConsoleBackground
{
    DEFAULT = 49,
    BLACK = 40,
    DARK_RED = 41,
    DARK_GREEN = 42,
    DARK_YELLOW = 43,
    DARK_BLUE = 44,
    DARK_MAGENTA = 45,
    DARK_CYAN = 46,
    GRAY = 47,
    DARK_GRAY = 100,
    RED = 101,
    GREEN = 102,
    YELLOW = 103,
    BLUE = 104,
    MAGENTA = 105,
    CYAN = 106,
    WHITE = 107
};

class Console
{
private:
    static void EnableVirtualTermimalProcessing()
    {
#if defined WINDOWS_PLATFORM
        HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD dwMode = 0;
        GetConsoleMode(hOut, &dwMode);
        if (!(dwMode & ENABLE_VIRTUAL_TERMINAL_PROCESSING))
        {
            dwMode |= ENABLE_VIRTUAL_TERMINAL_PROCESSING;
            SetConsoleMode(hOut, dwMode);
        }
#endif
    }

    static void ResetTerminalFormat()
    {
        std::cout << u8"\033[0m";
    }

    static void SetVirtualTerminalFormat(ConsoleForeground foreground, ConsoleBackground background, std::set<ConsoleTextStyle> styles)
    {
        String format = u8"\033[";
        format.append(std::to_string(static_cast<int>(foreground)));
        format.append(u8";");
        format.append(std::to_string(static_cast<int>(background)));
        if (styles.size() > 0)
        {
            for (auto it = styles.begin(); it != styles.end(); ++it)
            {
                format.append(u8";");
                format.append(std::to_string(static_cast<int>(*it)));
            }
        }
        format.append(u8"m");
        std::cout << format;
    }
public:
    static void Clear()
    {

#ifdef WINDOWS_PLATFORM
        std::system(u8"cls");
#elif LINUX_PLATFORM || defined MACOS_PLATFORM
        std::system(u8"clear");
#elif EMSCRIPTEN_PLATFORM
        emscripten::val::global()["console"].call<void>(u8"clear");
#else
        static_assert(false, "Unknown Platform");
#endif
    }

    static void Write(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
#ifndef EMSCRIPTEN_PLATFORM
        EnableVirtualTermimalProcessing();
        SetVirtualTerminalFormat(foreground, background, styles);
#endif
        String str = s;
#ifdef WINDOWS_PLATFORM
        WString unicode = Strings::StringToWideString(str);
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), unicode.c_str(), static_cast<DWORD>(unicode.length()), nullptr, nullptr);
#elif defined LINUX_PLATFORM || defined MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        std::cout << str;
#else
        static_assert(false, "Unknown Platform");
#endif

#ifndef EMSCRIPTEN_PLATFORM
        ResetTerminalFormat();
#endif
    }

    static void WriteLine(const String& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        Write(s, foreground, background, styles);
        std::cout << std::endl;
    }

    static void Write(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
#ifndef EMSCRIPTEN_PLATFORM
        EnableVirtualTermimalProcessing();
        SetVirtualTerminalFormat(foreground, background, styles);
#endif
        WString str = s;

#ifdef WINDOWS_PLATFORM
        WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), str.c_str(), static_cast<DWORD>(str.length()), nullptr, nullptr);
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        std::cout << Strings::WideStringToString(str);
#else
        static_assert(false, "Unknown Platform");
#endif

#ifndef EMSCRIPTEN_PLATFORM
        ResetTerminalFormat();
#endif
    }

    static void WriteLine(const WString& s, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        Write(s, foreground, background, styles);
        std::cout << std::endl;
    }

    static void WriteLine()
    {
        std::cout << std::endl;
    }

    static void Pause()
    {
        char c;
        do
        {
            c = getchar();
            std::cout << "Press Key " << std::endl;
        } while (c != 64);
        std::cout << "KeyPressed" << std::endl;
    }

    static int PauseAny(bool printWhenPressed = false, ConsoleForeground foreground = ConsoleForeground::DEFAULT, ConsoleBackground background = ConsoleBackground::DEFAULT, std::set<ConsoleTextStyle> styles = {})
    {
        int ch;
#ifdef WINDOWS_PLATFORM
        ch = _getch();
#elif LINUX_PLATFORM || MACOS_PLATFORM || EMSCRIPTEN_PLATFORM
        struct termios oldt, newt;
        tcgetattr(STDIN_FILENO, &oldt);
        newt = oldt;
        newt.c_lflag &= ~(ICANON | ECHO);
        tcsetattr(STDIN_FILENO, TCSANOW, &newt);
        ch = getchar();
        tcsetattr(STDIN_FILENO, TCSANOW, &oldt);
#else
        static_assert(false, "Unknown Platform");
#endif
        if (printWhenPressed)
        {
            Console::Write(String(1, ch), foreground, background, styles);
        }
        return ch;
    }
};



int main()
{
    std::locale::global(std::locale(u8"en_US.UTF8"));
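    // Each literal is split after the second \0: written as "\0123...", the
    // compiler would read \012 as a single octal escape (a newline).
    auto str = u8"\0Hello\0" u8"123456789也不是可运行的程序123456789日本"s;//
    WString wstr = L"\0Hello\0" L"123456789也不是可运行的程序123456789日本"s;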
    WString wstrResult = Strings::StringToWideString(str);
    String strResult = Strings::WideStringToString(wstr);
    bool equals1 = wstr == wstrResult;
    bool equals2 = str == strResult;

    Console::WriteLine(u8"█ Converted Strings printed with Console::WriteLine"s, ConsoleForeground::GREEN);
    Console::WriteLine(wstrResult, ConsoleForeground::BLUE);//Printed OK on Windows/Linux.
    Console::WriteLine(strResult, ConsoleForeground::BLUE);//Printed OK on Windows/Linux.
    
    Console::WriteLine(u8"█ Converted Strings printed with std::cout/std::wcout"s, ConsoleForeground::GREEN);
    std::cout << strResult << std::endl;//Printed OK on Linux. BAD on Windows.
    std::wcout << wstrResult << std::endl; //Printed BAD on Windows/Linux.
    Console::WriteLine();
    Console::WriteLine(u8"Press any key to exit"s, ConsoleForeground::DARK_GRAY);
    Console::PauseAny();

}

You can test this code at https://repl.it/@JomaCorpFX/StringToWideStringToString#main.cpp


**Screenshots**

Using Windows Terminal: [screenshot]

Using cmd/PowerShell: [screenshot]

Repl.it capture: [screenshot]

Joma

Use the nowide library to convert to UTF-8 in the easiest way. Then use regular printf.

Pavel Radzivilovsky