
In MSVC++, if you create a new Visual Studio console application (x64 platform, running on Windows 8.1 x64), set its character set to Unicode, and put the following code in main:

#include <sstream>
#include <iostream>
#include <tchar.h>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    stringstream stream;
    stream << _T("Testing Unicode. English - Ελληνικά - Español.") << std::endl;
    string str = stream.str();
    std::wcout << str.c_str();
    cin.get();
}

It outputs this:

00007FF616443E50

I would like it to output this instead:

Testing Unicode. English - Ελληνικά - Español.

How can this be achieved?

Edit: With wstringstream and wstring instead:

wstringstream stream;
stream << _T("Testing Unicode. English - Ελληνικά - Español.") << std::endl;
wstring str = stream.str();
std::wcout << str.c_str();

The output is truncated:

Testing Unicode. English -

Setting the mode like so: _setmode(_fileno(stdout), _O_U16TEXT);

The output is still undesirable because not all characters get rendered properly:

Testing Unicode. English - ???????? - Español.

Setting the output CP like so: SetConsoleOutputCP(CP_UTF8);

The output is again truncated:

Testing Unicode. English -
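(For reference, a minimal, self-contained version of the wide-string attempt above, with the `_setmode` call made before any output, looks like this. It assumes Windows and the Microsoft CRT; whether the Greek text then renders still depends on the console font:)

```cpp
// Minimal sketch of the wide-string + _setmode attempt above.
// Windows/MSVC only: _setmode and _O_U16TEXT are Microsoft-specific.
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <iostream>
#include <sstream>
#include <string>

int wmain()
{
    // Put stdout into UTF-16 mode before writing anything to it.
    _setmode(_fileno(stdout), _O_U16TEXT);

    std::wstringstream stream;
    stream << L"Testing Unicode. English - Ελληνικά - Español." << std::endl;
    std::wcout << stream.str();
    return 0;
}
```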

Alexandru
  • Since you are using wide chars, use `std::wstringstream` and `std::wstring`. – Brandon Mar 22 '14 at 14:04
  • You haven't set output to Unicode. The `_TCHAR` and `_T` macros etc. don't do that: they provide some support for a primitive compatibility scheme for Windows 9x. You should just ditch that stuff, but do set Unicode for output. Oh, I found an old blog article: http://alfps.wordpress.com/2011/12/08/unicode-part-2-utf-8-stream-mode/ – Cheers and hth. - Alf Mar 22 '14 at 14:09
  • @Cheersandhth.-Alf Some guys at my work swear by the _T...I don't think you'd get along with them. Hey, what do you mean when you say to set the output to unicode? In the project's settings I have set the Character Set to "Use Unicode Character Set". – Alexandru Mar 22 '14 at 14:12
  • @CantChooseUsernames With std::wstringstream and std::wstring, the output is now truncated, meaning, I only see this as the output: "Testing Unicode. English -" – Alexandru Mar 22 '14 at 14:13
  • Duplicate: http://stackoverflow.com/questions/2492077/output-unicode-strings-in-windows-console-app – danielschemmel Mar 22 '14 at 14:27
  • @dionadar The answers in that question don't work for me (see my edits). – Alexandru Mar 22 '14 at 14:31
  • @Alexandru You probably did not pick a font capable of displaying the necessary characters in your terminal window. Choose one of the TrueType fonts, NOT the raster fonts. – danielschemmel Mar 22 '14 at 14:34
  • @Alexandru: You're probably (but not necessarily) right about my view of your coworkers, if I should get to know them. Professionals who use `_T` stuff for anything other than legacy code maintenance are necessarily conformists, as I see it :-(. Anyway, re "set Unicode [mode]", I meant like `_setmode( _fileno( f ), _O_U8TEXT )`. – Cheers and hth. - Alf Mar 22 '14 at 14:51
  • @Cheersandhth.-Alf Hey, I use the _T directive in Visual Studio to tell it to use wide or regular character formatting...like, if you removed that from my above code, it would only output ????????? to the console no matter the font you choose...unless, is there some alternative you recommend? – Alexandru Mar 22 '14 at 14:51
  • @Alexandru: Instead of the Microsoftism/Windows 9x `_T( "Blah" )`, in modern Windows code just write standard `L"Blah"`. – Cheers and hth. - Alf Mar 22 '14 at 14:52
  • @Cheersandhth.-Alf Gotcha! This works. Thanks, I'll use that from now on. – Alexandru Mar 22 '14 at 14:53
  • @Cheersandhth.-Alf Hey, I have to come back to our discussion for a brief second. I noticed that some functions, like RegSetValueEx, behave differently in certain conditions. Debug/x64 takes in an LPCSTR. Release/x64 takes in an LPCWSTR. When passing a text parameter like RegSetValueEx(key, TEXT("SomeParameterName"), ...) it works, but if you were to use L-notation on this it will break in Debug, since it doesn't do the compatibility matching between wide or not. – Alexandru Mar 23 '14 at 14:42
  • @Alexandru: Your project's UNICODE setting for the Debug build is wrong. I prefer to define `UNICODE` and `_UNICODE` in the source code instead of relying on brittle build system settings, but it's less up-front work to change the settings. Anyway, background: *most* Windows API functions exist in two variants: `FooW` (W for wide), which is the basic UTF-16-oriented version, and `FooA` (A for ANSI), which is a `char`-based *wrapper* that expects the encoding specified by `GetACP()`. With `UNICODE` defined the macro `Foo` expands to `FooW`, otherwise to `FooA`. That's the short version. :) – Cheers and hth. - Alf Mar 23 '14 at 15:46
  • `CommandLineToArgvW` is an example of a function that is *only* available in the basic UTF-16 version, with no ANSI wrapper. – Cheers and hth. - Alf Mar 23 '14 at 15:48
  • Also, while we're at it, in order to make the output work in general you will have to (1) make sure that the console window uses a TrueType font like Lucida Console, or better, and (2) either use the API-level Unicode console output functions or force the Microsoft runtime library to do so by using `_setmode` as appropriate on the three standard streams. Note in particular that the runtime library doesn't handle input of UTF-8, so at this time the best solution is still to use UTF-16, wide strings, all over. – Cheers and hth. - Alf Mar 23 '14 at 15:51
  • The main trick for `_setmode` is to check whether the stream is hooked up to a console window or not. If console, then force UTF-16. If not, then force conversion to/from UTF-8 (best support for the program's general textual needs) or ANSI (best compatibility with other console programs), whatever one desires for files and pipes. – Cheers and hth. - Alf Mar 23 '14 at 15:59
  • @Cheersandhth.-Alf Ah shit, you were right that the UNICODE setting wasn't being set properly (verified by right-clicking the call to RegSetValueEx and hitting Go to Definition), but it was actually my mistake; I forgot to set the Character Set to Use Unicode Character Set on Debug/x64; I had only set it on Release/x64. I got a little scared and started calling into question whether or not I could trust M$. L-notation - still good. Thumbs up, thanks brother. – Alexandru Mar 23 '14 at 18:09
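The `_setmode` trick described in the comments above (force UTF-16 when stdout is an actual console window, UTF-8 translation otherwise) can be sketched roughly as follows. It assumes Windows and the Microsoft CRT, and `configure_stdout` is a hypothetical helper name, not a standard API:

```cpp
// Sketch of the comments' _setmode trick: UTF-16 for a real console
// window, UTF-8 conversion for files and pipes. Windows/MSVC only.
#include <fcntl.h>
#include <io.h>
#include <stdio.h>

static void configure_stdout()
{
    if (_isatty(_fileno(stdout)))
    {
        // stdout is attached to a console window: use UTF-16 output,
        // which the console host handles natively.
        _setmode(_fileno(stdout), _O_U16TEXT);
    }
    else
    {
        // stdout is redirected to a file or pipe: convert wide output
        // to UTF-8 instead.
        _setmode(_fileno(stdout), _O_U8TEXT);
    }
}
```

The same check can be repeated for stdin and stderr, as the comments suggest applying it to all three standard streams.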

1 Answer


Using the following code alone just doesn't work. You must also right-click the console window that pops up, click Defaults, open the Fonts tab, and set the font to Lucida Console. Then the code below will run just fine. Without the overloads of the `<<` operator for Windows, it will NOT work. You may also want to add an overload for `char` or `wchar_t`, or simply make this a template overload.

If you do not like the overloads, you may use _setmode(_fileno(stdout), _O_U16TEXT); or _setmode(_fileno(stdout), _O_U8TEXT); for UTF-16 and UTF-8 respectively.
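A minimal sketch of that `_setmode`-only alternative (no custom overloads, assuming Windows and the Microsoft CRT) might look like:

```cpp
// _setmode-only alternative to the operator<< overloads: switch
// stdout to UTF-16 mode, then use wide streams throughout.
// Windows/MSVC only.
#include <fcntl.h>
#include <io.h>
#include <stdio.h>
#include <iostream>

int wmain()
{
    _setmode(_fileno(stdout), _O_U16TEXT); // UTF-16 output mode

    // After this call, only wide output (wcout/wprintf) may be used
    // on stdout; mixing in narrow output would trip a CRT assertion.
    std::wcout << L"Testing Unicode. English - Ελληνικά - Español.\n";
    return 0;
}
```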

// Unicode.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include <sstream>
#include <iostream>

#if defined _WIN32 || defined _WIN64
    #include <Windows.h>
    #include <io.h>      // _setmode, _fileno
    #include <fcntl.h>   // _O_U16TEXT, _O_U8TEXT
#endif

#if defined _WIN32 || defined _WIN64
// These overloads bypass the CRT and write directly to the console,
// so the console receives the text without codepage translation.
std::ostream& operator << (std::ostream& os, const char* data)
{
    SetConsoleOutputCP(CP_UTF8); // treat narrow strings as UTF-8
    DWORD slen = static_cast<DWORD>(strlen(data));
    WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE), data, slen, &slen, nullptr);
    return os;
}

std::ostream& operator << (std::ostream& os, const std::string& data)
{
    SetConsoleOutputCP(CP_UTF8);
    DWORD written = 0;
    WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE), data.c_str(),
                  static_cast<DWORD>(data.size()), &written, nullptr);
    return os;
}

std::wostream& operator <<(std::wostream& os, const wchar_t* data)
{
    DWORD slen = static_cast<DWORD>(wcslen(data));
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), data, slen, &slen, nullptr);
    return os;
}

std::wostream& operator <<(std::wostream& os, const std::wstring& data)
{
    DWORD written = 0;
    WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), data.c_str(),
                  static_cast<DWORD>(data.size()), &written, nullptr);
    return os;
}
#endif

int _tmain(int argc, _TCHAR* argv[])
{
    std::wstringstream stream;
    stream << _T("Testing Unicode. English - Ελληνικά - Español.") << std::endl;

    #if defined _WIN32 || defined _WIN64
        // Alternative to the operator<< overloads above (requires
        // <io.h> and <fcntl.h>):
        // _setmode(_fileno(stdout), _O_U16TEXT);
    #endif

    std::wstring str = stream.str();
    std::wcout << str;
    std::wcin.get();
    return 0;
}

On Windows there is ONE more thing that can help render fonts in ANY language. I have not seen this posted anywhere else on the net. Navigate to Control Panel\Appearance and Personalization\Fonts, click Font Settings, uncheck "Hide fonts based on language settings", and save the options. This will allow you to write Japanese and Chinese characters as well as Arabic and whatever other languages you want, and it seems to work with the default console fonts as well. I had to restart for it to take effect, though. Not sure if it actually works for anyone else.

Brandon
  • Confirmed that it works. Right-clicking the console window, navigating to the Properties section, and selecting the "Lucida Console" font renders all of the characters properly when using _setmode(_fileno(stdout), _O_U16TEXT); – Alexandru Mar 22 '14 at 14:37
  • Really an awesome answer man. :) Also as an aside from some testing I just did, to anyone who wants to code for multi-platform/multi-character-set support in Windows, wstringstream and wstring are backwards compatible so I would recommend using those for both Unicode and Multi-Byte character sets/both x64 and x86 platforms. – Alexandru Mar 22 '14 at 14:48
  • ...and also to use L notation instead of _T or TEXT. – Alexandru Mar 22 '14 at 14:54