
As I understood from this answer to a similar question, there is a still-unfixed bug in the Visual C++ STL implementation. So there is no way to just write `std::cout << raw_utf8_string << std::endl` and enjoy nice UTF-8 characters under Windows ;(

NOTE: My test program lives here.

But maybe there is a quite simple-to-understand workaround to handle this? My thoughts: make a wrapper class like cout_ex, which will use the Windows API function WriteConsoleA for console output.
In its constructor, do this:

#ifdef _WIN32
if (IsValidCodePage (CP_UTF8))
{
    if (!SetConsoleCP (CP_UTF8))
        std::cout << "Could not set console input code page to UTF-8" << std::endl;
    if (!SetConsoleOutputCP (CP_UTF8))
        std::cout << "Could not set console output code page to UTF-8" << std::endl;
}
else
    std::cout << "UTF-8 code page is not supported in your system" << std::endl;
#endif

And in the output method do this:

char const raw_utf8_text[] = "Blåbærsyltetøy! кошка!";

DWORD raw_written = 0;
WriteConsoleA (GetStdHandle (STD_OUTPUT_HANDLE), raw_utf8_text, std::strlen (raw_utf8_text), &raw_written, NULL);

And don't forget to use the undocumented Visual C++ pragma at the very beginning of the source file:

#pragma execution_character_set("utf-8")

But maybe someone has a cleaner solution :) Even using external libraries like Poco/Boost/etc.

I tried to read these articles: 1, 2, but I found that way too complicated. P.S. The overridden stream class should also set the console font to a Unicode one.
P.P.S. Software versions: Windows 8 x64 + Visual C++ 2013 Express.

eraxillan
  • I didn't check in depth `SetConsoleCP`, but the general mantra is that on Windows UTF-8 is not a supported CP (`CP_UTF8` being there mostly for `MultiByteToWideChar` and `WideCharToMultiByte`). The typical way you work in UTF-8 on Windows is to convert to `WCHAR` at API boundaries. – Matteo Italia May 10 '14 at 19:46
  • 1
    You'll probably have more luck setting the console CP to CP_UNICODE and then converting your UTF-8 to UTF-16 manually (remember CP_UNICODE==UTF-16). – rodrigo May 10 '14 at 19:49
  • @MatteoItalia So under Windows I can only convert to UTF-16 and use `std::wcout` for output? – eraxillan May 11 '14 at 04:21
  • @rodrigo I will check this. However, Unicode != UTF-16 :) Why Microsoft thinks so, I don't understand... – eraxillan May 11 '14 at 04:22
  • @Axilles: I know that, that's why I said `CP_UNICODE`, not _Unicode_. And even weirder is why they think ANSI equals... something useful. – rodrigo May 11 '14 at 07:27
  • @rodrigo Hm, by `CP_UNICODE` do you mean the `1200` code page? But the documentation [says](http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx) that it is available only to managed applications! But I use unmanaged C++. – eraxillan May 11 '14 at 07:57
  • @rodrigo I ask because Visual C++ doesn't know the `CP_UNICODE` constant (windows.h is included). Also, `assert (IsValidCodePage (1200))` fires, i.e. it is invalid. – eraxillan May 11 '14 at 07:59
  • @Axilles: I stand corrected! There is no such `CP_UNICODE`. But maybe you can use `WriteConsoleW()` directly to bypass the codepage stuff. – rodrigo May 11 '14 at 08:57
  • [Stl and utf-8](http://stackoverflow.com/questions/4018384/stl-and-utf-8-file-input-output-how-to-do-it/4025951) – Basilevs May 11 '14 at 11:31
  • @Axilles, are wide characters for internal use a no-go for you in this question's scope? – Basilevs May 11 '14 at 11:35
  • @rodrigo Yep, `WriteConsoleW` works correctly (though it prints a square at the end of the string). But `std::wcout` doesn't... – eraxillan May 11 '14 at 13:07
  • @Basilevs Yes, I'm interested in using pure UTF-8 strings, without any conversion, if that is possible even with black magic :) – eraxillan May 11 '14 at 13:22

1 Answer


You should imbue a proper `codecvt` facet into your output stream.

std::locale loc;
string encoding = getOutputEncoding();
loc = std::locale(loc, createCodecvt(encoding));
cout.imbue(loc);
cout.rdbuf()->pubimbue(loc);

Complete code here

This facet should convert the internal encoding to the external one. Due to some bugs in the STL implementation, this might be impossible when the internal storage format is a one-byte or multibyte encoding. There is a workaround for that: use a file stream buffer instead of the default output buffer.

You might have to implement your own `codecvt` facet or use my iconv wrapper.

Overall, I still recommend using wide characters for internal processing. This way you might even avoid any extra conversions (besides the system default ones).

Basilevs
  • So, there is no way to output UTF-8 *without* conversion? Any hack, workaround - whatever. That is the main goal of the question – eraxillan May 11 '14 at 13:26
  • Of course there is. [First question you linked](http://stackoverflow.com/questions/1660492/utf-8-output-on-windows-xp-console) mentions that fputs method works just fine. IMHO you may use fstreambuf (with conversions settings disabled in imbued locale) on stdout. You will get garbage on misconfigured console, though. – Basilevs May 11 '14 at 13:35
  • I checked `fputs` - it works correctly. Thanks! So only the Microsoft STL stream implementation has problems with UTF-8. Where can I read more about the reason? At the moment it's hard for me to understand :( – eraxillan May 11 '14 at 14:32
  • But you linked that question yourself! There is an exhaustive explanation. Just follow links. Specifically http://connect.microsoft.com/VisualStudio/feedback/details/431244/std-ostream-fails-to-write-utf-8-encoded-string-to-console – Basilevs May 11 '14 at 14:43
  • Hm, looks like I'm too tired today :D Sorry. I need to rest, then slowly read and digest all of this. So much information - and meanwhile on Linux things just work. – eraxillan May 11 '14 at 14:57