How to make std::wofstream write UTF-8?

Question

I am redirecting std::wclog to a file for logging in my program:

std::wclog.rdbuf((new std::wofstream("C:\\path\\to\\file.log", std::ios::app))->rdbuf());

Logging happens by writing to std::wclog:

std::wclog << "Schöne Grüße!" << std::endl;

Surprisingly, I found that the file is being written in ANSI. (This would be totally acceptable for ofstream and clog, but I had expected wofstream and wclog to produce some kind of Unicode output.) I want to be able to log in CJK languages as well (e.g. user input), so is there a way to get wofstream to produce UTF-8? The openmode flags seem not to provide for this.

(If there isn’t a platform-independent way, I am on Win7+ 64-bit.)

Edit:

There is an error in the question above. The line

std::wclog << "Schöne Grüße!" << std::endl;

should correctly be

std::wclog << L"Schöne Grüße!" << std::endl;

This is just to demonstrate what I want to do; in real life, the wstring written to the wofstream comes from a class that provides translation, like

std::wclog << _(L"Best regards") << std::endl;

where

#define _(X) i18n::translate(X)

class i18n {
public:
    static std::wstring translate(const std::wstring&);
};

So what I want to do is write a wstring to std::wclog, using a wofstream to put it into a file, and that file should be UTF-8 encoded (without BOM).

Why are you writing narrow characters to a wide-character stream? — Jonathan Wakely, Aug 17 '16 at 10:48
I thought you need to use the UTF literals if you want that? And what about locales? — CinchBlue, Aug 17 '16 at 11:02
You need to use the correct type and literals for Unicode. Visual C++ [supports the C++11 Unicode](https://msdn.microsoft.com/en-us/library/69ze775t.aspx) literals and types. E.g., `u8"hello"` is a UTF-8 encoded `char*` and `u"hello"` is a `char16_t*`, while `u8"hello"s` and `u"hello"s` return `std::string` and `std::u16string`. In general it's much better to use the STL string types — Panagiotis Kanavos, Aug 17 '16 at 11:08

score 3 · Answer 1 · answered Aug 17 '16 at 12:19

All you need is to use a UTF-8 literal, i.e.:

std::wclog << u8"Schöne Grüße!" << std::endl;

The result will be

Schöne Grüße!

If you mix plain (unprefixed) and UTF-8 literals, e.g.:

std::wclog << "Schöne Grüße!" << std::endl << u8"Schöne Grüße!" << std::endl;

the non-ASCII characters will be replaced.

Sch?ne Gr??e!
Schöne Grüße!

Unicode literals were added in C++11. They were first implemented in Visual Studio 2015. The String and Character Literals page describes the literals that are supported in Visual C++ at the moment.

score 0 · Answer 2 · answered Aug 17 '16 at 10:56


The openmode flags seem not to provide for this.

Because it's nothing to do with the openmode.

Code conversion (i.e. character encoding) is performed by the codecvt facet of the locale being used by the stream. You can imbue the ostream with a different locale using a codecvt facet that converts to UTF-8.

But I don't know if that's necessary. I have no idea how Windows behaves, but on sane platforms you would just write narrow character strings containing UTF-8 to clog and the output would be UTF-8, you don't need to use wide streams. UTF-8 is a multibyte encoding using single octets, i.e. narrow characters.

answered Aug 17 '16 at 10:56

Jonathan Wakely


Windows is Unicode at the core since 2000. This has nothing to do with Windows. People assume something is wrong when they open files without a BOM (and thus with no indication that they *aren't* ANSI) and find that Windows assumed they use the code page the user selected as the `System Locale` – Panagiotis Kanavos Aug 17 '16 at 10:58
BTW `w` simply means wide, it doesn't specify an encoding. C++11 added specific types for UTF-8, UTF-16 and UTF-32 as [shown in this related question](http://stackoverflow.com/questions/6796157/unicode-encoding-for-string-literals-in-c11), the [reference](http://en.cppreference.com/w/cpp/language/string_literal) and Visual C++'s [relevant page](https://msdn.microsoft.com/en-us/library/69ze775t.aspx) – Panagiotis Kanavos Aug 17 '16 at 11:03
@PanagiotisKanavos "Windows is Unicode at the core since 2000". So how do you do what OP wants to do? – n. m. could be an AI Aug 17 '16 at 11:07
I already posted this in the previous comment. It's just a matter of using the correct string literal. C++11 has Unicode types to resolve the ambiguity caused by wide strings (those with the L prefix): `u8"something"` yields a UTF-8 `char*`, `u8"something"s` a `std::string`, `u"something"s` a `std::u16string`, and `u"something"` a `char16_t*` – Panagiotis Kanavos Aug 17 '16 at 11:11

@PanagiotisKanavos "Windows is Unicode at the core" is not true. I wish it were, but what they did was add wide-char support, and it is still broken. To fully support Unicode at its core you would need a wchar that can hold a complete character, i.e. 32 bits, but on Windows wchar_t is 16 bits. – AndersK Aug 17 '16 at 11:11
@PanagiotisKanavos Have you verified it's working? What version of VC++ did you use? – n. m. could be an AI Aug 17 '16 at 11:19

@AndersK. first, you are talking about UTF-32 while Unicode refers to UTF-16. UTF-32 is a relative newcomer, but you *can* use UTF-32 in C++ if you use the correct types – Panagiotis Kanavos Aug 17 '16 at 11:20
@n.m the link to MSDN is about Visual Studio 2015. Previous versions (i.e. 2013 and earlier) do not support the prefixes – Panagiotis Kanavos Aug 17 '16 at 11:23
@PanagiotisKanavos no, every character, whether encoded as UTF-8, UTF-16 or UTF-32, needs to fit into a single wchar for their wide-char functions (e.g. `iswalpha`) to work, so their implementation does not fully support Unicode – AndersK Aug 17 '16 at 11:36
@PanagiotisKanavos, I'm not sure what your comments have to do with my answer. Do you mean ASCII not ANSI? I didn't say `w` implies anything about encoding, that doesn't change the fact that filebuf uses a codecvt facet to convert from the implementation-defined internal encoding used for `wchar_t` to an external encoding for the on-disk format. The `u8` literals are the easiest way to create "narrow character strings containing UTF-8" that I referred to, but not the only way. "Unicode refers to UTF-16" wat? – Jonathan Wakely Aug 17 '16 at 11:49
