Imagine I have decided to use UTF-8 everywhere internally in my C++11 program, so I have a std::string that contains text encoded in UTF-8. I now want to do some IO of that text, for example writing it to std::cout. Although I use UTF-8 internally, I cannot assume that the program's user and operating environment are so obliging as to use UTF-8 too. For good or bad reasons, the character encoding of text that I ought to send through std::cout might not be UTF-8. My program must therefore take my UTF-8 encoded text and convert it to the encoding that std::cout expects. How can I find out the encoding of that output stream, and then perform the conversion?

Looking at the declarations of the standard C++ streams, it seems I can use std::ios_base::getloc to get the "locale" of the output stream, then get a std::codecvt "code conversion facet" from that locale. But which facet should I get? And how do I actually use that facet to convert from UTF-8 to the output encoding?

And if those facilities of the standard library cannot do the task, what other options do I have?

Raedwald
  • On which OS, terminal, version etc. – Surt Dec 10 '17 at 11:24
  • There's no standard way to do this in C++. If your users don't like UTF-8 it could be economically viable to dump them and cater to better users anyway. – n. m. could be an AI Dec 10 '17 at 11:39
  • @Surt I want to write portable code. The whole point of locales and locale facets is, surely, that you don't need to know those details: the locale and facet implementations take care of that. – Raedwald Dec 10 '17 at 11:39
  • Locales, locale facets, and the whole bunch of C++ APIs around them are an utter failure. There are **no** portable C++ programs in existence that utilise locales and support Unicode. It is impossible to write any. – n. m. could be an AI Dec 10 '17 at 12:32
  • Your other options are: (1) third-party libraries like ICU or iconv; (2) knowing and relying on the fact that in popular implementations wide strings are in fact encoded as either UTF-16 or UCS-4. – n. m. could be an AI Dec 11 '17 at 10:01

1 Answer

> How can I find out the encoding on that output stream

You don't.

The expectations of the receiver of any output stream that is not yourself (whether cout, cerr, a file-stream, or whatever) are not something that you can determine. The concept of "standard output" does not come bundled with an associated concept of "encoding". Encoding expectations are implicit, not explicit.

Yes, streams have locale facets. But that is purely you saying "I want to encode output in this way". That says nothing about the needs of the consumer on the other end of the stream. It's simply a way for you to do conversions to what you believe the receiver wants.

C++ doesn't have a way to query what the receiver expects. And without that knowledge, ICU or iconv or whatever are not helpful to you.

The way this is generally done is with platform-specific code. On your Windows build, you can either output wchar_t data encoded in UTF-16, or set the console code page and use facets to convert to it. On Linux, you can generally assume that the console will accept UTF-8. And so forth.
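A rough sketch of what such platform-specific code might look like follows. The function name `console_encoding_guess` is made up for illustration. It assumes that on POSIX systems the user's environment locale (the one named `""`) is a reasonable proxy for what the terminal accepts, and that on Windows the console output code page is what matters. Both are guesses about the receiver, not guarantees.

```cpp
#include <clocale>
#include <string>
#ifdef _WIN32
#include <windows.h>
#else
#include <langinfo.h>
#endif

// Hypothetical helper: best-effort name of the encoding the console
// is probably configured for. Only a guess about the receiver.
std::string console_encoding_guess() {
#ifdef _WIN32
    // Active console output code page, e.g. 65001 for UTF-8.
    return "cp" + std::to_string(GetConsoleOutputCP());
#else
    // Adopt the environment's locale (LANG, LC_ALL, ...), then ask
    // for its character set name, e.g. "UTF-8" or "ISO-8859-1".
    std::setlocale(LC_ALL, "");
    const char* codeset = nl_langinfo(CODESET);
    return codeset ? codeset : "";
#endif
}
```

Note that the POSIX branch changes the C locale of the whole process as a side effect, and that output redirected to a file or pipe may be read by something with entirely different expectations.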

But there is no simple "do this and it will work" mechanism.

Nicol Bolas
  • We cannot determine the expectations of the receiver because we cannot read people's minds, but the **specification** of what the receiver is supposed to get by default is built into the system default locale (the one with the name `""` in C and C++). – n. m. could be an AI Dec 11 '17 at 22:17
  • @n.m.: But that doesn't actually solve the OP's problem. – Nicol Bolas Dec 11 '17 at 22:22
  • No it doesn't. I'm just objecting to your answer. It states that the problem can't be solved, which is correct, but you are citing all the wrong reasons. – n. m. could be an AI Dec 11 '17 at 22:42
  • OK, so let us assume I can determine the encoding to use (perhaps using system dependent means). My question also asks, what do I do next? If I've got a correct `std::locale` object for the `std::cout`, how do I use that to convert a UTF-8 string to the encoding of that locale? – Raedwald Dec 12 '17 at 09:04
  • With a mixture of depression, bemusement, and anger, I am learning that [support for Unicode in standard C++ is terrible](https://stackoverflow.com/a/17106065/545127). – Raedwald Dec 12 '17 at 22:55
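On the follow-up question of what to do once the target encoding is known by some other means: one option mentioned in the comments is iconv. Below is a minimal sketch, assuming a POSIX `iconv` is available and that the target encoding name is one it recognises. The function name `convert_from_utf8` is hypothetical, and some platforms declare the input-buffer parameter of `iconv` as `const char**`, in which case the cast would need adjusting.

```cpp
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts UTF-8 text to the named encoding via iconv.
std::string convert_from_utf8(const std::string& utf8, const char* to_encoding) {
    iconv_t cd = iconv_open(to_encoding, "UTF-8");
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open: unsupported conversion");
    }
    std::string out;
    char buf[256];
    // iconv never modifies the input bytes; the cast only satisfies its signature.
    char* in = const_cast<char*>(utf8.data());
    std::size_t in_left = utf8.size();
    while (in_left > 0) {
        char* out_ptr = buf;
        std::size_t out_left = sizeof buf;
        std::size_t rc = iconv(cd, &in, &in_left, &out_ptr, &out_left);
        out.append(buf, sizeof buf - out_left);
        // E2BIG only means the chunk buffer filled up; loop and continue.
        if (rc == static_cast<std::size_t>(-1) && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv: invalid or unconvertible input");
        }
    }
    // Flush any pending shift sequence for stateful target encodings.
    char* out_ptr = buf;
    std::size_t out_left = sizeof buf;
    iconv(cd, nullptr, nullptr, &out_ptr, &out_left);
    out.append(buf, sizeof buf - out_left);
    iconv_close(cd);
    return out;
}
```

The result could then be written to std::cout as raw bytes. Characters that have no representation in the target encoding make this sketch throw; a real program would have to decide whether to substitute, transliterate, or fail.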