I've been trying to convert between UTF-8 and UTF-16 LE with BOM in C++ so that characters are output correctly on Windows, without having to change the terminal font.

I tried changing the code page, but that didn't work.

I have two questions:

  1. How can I convert a normal string to a wide string?
  2. Is it a bad idea to create a C++ map that maps each Unicode character to the corresponding character in the Windows code page?

For example,

    wcout << L"\u00A0" << endl;

This code outputs the letter á on Windows when using code page 850. How can I put a variable in place of the "\u00A0", so that a normal string is converted to a wide string on Windows?

What I'd like is this:

    wcout << Lsome_variable << endl;

I realise it's not valid C++ syntax, but does anyone know how I can do this, or whether there's a better way?

Francis
  • There's [`std::wstring_convert`](http://en.cppreference.com/w/cpp/locale/wstring_convert) – chris Jul 20 '14 at 16:20
  • Just use wide strings all the way. And `_setmode` for the standard streams. – Cheers and hth. - Alf Jul 20 '14 at 16:32
  • @Cheersandhth.-Alf: Better to use UTF-8 everywhere, except WINAPI calls forcing UTF-16. Wide is not Wide! [UTF-8 Everywhere Manifesto](http://www.utf8everywhere.org) – Deduplicator Jul 20 '14 at 16:38
  • @Deduplicator: since visual c++, the main compiler on the windows platform, does not support utf-8 literals, and since the windows console subsystem does not support utf-8 input, it's just dumb to "use utf-8 everywhere". sorry. but that's how it is – Cheers and hth. - Alf Jul 20 '14 at 16:53
  • @Cheersandhth.-Alf: If my comment is pure fanboy (which characterization I resent), yours is not any better. You neither link to nor provide any argument for going full-`wchar_t` being better in all and any circumstances. (I concede there are *specific cases* it is the easy way to go) – Deduplicator Jul 20 '14 at 17:13
  • @Deduplicator: i think the fanboyness of your enthusiastic argument is well illustrated by your statement “You neither link to nor provide any argument for going full-wchar_t being better in all and any circumstances”, which (1) ignores the facts presented, and (2) misleadingly introduces a straw man to argue against. that's not an engineer's argument. it's a fanboy, or troll. – Cheers and hth. - Alf Jul 20 '14 at 17:23
  • You want the `MultiByteToWideChar` and `WideCharToMultiByte` functions. Since source code is interpreted as the local system language, avoid using anything but 7-bit ASCII in string literals. You can make UTF-16 string literals and use \x8a05 for other characters. – Khouri Giordano Jul 20 '14 at 17:33
  • Alf: all this was said, and these concerns were answered in http://programmers.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful long time ago. The view you advocate here is, in fact, leading to more confusion. If you are not convinced, I would be happy if you post to that thread. – Pavel Radzivilovsky Jul 21 '14 at 15:25
  • @Cheersandhth.-Alf the one time it's almost mandatory to use UTF-8 is when you're trying to use `std::exception` with Unicode. There are no methods that take or provide `wchar_t`. – Mark Ransom Aug 02 '21 at 18:46
  • See [UTF8 to/from wide char conversion in STL](https://stackoverflow.com/q/148403/5987) for alternatives. – Mark Ransom Aug 02 '21 at 18:49

1 Answer

As noted in the comments, the standard library now provides things like std::wstring_convert (and other functions/classes in the See Also section of that page).
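As a rough illustration of the standard-library route, here is a minimal sketch built on `std::wstring_convert` with `std::codecvt_utf8_utf16<wchar_t>`. The function names `utf8_to_wide_std` and `wide_to_utf8_std` are made up for this example, and both facilities are deprecated as of C++17, so a compiler may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into UTF-16 code units
// stored in a std::wstring. from_bytes throws std::range_error if the
// input is not valid UTF-8.
std::wstring utf8_to_wide_std(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}

// Hypothetical helper: encode UTF-16 code units back into UTF-8 bytes.
std::string wide_to_utf8_std(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

With a helper like this, `std::wcout << utf8_to_wide_std(some_variable) << std::endl;` takes the place of the `L` prefix, which only applies to literals. Whether the console then shows the right glyph still depends on the stream mode and the active code page.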

Since you're on Windows, the WinAPI also has conversion functions. In this case you would be looking for `MultiByteToWideChar`, which can be used to convert from UTF-8 to UTF-16.
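The WinAPI route might look roughly like the sketch below. It calls `MultiByteToWideChar` twice, first to ask for the required length and then to do the conversion. The wrapper name `utf8_to_wide_winapi` is hypothetical, the code assumes the input length fits in an `int`, and it is wrapped in `#ifdef _WIN32` because it only builds on Windows.

```cpp
#ifdef _WIN32
#include <windows.h>

#include <stdexcept>
#include <string>

// Hypothetical wrapper: convert a UTF-8 std::string to a UTF-16 std::wstring.
// Assumes utf8.size() fits in an int, as the API takes int lengths.
std::wstring utf8_to_wide_winapi(const std::string& utf8) {
    if (utf8.empty()) {
        return std::wstring();
    }
    const int byte_count = static_cast<int>(utf8.size());

    // First call: no output buffer, returns the number of wchar_t needed.
    const int wide_count = MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), byte_count, nullptr, 0);
    if (wide_count == 0) {
        throw std::runtime_error("MultiByteToWideChar: invalid UTF-8 input");
    }

    // Second call: write the converted code units into the string's buffer.
    std::wstring wide(static_cast<std::size_t>(wide_count), L'\0');
    const int written = MultiByteToWideChar(
        CP_UTF8, MB_ERR_INVALID_CHARS, utf8.data(), byte_count,
        &wide[0], wide_count);
    if (written == 0) {
        throw std::runtime_error("MultiByteToWideChar: conversion failed");
    }
    return wide;
}
#endif  // _WIN32
```

Passing the explicit byte count rather than -1 means the result carries no extra terminating null. `WideCharToMultiByte` follows the same two-call pattern in the other direction.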

Between those options, something should fit your use case. Generally speaking, you should never need to write your own conversion map.

TheUndeadFish
  • A UTF to local character set conversion can also include character transforms that are not reversible, but visually resemble the original. – Khouri Giordano Jul 20 '14 at 17:39
  • wstring_convert is deprecated now: https://stackoverflow.com/questions/42946335/deprecated-header-codecvt-replacement – Trass3r Oct 18 '18 at 12:39