
The following code fails and I can't seem to figure out why.

std::string s = "–";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::wstring wide = converter.from_bytes(s);

I tried reading up on UTF-8, but I couldn't figure it out. Storing the initial string as a wstring, converting it to a string, then converting it back gives the correct result:

std::wstring ws = L"–";
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
std::string narrow = converter.to_bytes(ws);
std::wstring wide = converter.from_bytes(narrow);

1 Answer


Most likely, your C++ source file is not saved as UTF-8, or the compiler is not interpreting it as UTF-8; either way, "–" does not actually represent U+2013 EN DASH at runtime. You can easily verify this at runtime, for example with a debugger, or by printing out the raw bytes of your string.

Make sure your C++ source file is saved as UTF-8 and that the compiler interprets it as UTF-8. Alternatively, use a UTF-8 string literal:

std::string s = u8"–";

(Note that in C++20, u8 literals have type const char8_t[], so this line no longer compiles as-is and needs a cast or a char8_t-based string type.)

Or spell out the UTF-8 bytes explicitly:

std::string s = "\xE2\x80\x93";
Remy Lebeau