
This is how we currently convert a UTF-16 string to UTF-8 on Ubuntu 18.04.

The input is a const wchar_t* (assumed to be UTF-16, although with GCC on Linux wchar_t is 32 bits wide and normally holds UTF-32) and the output is a const char* (UTF-8):

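#include <cstring>  // strdup (POSIX)
#include <string>

// Returns a heap copy; the caller must free() it.
const char* to_string(const wchar_t* input)
{
    std::wstring w(input);
    std::string s(w.begin(), w.end());  // copies each wchar_t into one char
    return strdup(s.c_str());
}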

However, it doesn't seem to work: I'm still seeing characters in my string that are not UTF-8 encoded, for example:

Adobe® Flash® Player
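To make the problem concrete, the following sketch applies the same element-by-element copy to the example string and dumps the resulting bytes in hex. The helper names narrow_copy and dump_bytes are made up for illustration. The ® sign is U+00AE, so the copy produces the single byte 0xAE (its Latin-1 value), whereas the UTF-8 encoding of ® is the two bytes 0xC2 0xAE; a UTF-8 decoder therefore sees an invalid sequence.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: the same copy that to_string() performs, minus strdup.
// Each wchar_t becomes exactly one char, so the result has w.size() bytes.
std::string narrow_copy(const std::wstring& w)
{
    return std::string(w.begin(), w.end());
}

// Hypothetical helper: print every byte of s in hex, which makes a lone
// 0xAE (where UTF-8 would need 0xC2 0xAE) easy to spot.
void dump_bytes(const std::string& s)
{
    for (unsigned char c : s)
        std::printf("%02X ", c);
    std::printf("\n");
}
```

For example, dump_bytes(narrow_copy(L"Adobe\u00AE Flash\u00AE Player")) prints 20 bytes, one per wide character, with AE wherever UTF-8 would have C2 AE.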

Is there a problem with the to_string function above?

Ankur Shah
  • This doesn't really convert anything; it just casts every `wchar_t` to a `char`. It only works correctly for plain 7-bit ASCII characters. For one thing, you'd always end up with `s.size() == w.size()`, whereas a UTF-8 sequence may need to be longer (counted in octets) than the equivalent UTF-16 sequence (counted in 16-bit code units). – Igor Tandetnik May 02 '20 at 22:01
  • @IgorTandetnik Thanks for the response. Do you have an idea or a code snippet showing how I could update the function above? I'm using Ubuntu 18.04. – Ankur Shah May 04 '20 at 01:15
  • See if the example [here](https://en.cppreference.com/w/cpp/locale/codecvt_utf8_utf16) helps. – Igor Tandetnik May 04 '20 at 01:21
  • I've marked this as a duplicate of another question, because the other asks specifically for code that does not rely on a particular OS. If that's not sufficient for your question please let me know. – Mark Ransom May 04 '20 at 01:49
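Following the point above that a real conversion has to encode each code point rather than copy it, here is a minimal hand-written sketch of a replacement. It is not the cppreference example linked above (that one uses std::wstring_convert with std::codecvt_utf8_utf16, which still works but has been deprecated since C++17). The function names to_utf8 and append_utf8 are hypothetical. The sketch assumes the wide string holds either UTF-16 code units with surrogate pairs (as on Windows) or plain code points (as with the 32-bit wchar_t on Linux), and it substitutes U+FFFD for unpaired surrogates and out-of-range values. It returns std::string instead of a strdup'ed buffer, so the caller has nothing to free().

```cpp
#include <string>

// Append the UTF-8 encoding (1 to 4 bytes) of one Unicode code point.
static void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical replacement for to_string(): decode the wide string into
// code points (combining UTF-16 surrogate pairs) and re-encode as UTF-8.
std::string to_utf8(const wchar_t* input)
{
    std::string out;
    for (const wchar_t* p = input; *p != L'\0'; ++p) {
        char32_t cp = static_cast<char32_t>(*p);
        if (cp >= 0xD800 && cp <= 0xDBFF && p[1] >= 0xDC00 && p[1] <= 0xDFFF) {
            // High surrogate followed by a low surrogate: combine the pair.
            char32_t lo = static_cast<char32_t>(*++p);
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;  // unpaired surrogate or not a valid code point
        }
        append_utf8(out, cp);
    }
    return out;
}
```

A caller that still needs a C string can use to_utf8(input).c_str() while the returned std::string is alive, or strdup() it as before and free() it later. With this version, ® comes out as the two bytes 0xC2 0xAE, so the example string grows from 20 to 22 bytes.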
