
A library stores Unicode strings as:

std::vector<unsigned short> ustring;

How do I do these conversions in a portable way?

convert ustring to std::wstring;
convert ustring to std::string;
convert ustring to std::vector<unsigned char>;
convert std::vector<unsigned char> to ustring;


EDIT
The strings are probably UTF-16, not UTF-8.

user841550
  • Are you sure they're not UTF-16 strings? Storing UTF-8 as `unsigned short` vectors wastes 50% space on typical platforms. – Fred Foo Nov 01 '11 at 15:39
  • Are you sure they're not just *code points*? – Dabbler Nov 01 '11 at 15:48
  • What is the name of the library? ustring to wstring should be no problem, as wstrings are made of wchar_ts, which should also be short-sized. If you wish to preserve the contents, you will probably need to convert UTF-16 to UTF-8 when going to strings or chars. If you need to go from char to ustring, you need to convert the characters. Take a look at the ICU library. – RedX Nov 01 '11 at 15:50
  • @RedX : `wchar_t` is only `short`-sized on Windows; on other platforms, it is typically `int`-sized. – ildjarn Nov 01 '11 at 16:31

2 Answers


libiconv, ICU, UTF8-CPP, and others can do this. AFAIK, C++ does not have a portable built-in way to convert between UTF-8/16/32. Keep in mind that std::wstring holds UTF-16 on some systems (e.g. Windows) and UTF-32 on others.
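If pulling in a library is not an option, the conversion can also be written by hand. The following is a minimal sketch, assuming the vector really holds UTF-16 code units in native byte order; the names `utf16_to_code_points`, `to_utf8` and `to_wstring` are made up for illustration and do not come from any library. Unpaired surrogates are replaced with U+FFFD, and the wide-string version branches on `sizeof(wchar_t)` to cover both the 2-byte and 4-byte cases.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<unsigned short> ustring;

// Decode UTF-16 code units into Unicode code points.
// Unpaired surrogates are replaced with U+FFFD.
std::vector<unsigned long> utf16_to_code_points(const ustring& in) {
    std::vector<unsigned long> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long u = in[i] & 0xFFFFul;
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < in.size()) {
            unsigned long lo = in[i + 1] & 0xFFFFul;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                // Valid surrogate pair: combine into one code point.
                out.push_back(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00));
                ++i;
                continue;
            }
        }
        if (u >= 0xD800 && u <= 0xDFFF) u = 0xFFFD;  // lone surrogate
        out.push_back(u);
    }
    return out;
}

// Append one byte to a std::string.
static void put(std::string& s, unsigned long byte) {
    s += static_cast<char>(static_cast<unsigned char>(byte));
}

// Encode the string as UTF-8 in a std::string.
std::string to_utf8(const ustring& in) {
    std::string out;
    const std::vector<unsigned long> cps = utf16_to_code_points(in);
    for (std::size_t i = 0; i < cps.size(); ++i) {
        const unsigned long c = cps[i];
        if (c < 0x80) {
            put(out, c);
        } else if (c < 0x800) {
            put(out, 0xC0 | (c >> 6));
            put(out, 0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            put(out, 0xE0 | (c >> 12));
            put(out, 0x80 | ((c >> 6) & 0x3F));
            put(out, 0x80 | (c & 0x3F));
        } else {
            put(out, 0xF0 | (c >> 18));
            put(out, 0x80 | ((c >> 12) & 0x3F));
            put(out, 0x80 | ((c >> 6) & 0x3F));
            put(out, 0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Convert to std::wstring: copy code units when wchar_t is 16-bit,
// otherwise store one code point per wchar_t.
std::wstring to_wstring(const ustring& in) {
    std::wstring out;
    if (sizeof(wchar_t) == 2) {
        for (std::size_t i = 0; i < in.size(); ++i)
            out += static_cast<wchar_t>(in[i]);
    } else {
        const std::vector<unsigned long> cps = utf16_to_code_points(in);
        for (std::size_t i = 0; i < cps.size(); ++i)
            out += static_cast<wchar_t>(cps[i]);
    }
    return out;
}
```

If the `std::vector<unsigned char>` in the question is meant to hold UTF-8, its contents can be filled from the bytes of the `std::string` returned by `to_utf8`. The reverse direction (UTF-8 to UTF-16) needs a matching decoder, which is not shown here.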

Mooing Duck
  • Conversion to wide strings: Use iconv().

  • Conversion between UTF8/16/32: Now a built-in feature of C++11, but not widely supported yet. Alternatively, use iconv(). Also use std::u16string and std::u32string as the data type of choice (and std::string for UTF8).

  • Conversion from wide string to system's multibyte narrow string: use wcstombs()/mbstowcs().
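As a rough illustration of the last point, here is a small sketch of wrapping `wcstombs()`/`mbstowcs()` for `std::wstring` and `std::string`. The helper names `narrow` and `widen` are hypothetical. Both functions depend on the current C locale, and both stop at the first embedded null character, so this is only suitable for ordinary null-terminated text.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Wide string -> narrow multibyte string in the current locale's encoding.
// Returns an empty string if some character cannot be represented.
std::string narrow(const std::wstring& ws) {
    // Each wide character needs at most MB_CUR_MAX bytes, plus a terminator.
    std::vector<char> buf(ws.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(&buf[0], ws.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::string();
    return std::string(&buf[0], n);
}

// Narrow multibyte string -> wide string.
// Returns an empty string if the input is not valid in the current locale.
std::wstring widen(const std::string& s) {
    // A multibyte string never decodes to more wide characters than it has bytes.
    std::vector<wchar_t> buf(s.size() + 1);
    const std::size_t n = std::mbstowcs(&buf[0], s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return std::wstring();
    return std::wstring(&buf[0], n);
}
```

A program would typically call `std::setlocale(LC_ALL, "")` from `<clocale>` once at startup so that these functions use the user's encoding; in the default "C" locale, only basic characters are guaranteed to convert.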

Here's my standard response of past posts on the subject: Q1, Q2, Q3.

Kerrek SB