0

Is there an easy STL way to convert a std::string to a std::u32string, i.e. a basic_string of char to char32_t?

This is not a Unicode question.

user3443139

2 Answers

1

To initialise a new string:

std::u32string s32(s.begin(), s.end());

To assign to an existing string:

s32.assign(s.begin(), s.end());

If char is signed on your platform and the string might contain bytes in the range 0x80 to 0xff, those values are negative as char, so widening them directly sign-extends them into large positive values (0x80 typically becomes 0xffffff80). Dealing with that possibility is messier; you'll have to convert each character to unsigned char before widening it.

s32.resize(s.size());
std::transform(s.begin(), s.end(), s32.begin(), 
               [](char c) -> unsigned char {return c;});

or a plain loop

s32.clear();  // if not already empty
for (unsigned char c : s) {s32 += c;}
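
Wrapped up as a function, the loop approach might look like the sketch below. The name widen_bytes is hypothetical, chosen here for illustration; it is not a standard library function. It assumes you want each byte mapped to the code unit with the same value in the range 0 to 255, with no character-set conversion.

```cpp
#include <string>

// Hypothetical helper (illustrative name, not part of the standard library):
// widens each byte of `s` to a char32_t with the same value in 0..255,
// whether or not plain char is signed on this platform.
std::u32string widen_bytes(const std::string& s) {
    std::u32string out;
    out.reserve(s.size());
    // Binding each element to unsigned char converts negative char values
    // to 128..255 before they are widened, so no sign-extension occurs.
    for (unsigned char c : s) {
        out.push_back(static_cast<char32_t>(c));
    }
    return out;
}
```

With this, a byte holding 0x80 ends up as the char32_t value 0x80 rather than a sign-extended value.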
Mike Seymour
  • How do I stop the sign-extension in this conversion? 0x80 to 0xff are being converted to 0x80 0xff 0xff 0xff to 0xff 0xff 0xff 0xff – user3443139 Nov 11 '14 at 13:31
  • @user3443139: You don't. If the string might contain values outside the well-defined range of `char`, then you'll need to do something more complicated. – Mike Seymour Nov 11 '14 at 13:35
  • @user3443139: I've added some suggestions to deal with unsupported characters. – Mike Seymour Nov 11 '14 at 13:41
  • Thanks. Another possibility is that I switch to using `std::basic_string<unsigned char>` rather than `std::string`. – user3443139 Nov 11 '14 at 13:55
  • @user3443139: You can try, but it's unsupported. The standard only specifies specialisations of `char_traits` for `char`, `char16_t`, `char32_t` and `wchar_t`. Your library may or may not have a usable specialisation for `unsigned char`, which may or may not do the right thing. Portably, you'd have to write your own traits class, which is probably more hassle than it's worth. – Mike Seymour Nov 11 '14 at 14:09
-1
s32.resize(s.length());
std::copy(s.begin(),s.end(),s32.begin());
Qantas 94 Heavy
Photon
  • Just a note: it looks better if you use code blocks instead of inline code spans. You can do so by indenting your code by 4 spaces (or highlight all your code and press Ctrl + K). – Qantas 94 Heavy Nov 11 '14 at 11:22