In C++ there's no solid standard when it comes to encoding. If I want to use Unicode (for example, UTF-8) in C++ on Windows, how can I achieve that?

  1. On Windows, do I have to use something like wide strings to use Unicode, or is there another way?
  2. If I have to use third-party libraries, which ones would you recommend?
  3. What do I have to keep in mind when using Unicode strings instead of std::string?
Bolderaysky
  • One question per Stackoverflow question, please. It seems that to answer all of these requires someone to have extensive background in both Linux and MS-Windows development. Although there are plenty of those, around here, it will far increase your chances of getting useful answers by breaking all of this down into individual questions that focus on one specific domain knowledge, at a time. – Sam Varshavchik Dec 27 '22 at 15:02
  • A `std::string` can hold UTF-8 strings. – Eljay Dec 27 '22 at 15:05
  • Also see phuclv's answer on UTF-8 and [Windows and MSVC](https://stackoverflow.com/a/63556337/4641116). – Eljay Dec 27 '22 at 15:14
  • @SamVarshavchik thanks! I've edited question to make it more specific and only about one OS. – Bolderaysky Dec 27 '22 at 15:15
  • It's also unclear what you mean by "use Unicode" here. There are many levels to "using Unicode". Storing a UTF-8 string and handing them off to other code that knows how to deal with them is one thing. Do you intend to do case conversions? Breaking them down into visual glyph sequences for display? Splitting and inserting codepoints between other codepoints? – Nicol Bolas Dec 27 '22 at 15:17
  • "without much pain?" c'mon man, be reasonable! – bolov Dec 27 '22 at 15:50

1 Answer


If you are talking about source code, then it's implementation-specific for each compiler, but I believe every modern compiler supports UTF-8 at least (MSVC, for instance, has the /utf-8 option for this).

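C++ itself has the following types to support Unicode: wchar_t, char16_t, char32_t and char8_t for characters, and the corresponding std::wstring, std::u16string, std::u32string and std::u8string for strings. Note that char8_t and std::u8string exist only since C++20, and that wchar_t is 16 bits wide on Windows, where it holds UTF-16 code units.

And the following notations for literals: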

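    // Character literals (u8'c' has type char8_t only since C++20)
    char8_t  ch_utf8  = u8'c';
    char16_t ch_utf16 = u'c';
    char32_t ch_utf32 = U'c';
    wchar_t  ch_wide  = L'c';

    // String literals (each array includes the terminating null)
    char8_t  str_utf8[]  = u8"str";
    char16_t str_utf16[] = u"str";
    char32_t str_utf32[] = U"str";
    wchar_t  str_wide[]  = L"str";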
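To make the difference between these literal kinds more concrete, here is a small sketch that counts the code units each encoding needs for the same character. The `code_units` helper is a made-up name for illustration, not something from the standard library, and the characters are written as universal character names so the result does not depend on the source file encoding.

```cpp
#include <cstddef>

// Hypothetical helper (not part of the standard library): number of code
// units in a string literal, excluding the terminating null. It accepts any
// character type, so it works whether u8"" yields char (before C++20) or
// char8_t (C++20 and later).
template <class CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// U+20AC EURO SIGN: three UTF-8 bytes, one UTF-16 unit, one UTF-32 unit.
static_assert(code_units(u8"\u20AC") == 3, "UTF-8 needs 3 code units");
static_assert(code_units(u"\u20AC") == 1, "UTF-16 needs 1 code unit");
static_assert(code_units(U"\u20AC") == 1, "UTF-32 needs 1 code unit");

// U+1F600 lies outside the Basic Multilingual Plane: four UTF-8 bytes,
// a surrogate pair in UTF-16, and still a single UTF-32 unit.
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8 needs 4 code units");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16 needs a surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32 needs 1 code unit");
```

The practical consequence is that `size()` on any of these string types counts code units, not characters, so indexing or truncating by position can split a character in UTF-8 and UTF-16.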

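There is also the std::codecvt template for string conversions between different encodings, although the convenience wrappers around it (std::wstring_convert and the contents of the <codecvt> header) are deprecated since C++17.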
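As an illustration of what such a conversion involves, here is a minimal hand-written sketch that turns UTF-8 stored in a `std::string` into UTF-16 in a `std::u16string`. It deliberately does not use std::codecvt, and the function name `utf8_to_utf16` is an assumption made for this example, not a standard API. It throws on malformed input instead of substituting a replacement character, which is only one of several reasonable policies.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a standard library function): decode the UTF-8
// bytes in `in` and re-encode them as UTF-16 code units.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point allowed for each sequence length; anything lower
    // is an overlong encoding and is rejected.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;

        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte must look like 10xxxxxx and adds six bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;

        // Reject overlong forms, surrogate code points and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF become a high/low surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, `utf8_to_utf16("\xE2\x82\xAC")` yields the single code unit `u'\u20AC'`. On Windows specifically, the Win32 function MultiByteToWideChar with the CP_UTF8 code page performs the same UTF-8 to UTF-16 conversion into a wchar_t buffer, which is usually preferable to maintaining a decoder by hand.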

sklott