
I have a string encoded in ISO-8859-5 and I want to transform it to UTF-8. I have a C# code example that works well; I need to do the same in C++:

result = mainString.Substring(nameStart + 3, symbols);
Encoding enc = Encoding.GetEncoding("ISO-8859-5");
byte[] bytes = enc.GetBytes(result); // encode the UTF-16 string as ISO-8859-5 bytes

result = Encoding.UTF8.GetString(bytes); // decode those bytes as UTF-8

result is a string containing the text.

  • What have you attempted so far to solve the issue in C++? – gunr2171 Jul 25 '22 at 12:20
  • I have tried the first method in the answers there. The string changed, but it is still a mess: https://stackoverflow.com/questions/4059775/convert-iso-8859-1-strings-to-utf-8-in-c-c – XoDefender Jul 25 '22 at 12:25
  • Now I am reading about std::mbstowcs; people say it should help somehow. – XoDefender Jul 25 '22 at 12:27
  • C++ does not come with any standard library feature that is guaranteed to be able to do such a translation. In general you need to use a library like e.g. iconv. – user17732522 Jul 25 '22 at 12:28
  • 2
    And I don't know any C#, but that code doesn't look right to me. If you want to encode a string in a different encoding, you should already have in the form of encoded bytes to start with, not a decoded string. Why do you encode in one encoding and then decode in another? That should produce gibberish, shouldn't it? – user17732522 Jul 25 '22 at 12:32
  • It seems to me that here I just decode the string in ISO format to a sequence of bytes and then parse this sequence according to UTF-8 encoding to make the right string. Probably I am not right, but that is how I understand it. – XoDefender Jul 25 '22 at 12:45
  • C++ support for encoding leaves a lot of room for improvement. C++ tends to defer to operating system handling of encoding and conversions, and even then it can be quite the challenge (moreso if platform portability is a concern). There are third party libraries to fulfill the need, such as [ICU](https://icu.unicode.org/design/cpp). – Eljay Jul 25 '22 at 12:49
  • @XoDefender a C# `string` is UTF-16 in memory. *Encoding* a `string` to ISO bytes and then *decoding* those bytes as UTF-8 back to a `string` is definitely wrong. See [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – Remy Lebeau Jul 25 '22 at 15:17
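
To make user17732522's and Remy Lebeau's point concrete, here is a small, hypothetical C++ demonstration (the byte values assume "Привет" encoded in ISO-8859-5): single-byte Cyrillic bytes are not valid UTF-8, so merely relabeling them produces gibberish; the bytes have to be actually converted.

#include <cstdio>

int main()
{
    // "Привет" encoded in ISO-8859-5: one byte per Cyrillic letter.
    const unsigned char iso[] = {0xBF, 0xE0, 0xD8, 0xD2, 0xD5, 0xE2, 0};

    // 0xBF is a UTF-8 continuation byte, so this buffer is not valid
    // UTF-8; printing it to a UTF-8 terminal shows mojibake. The bytes
    // must be converted, not just reinterpreted.
    std::printf("%s\n", reinterpret_cast<const char*>(iso));
    return 0;
}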

1 Answer


The procedure to do this on Linux is as follows (a sketch putting the steps together appears after the list):

  1. Use iconv_open() as described in its manual page to create a handle for a conversion from windows-1251 to UTF-8. I just double-checked and "windows-1251" is supported by the iconv library.

  2. Use iconv() as described in its manual page.

  3. Use iconv_close() as described in its manual page.
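
A minimal sketch of those three steps, assuming the source bytes really are windows-1251 (the function name to_utf8 and the fixed output-buffer sizing are illustrative only; production code should loop on partial conversions and report errors properly):

#include <iconv.h>

#include <stdexcept>
#include <string>

// Convert a windows-1251 encoded byte string to UTF-8 via POSIX iconv.
std::string to_utf8(const std::string& in)
{
    // Note the argument order: to-encoding first, from-encoding second.
    iconv_t cd = iconv_open("UTF-8", "WINDOWS-1251");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // Each windows-1251 byte expands to at most 3 UTF-8 bytes, so a
    // single fixed output buffer is enough for this simple case.
    std::string out(in.size() * 3 + 1, '\0');
    char* inbuf = const_cast<char*>(in.data());
    size_t inleft = in.size();
    char* outbuf = &out[0];
    size_t outleft = out.size();

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv failed");

    out.resize(out.size() - outleft); // drop the unused tail
    return out;
}

Usage would be std::string utf8 = to_utf8(cp1251_bytes);. On glibc systems iconv is part of libc; on some other platforms you may need to link with -liconv.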

Sam Varshavchik
  • FYI, link-only answers are generally frowned upon. Links can break over time. SO questions/answers are expected to be self-contained. Please update your answer to include the relevant information directly - documentation quotes, code examples, etc. – Remy Lebeau Jul 25 '22 at 15:22
  • https://stackoverflow.com/questions/73112486/how-to-correctly-encode-the-windows-1251-string-to-utf-8-format-linux – the new question with the code I tried to write. The problem is that the initial string gets converted, but the symbols come out wrong. – XoDefender Jul 25 '22 at 16:27
  • @XoDefender why did you post a new question instead of updating this question? – Remy Lebeau Jul 25 '22 at 19:26