
I have a string encoded in ISO-8859-5 and I want to transform it to UTF-8. I have a C# code example that works well; I need to do the same in C++:

result = mainString.Substring(nameStart + 3, symbols);
Encoding enc = Encoding.GetEncoding("ISO-8859-5");
byte[] bytes = enc.GetBytes(result); // encode the UTF-16 string as ISO-8859-5 bytes

result = Encoding.UTF8.GetString(bytes); // decode those bytes as UTF-8

result is a string containing the text.

  • What have you attempted so far to solve the issue in C++? – gunr2171 Jul 25 '22 at 12:20
  • I have tried the first method in the answers there. The string changed, but it is still a mess: https://stackoverflow.com/questions/4059775/convert-iso-8859-1-strings-to-utf-8-in-c-c – XoDefender Jul 25 '22 at 12:25
  • Now I am reading about std::mbstowcs; people say it should help somehow. – XoDefender Jul 25 '22 at 12:27
  • C++ does not come with any standard library feature that is guaranteed to be able to do such a translation. In general you need to use a library like e.g. iconv. – user17732522 Jul 25 '22 at 12:28
  • 2
    And I don't know any C#, but that code doesn't look right to me. If you want to encode a string in a different encoding, you should already have in the form of encoded bytes to start with, not a decoded string. Why do you encode in one encoding and then decode in another? That should produce gibberish, shouldn't it? – user17732522 Jul 25 '22 at 12:32
  • It seems to me that here I just decode the string in ISO format to a sequence of bytes and then parse this sequence according to UTF-8 encoding to make the right string. Probably I am not right, but that is how I understand it. – XoDefender Jul 25 '22 at 12:45
  • C++ support for encoding leaves a lot of room for improvement. C++ tends to defer to operating system handling of encoding and conversions, and even then it can be quite the challenge (moreso if platform portability is a concern). There are third party libraries to fulfill the need, such as [ICU](https://icu.unicode.org/design/cpp). – Eljay Jul 25 '22 at 12:49
  • @XoDefender a C# `string` is UTF-16 in memory. *Encoding* a `string` to ISO bytes and then *decoding* those bytes as UTF-8 back to a `string` is definitely wrong. See [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/) – Remy Lebeau Jul 25 '22 at 15:17
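
To make user17732522's and Remy Lebeau's point concrete, here is a small, hypothetical C++ demonstration (the byte values assume "Привет" encoded in ISO-8859-5): single-byte Cyrillic bytes are not valid UTF-8, so merely relabeling them produces gibberish; the bytes have to be actually converted.

#include <cstdio>

int main()
{
    // "Привет" encoded in ISO-8859-5: one byte per Cyrillic letter.
    const unsigned char iso[] = {0xBF, 0xE0, 0xD8, 0xD2, 0xD5, 0xE2, 0};

    // 0xBF is a UTF-8 continuation byte, so this buffer is not valid
    // UTF-8; printing it to a UTF-8 terminal shows mojibake. The bytes
    // must be converted, not just reinterpreted.
    std::printf("%s\n", reinterpret_cast<const char*>(iso));
    return 0;
}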

1 Answer


The procedure to do this on Linux is as follows (a sketch putting the steps together appears after the list):

  1. Use iconv_open() as described in its manual page to create a handle for a conversion from windows-1251 to UTF-8. I just double-checked and "windows-1251" is supported by the iconv library.

  2. Use iconv() as described in its manual page.

  3. Use iconv_close() as described in its manual page.
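
A minimal sketch of those three steps, assuming the source bytes really are windows-1251 (the function name to_utf8 and the fixed output-buffer sizing are illustrative only; production code should loop on partial conversions and report errors properly):

#include <iconv.h>

#include <stdexcept>
#include <string>

// Convert a windows-1251 encoded byte string to UTF-8 via POSIX iconv.
std::string to_utf8(const std::string& in)
{
    // Note the argument order: to-encoding first, from-encoding second.
    iconv_t cd = iconv_open("UTF-8", "WINDOWS-1251");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // Each windows-1251 byte expands to at most 3 UTF-8 bytes, so a
    // single fixed output buffer is enough for this simple case.
    std::string out(in.size() * 3 + 1, '\0');
    char* inbuf = const_cast<char*>(in.data());
    size_t inleft = in.size();
    char* outbuf = &out[0];
    size_t outleft = out.size();

    size_t rc = iconv(cd, &inbuf, &inleft, &outbuf, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        throw std::runtime_error("iconv failed");

    out.resize(out.size() - outleft); // drop the unused tail
    return out;
}

Usage would be std::string utf8 = to_utf8(cp1251_bytes);. On glibc systems iconv is part of libc; on some other platforms you may need to link with -liconv.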

Sam Varshavchik
  • FYI, link-only answers are generally frowned upon. Links can break over time. SO questions/answers are expected to be self-contained. Please update your answer to include the relevant information directly - documentation quotes, code examples, etc. – Remy Lebeau Jul 25 '22 at 15:22
  • https://stackoverflow.com/questions/73112486/how-to-correctly-encode-the-windows-1251-string-to-utf-8-format-linux – the new question with the code I tried to write. The problem is that the initial string gets converted, but the symbols come out wrong. – XoDefender Jul 25 '22 at 16:27
  • @XoDefender why did you post a new question instead of updating this question? – Remy Lebeau Jul 25 '22 at 19:26