I need to convert incoming strings in UTF-8 encoding into country-specific code pages - e.g. ISO-8859-2 (ISO Latin-2).

The important thing is that I want to be independent of whether the proper locales are installed on the system. The goal of the conversion is not internationalization, in the sense of producing proper output on multilingual users' machines. The conversion has to create data for external devices which need predefined encodings.

So far, I just created a map which defines conversions from Unicode codepoints to ISO-8859-2 equivalents. I use std::wstring_convert<std::codecvt_utf8<wchar_t>> to convert UTF-8 std::string into Unicode std::wstring, and then I make conversions using the defined map. Of course, I suppose there are better ways.
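
A minimal sketch of the map-based approach described above, written without the deprecated `std::wstring_convert`. The names `kLatin2High`, `decode_utf8`, and `utf8_to_latin2` are hypothetical and not from the question, and the decoder is deliberately simple (it rejects malformed lead and continuation bytes but does not check for overlong forms or surrogates). The table follows the published ISO-8859-2 layout for bytes 0xA0..0xFF.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Unicode code points for ISO-8859-2 bytes 0xA0..0xFF.
// Code points below 0xA0 map to the byte with the same value.
static const std::array<char32_t, 96> kLatin2High = {{
    0x00A0, 0x0104, 0x02D8, 0x0141, 0x00A4, 0x013D, 0x015A, 0x00A7,
    0x00A8, 0x0160, 0x015E, 0x0164, 0x0179, 0x00AD, 0x017D, 0x017B,
    0x00B0, 0x0105, 0x02DB, 0x0142, 0x00B4, 0x013E, 0x015B, 0x02C7,
    0x00B8, 0x0161, 0x015F, 0x0165, 0x017A, 0x02DD, 0x017E, 0x017C,
    0x0154, 0x00C1, 0x00C2, 0x0102, 0x00C4, 0x0139, 0x0106, 0x00C7,
    0x010C, 0x00C9, 0x0118, 0x00CB, 0x011A, 0x00CD, 0x00CE, 0x010E,
    0x0110, 0x0143, 0x0147, 0x00D3, 0x00D4, 0x0150, 0x00D6, 0x00D7,
    0x0158, 0x016E, 0x00DA, 0x0170, 0x00DC, 0x00DD, 0x0162, 0x00DF,
    0x0155, 0x00E1, 0x00E2, 0x0103, 0x00E4, 0x013A, 0x0107, 0x00E7,
    0x010D, 0x00E9, 0x0119, 0x00EB, 0x011B, 0x00ED, 0x00EE, 0x010F,
    0x0111, 0x0144, 0x0148, 0x00F3, 0x00F4, 0x0151, 0x00F6, 0x00F7,
    0x0159, 0x016F, 0x00FA, 0x0171, 0x00FC, 0x00FD, 0x0163, 0x02D9
}};

// Decode the UTF-8 sequence starting at s[i] and advance i past it.
// Throws std::invalid_argument on a bad lead byte, a bad continuation
// byte, or a truncated sequence.
char32_t decode_utf8(const std::string& s, std::size_t& i) {
    assert(i < s.size());
    const unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;  // plain ASCII

    int extra = 0;
    char32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else throw std::invalid_argument("invalid UTF-8 lead byte");

    for (; extra > 0; --extra) {
        if (i >= s.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        const unsigned char b = static_cast<unsigned char>(s[i++]);
        if ((b & 0xC0) != 0x80) throw std::invalid_argument("invalid UTF-8 continuation byte");
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}

// Convert UTF-8 text to ISO-8859-2 bytes. Code points that have no
// ISO-8859-2 equivalent are replaced by `fallback`.
std::string utf8_to_latin2(const std::string& utf8, char fallback = '?') {
    std::string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const char32_t cp = decode_utf8(utf8, i);
        if (cp < 0xA0) {  // ASCII and control codes are identical in both encodings
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
            continue;
        }
        char mapped = fallback;
        for (std::size_t k = 0; k < kLatin2High.size(); ++k) {  // linear search over 96 entries
            if (kLatin2High[k] == cp) {
                mapped = static_cast<char>(static_cast<unsigned char>(0xA0 + k));
                break;
            }
        }
        out.push_back(mapped);
    }
    return out;
}
```

Nothing here touches `std::locale`, so the result does not depend on which locales are installed. The sketch assumes precomposed (NFC) input: a base letter followed by a combining mark would come out as the base letter plus the fallback character.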

Are there any solutions available in standard C++ libraries, Boost, or others, making it possible to perform such conversions? Is it possible to "link" a locale setting such as charset to the application, so that it can work independently from system locales?

Remy Lebeau
Morpheus
  • Transcoding should not depend on locale. You may want to check Wikipedia: it lists all the characters for every ISO 8859-X code page, so you can do the conversion manually. It is very simple: just normalize the Unicode string (NFC) before doing the lookup. – Giacomo Catenazzi Aug 04 '20 at 09:59
  • You may be interested in [libiconv](https://www.gnu.org/software/libiconv/). – grizzlybears Aug 04 '20 at 10:29
  • Note that `std::codecvt_utf8` is deprecated since C++17 with no replacement. See here for alternatives: [Portable and simple unicode string library for C/C++](https://stackoverflow.com/questions/433301/portable-and-simple-unicode-string-library-for-c-c) – rustyx Aug 04 '20 at 11:09
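
The iconv suggestion in the comments above could look roughly like the sketch below. It uses the POSIX `iconv_open` / `iconv` / `iconv_close` interface, which glibc provides and which libiconv also implements; the wrapper name `convert_with_iconv` is hypothetical. It assumes the platform declares the input parameter of `iconv` as `char**` (some older implementations use `const char**`) and that the iconv implementation ships an ISO-8859-2 converter.

```cpp
#include <iconv.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert `input` from encoding `from` to encoding `to` using iconv.
// Throws std::runtime_error if the converter is unavailable or the input
// contains an invalid or unconvertible sequence.
std::string convert_with_iconv(const std::string& input,
                               const char* to = "ISO-8859-2",
                               const char* from = "UTF-8") {
    iconv_t cd = iconv_open(to, from);
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error("iconv_open failed: conversion not supported");
    }

    std::string out;
    char buf[256];
    // iconv does not modify the input bytes, but its signature is not const-correct.
    char* in = const_cast<char*>(input.data());
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* dst = buf;
        std::size_t dst_left = sizeof buf;
        const std::size_t rc = iconv(cd, &in, &in_left, &dst, &dst_left);
        const int err = errno;  // save before any other call can change it
        out.append(buf, sizeof buf - dst_left);
        // E2BIG only means the chunk buffer filled up; loop and continue.
        if (rc == static_cast<std::size_t>(-1) && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or unconvertible input");
        }
    }
    assert(in_left == 0);  // all input consumed

    iconv_close(cd);
    return out;
}
```

The conversion is selected by encoding name rather than by locale, so it should not depend on installed system locales, although it does depend on the converter tables bundled with the iconv implementation in use.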

1 Answer

You might like to take a look at International Components for Unicode (ICU), which has character conversion functions.

Graham Asher