25

I need a good Unicode library for C++. I need:

  1. Unicode-aware transformations. For example: sort strings case-insensitively and extract their first characters for an index; convert Unicode strings to upper and lower case; split text at sensible positions (word boundaries) in a way that also works for Chinese and Japanese.
  2. Formatting numbers and dates in a locale-sensitive way (should be thread-safe).
  3. Transparent support for UTF-8 (the primary internal representation).

As far as I know, the best library is ICU. However, I can't find developer-friendly API documentation with examples. Also, as far as I can see, it does not fit well with modern C++ design, the STL, and so on. I would like to be able to write something like this:

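// unistring is a hypothetical class, shown only to illustrate the desired API
std::string msg;
unistring umsg = unistring::from_utf8(msg);
unistring::word_iterator wi = umsg.words().begin();
// advance past at most 10 words
for (int n = 0; wi != umsg.words().end() && n < 10; ++wi, ++n)
  ;
msg = umsg.substr(umsg.words().begin(), wi).to_utf8();
std::cout << _("First 10 words are ") << msg;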

Is there a good STL-friendly ICU wrapper released under an open source license? A permissive license such as MIT or Boost is preferred, but others, such as LGPLv2-compatible ones, are OK as well.

Is there another high quality library similar to ICU?

Platform: Unix/POSIX, Windows support is not required.

Edit: unfortunately I wasn't logged in, so I can't accept an answer. I have posted the answer myself.

Peter Mortensen
  • Ooh, +1 for this question. It's baffling imo that a big library like ICU completely fails to follow common C++ idioms. – jalf Feb 04 '09 at 13:50

3 Answers

21

I asked this question quite a long time ago. At the time, no such library existed.

So I wrote Boost.Locale, a C++-friendly library that wraps ICU.

Edit: It is now part of Boost; see the Boost.Locale documentation.
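
Boost.Locale builds on the standard `std::locale` facet machinery, so locale-sensitive formatting goes through an imbued stream rather than through a separate string class. The following is a minimal sketch of that idiom using only the standard library, not Boost.Locale itself. The names `grouped_numpunct`, `make_grouped_locale`, and `format_number` are made up for illustration; with Boost.Locale the locale object would instead come from `boost::locale::generator`.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: groups digits in threes with ','
// and uses '.' as the decimal point, independent of the system's locales.
class grouped_numpunct : public std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    char do_decimal_point() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Builds a locale that is the classic "C" locale plus the facet above.
// The std::locale object takes ownership of the facet pointer.
std::locale make_grouped_locale() {
    return std::locale(std::locale::classic(), new grouped_numpunct);
}

// Formats a number through a stream-local locale. The global locale is
// never modified, so concurrent calls from different threads do not
// interfere with each other.
std::string format_number(long value, const std::locale& loc) {
    std::ostringstream out;
    out.imbue(loc);   // the locale applies to this stream only
    out << value;
    return out.str();
}
```

Passing the locale explicitly, instead of calling `std::locale::global`, is what keeps the formatting free of shared mutable state, which is the thread-safety property asked for in the question.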

cxw
Artyom
  • This is awesome. Did you make a Boost submission? – Pavel Minaev Nov 09 '09 at 18:48
  • I've started an initial preliminary process, before the official submission. Follow the Boost mailing lists. – Artyom Nov 09 '09 at 19:24
  • @Artyom: Yep, I'm following boost mailing lists. I really appreciate your effort. However Boost.Locale is far from being a complete wrapper for ICU. I was interested to do some BiDi transformation, and I can't see it in Boost.Locale yet. Is it planned to/should be added there? Can I help? – Yakov Galka Dec 24 '10 at 21:20
  • @ybungalobill Actually, for BiDi I really prefer libfribidi; I use it in the biditex project rather than ICU itself. Also, I must admit that unless you develop very low-level applications, like your own UI toolkit or a word processor, you don't really need the BiDi algorithm, as most toolkits handle BiDi for you. – Artyom Dec 25 '10 at 13:30
  • @Artyom: I need to convert ISO-8859-8 visual (legacy, not my choice) to UTF-8 in logical order. I'll look at libfribidi. – Yakov Galka Dec 25 '10 at 13:39
  • @Artyom: I don't see a C++ interface in fribidi, I can't find documentation other than not-so-descriptive comments in the code, it's LGPL, *and* I don't see direct support for the reverse-BiDi transformation. So why is it better than ICU? – Yakov Galka Dec 25 '10 at 13:54
  • @ybungalobill "I need to convert ISO-8859-8 visual (legacy, not mine choice) to UTF-8 in logical order." You can't do this using the BiDi algorithm, as the BiDi algorithm is logical->visual, not the other way around. Actually, the conversion from visual to logical is not well defined, and there can be more than one "logical" version of the same visual content. Even the simple addition of an LRM or RLM between any two code points (depending on text direction) would give you the transformation to UTF-8. – Artyom Dec 25 '10 at 14:18
  • @Artyom: It is well defined if you assume that no directionality marks were used. ICU has support for this. Anyway, you didn't answer my question of your plans regarding Boost.Locale. – Yakov Galka Dec 25 '10 at 14:33
  • @ybungalobill not at this point – Artyom Dec 25 '10 at 15:25
1

The wxWidgets GUI toolkit has some rather nice string classes and Unicode support. You don't need to build or use the GUI classes if you don't want to. See here for details.

1

Does this fit the bill?

http://www.codeproject.com/KB/string/utf8cpp.aspx

Rob
  • It seems that it provides only a small subset of what is required. It simply allows handling UTF-8 strings, but doesn't support toLower/toUpper/formatting numbers/... – Joachim Sauer Feb 04 '09 at 13:51
  • True - it is only for handling utf-8 strings, but it can easily be coupled with Boost String Algorithms. Of course, even then it does not replace ICU. – Nemanja Trifunovic Feb 04 '09 at 20:52