3

Need something small and portable. Currently looking at UTF8-CPP but thought that I'd reach out for other suggestions too.

Cheers.

sparkFinder
  • 2
    What do you need to do? A basic encoding and decoding of UTF8 and UTF16 can be written in a couple of lines. – Kerrek SB Jul 11 '11 at 17:28
  • 2
    Covered by a couple of previous questions: http://stackoverflow.com/questions/148403/utf8-to-from-wide-char-conversion-in-stl http://stackoverflow.com/questions/2867123/convert-utf-16-to-utf-8-under-windows-and-linux-in-c – Mark Ransom Jul 11 '11 at 17:40
  • After a quick barrage of answers and references, UTF8-CPP remains the winner. It's small (just three header files, all inline code) and has an open license that allows modification and redistribution by anyone. – sparkFinder Jul 11 '11 at 18:17
  • Does this need to be portable? Most OSes have non-portable charset conversion routines that you could use. – bdonlan Jul 11 '11 at 19:08
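
For reference, the "couple of lines" approach mentioned in the comments above might look roughly like this. It is a minimal sketch, not UTF8-CPP's API: the names `decode_utf8`, `append_utf8`, `utf8_to_utf16` and `utf16_to_utf8` are made up for illustration, and it assumes a C++11 compiler for `char32_t` and `std::u16string`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode one code point starting at s[i] and advance i.
// The caller guarantees i < s.size(). Throws on malformed input.
char32_t decode_utf8(const std::string& s, std::size_t& i) {
    unsigned char b0 = static_cast<unsigned char>(s[i++]);
    if (b0 < 0x80) return b0;  // plain ASCII
    int extra;                 // number of continuation bytes expected
    char32_t cp, min;          // min = smallest value this length may encode
    if ((b0 & 0xE0) == 0xC0)      { extra = 1; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; min = 0x10000; }
    else throw std::runtime_error("invalid UTF-8 lead byte");
    for (; extra > 0; --extra) {
        if (i >= s.size()) throw std::runtime_error("truncated UTF-8 sequence");
        unsigned char b = static_cast<unsigned char>(s[i++]);
        if ((b & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, values past U+10FFFF and encoded surrogates.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::runtime_error("invalid code point in UTF-8 input");
    return cp;
}

// Hypothetical helper: append one (already validated) code point as UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        char32_t cp = decode_utf8(in, i);
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // split into a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a low one next
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Most of the bulk is validation (overlong forms, stray surrogates, truncated input). A tested library such as UTF8-CPP saves you from getting those edge cases wrong, which is the main argument for using one even though the core conversion is short.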

4 Answers

2

UTF8-CPP it is. I just want to mark this question as done. Thanks for all the help, guys :)

sparkFinder
1

ICU ("International Components for Unicode") is portable and open source.

I haven't used it myself, so I can't say how good it is, but I know others who do.

Eran
  • 3
    It's a good library for most everything, except binary size. Compiled ICU is ~16MB by itself! – Billy ONeal Jul 11 '11 at 17:38
  • @Billy: Most of that likely comes from the data library. It can be customized to further improve its footprint: http://userguide.icu-project.org/icudata#TOC-Reducing-the-Size-of-ICU-s-Data:-Co – Void - Othman Jul 11 '11 at 18:03
  • ICU is indeed far too large (maintained by IBM for mammoth projects). – sparkFinder Jul 11 '11 at 18:14
  • @Void: Yes, I'm not knocking the library too much. But for most of my apps which MUST be under 500k total (I'm supporting Dial-Up users) the size makes it prohibitive. It's an awesome library, just more than I usually need. – Billy ONeal Jul 11 '11 at 18:33
  • @Billy: Ouch! That's a tough maximum size to deal with these days. I do agree with you. It is quite large, often even after the data library footprint optimizations. – Void - Othman Jul 13 '11 at 22:00
0

Boost? or short source?

Naszta
-2

Ironically enough, the best/easiest/most robust way is to wrap Unicode-aware stdlib (C) functions like setlocale(), wprintf() and mbstowcs() in your own application-level C++ classes. The APIs are portable, and they've been in use for many years.

A few links:

paulsm4
  • 1
    -1: `setlocale` and `mbstowcs` are entirely *encoding-agnostic*! Here is a [little rant](http://stackoverflow.com/questions/6300804/wchars-encodings-standards-and-portability) of mine on that subject. – Kerrek SB Jul 11 '11 at 18:03
  • sparkFinder - If UTF8-CPP works for you, cool! Otherwise I'd strongly urge you to consider wrapping setlocale() and friends. It's apparently not what the Kool Kids do around here - but it works. And it works well :) – paulsm4 Jul 12 '11 at 05:42
  • @paulsm4: No, it does not work. mbstowcs does not tell you what encoding you are using (and indeed, has no means of changing the encoding). If you are on a platform where you can ensure wcstombs and mbstowcs convert between UTF-8 and UTF-16, good for you. But that is not the case on 99% of platforms. Certainly not Windows or POSIX boxes. – Billy ONeal Jul 14 '11 at 03:59
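
To make the disagreement in the comments above concrete, here is a small sketch of what wrapping `mbstowcs` looks like. The wrapper name `locale_to_wide` is hypothetical. Nothing in the call names an encoding: the input bytes are interpreted according to whatever the current `LC_CTYPE` locale says, and the width and encoding of the resulting `wchar_t` values are implementation-defined.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical wrapper: converts using the encoding of the *current C locale*,
// whatever that happens to be. Returns false if the bytes are not valid in
// that encoding. Input is read up to the first NUL byte.
bool locale_to_wide(const std::string& in, std::wstring& out) {
    // A multibyte string never yields more wide characters than it has bytes.
    std::vector<wchar_t> buf(in.size() + 1);
    std::size_t n = std::mbstowcs(buf.data(), in.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1)) return false;  // invalid sequence
    out.assign(buf.data(), n);
    return true;
}
```

Calling `std::setlocale(LC_ALL, "")` beforehand picks up the user's environment, but whether that environment is UTF-8 is up to the system, so the same non-ASCII bytes can convert differently (or fail) from one machine to the next. That is the portability problem the comments describe; an explicit UTF-8/UTF-16 converter does not depend on the locale at all.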