What's a small LGPL library for C++ UTF-8/UTF-16 text encoding?

Question

Need something small and portable. Currently looking at UTF8-CPP but thought that I'd reach out for other suggestions too.

Cheers.

What do you need to do? A basic encoding and decoding of UTF8 and UTF16 can be written in a couple of lines. — Kerrek SB, Jul 11 '11 at 17:28
Covered by a couple of previous questions: http://stackoverflow.com/questions/148403/utf8-to-from-wide-char-conversion-in-stl http://stackoverflow.com/questions/2867123/convert-utf-16-to-utf-8-under-windows-and-linux-in-c — Mark Ransom, Jul 11 '11 at 17:40
With a quick barrage of answers and references - UTF8-CPP remains the winner. It's small (just three header files, all inline code) and has an open license that allows modification and redistribution for anyone. — sparkFinder, Jul 11 '11 at 18:17
Does this need to be portable? Most OSes have non-portable charset conversion routines that you could use. — bdonlan, Jul 11 '11 at 19:08
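
To give a sense of the scale Kerrek SB's comment is pointing at, here is a minimal sketch of a hand-rolled UTF-8 to UTF-16 conversion. The name `utf8_to_utf16` is hypothetical and not part of UTF8-CPP or any other library mentioned in this thread. It assumes the whole input is available as a `std::string` and simply throws on malformed input (truncated sequences, bad continuation bytes, overlong forms, surrogates, values above U+10FFFF); a real library would likely offer more error-handling options.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from any library): decode UTF-8 bytes and
// re-encode the resulting code points as UTF-16.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;    // code point being decoded
        std::size_t extra = 0;   // continuation bytes expected after the lead
        std::uint32_t min = 0;   // smallest value allowed for this length

        if (lead < 0x80)                { cp = lead;        extra = 0; min = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; min = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; min = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; min = 0x10000; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > in.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong encodings, surrogates, and out-of-range values.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For example, `utf8_to_utf16("\xE2\x82\xAC")` should yield the single code unit U+20AC. Most of the length is validation; the encoding and decoding themselves are only a few shifts and masks.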

score 2 · Accepted Answer · answered Jul 21 '11 at 20:01

UTF8-CPP it is; I just want to mark this question as done. Thanks for all the help, guys :)

sparkFinder


score 1 · Answer 2 · answered Jul 11 '11 at 17:38

ICU ("International Components for Unicode") is portable and open source.

Haven't used it myself, so I can't say how good it is, but I know others that do.

Eran


It's a good library for most everything, except binary size. Compiled ICU is ~16MB by itself! – Billy ONeal Jul 11 '11 at 17:38
@Billy: Most of that likely comes from the data library. It can be customized to further improve its footprint: http://userguide.icu-project.org/icudata#TOC-Reducing-the-Size-of-ICU-s-Data:-Co – Void - Othman Jul 11 '11 at 18:03
ICU is indeed far too large (maintained by IBM for mammoth projects). – sparkFinder Jul 11 '11 at 18:14
@Void: Yes, I'm not knocking the library too much. But for most of my apps which MUST be under 500k total (I'm supporting Dial-Up users) the size makes it prohibitive. It's an awesome library, just more than I usually need. – Billy ONeal Jul 11 '11 at 18:33
@Billy: Ouch! That's a tough maximum size to deal with these days. I do agree with you. It is quite large, often even after the data library footprint optimizations. – Void - Othman Jul 13 '11 at 22:00

score 0 · Answer 3 · answered Jul 11 '11 at 19:30

Boost? Or short source?

Naszta


score -2 · Answer 4 · answered Jul 11 '11 at 17:55

Ironically enough, the best/easiest/most robust way is to wrap Unicode-aware stdlib (C) functions like setlocale(), wprintf() and mbstowcs() in your own (application-level) C++ classes. The APIs are portable, and they've been in use for many years.

A few links:

paulsm4


-1: `setlocale` and `mbstowcs` are entirely *encoding-agnostic*! Here is a [little rant](http://stackoverflow.com/questions/6300804/wchars-encodings-standards-and-portability) of mine on that subject. – Kerrek SB Jul 11 '11 at 18:03
sparkFinder - If UTF8-CPP works for you, cool! Otherwise I'd strongly urge you to consider wrapping setlocale() and friends. It's apparently not what the Kool Kids do around here - but it works. And it works well :) – paulsm4 Jul 12 '11 at 05:42
@paulsm4: No, it does not work. mbstowcs does not tell you what encoding you are using (and indeed, has no means of changing the encoding). If you are on a platform where you can ensure wcstombs and mbstowcs convert between UTF-8 and UTF-16, good for you. But that's not 99% of platforms. Certainly not Windows or POSIX boxes. – Billy ONeal Jul 14 '11 at 03:59
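
The objection in these comments can be made concrete with a small sketch. The helper name `convert_in_locale` is hypothetical; it only wraps the standard `std::setlocale` and `std::mbstowcs` calls. The point it tries to show is that the input encoding is whatever the active locale says it is, not something the caller passes to `mbstowcs`, and that the output is `wchar_t` in an implementation-defined encoding (16 bits wide on Windows, 32 bits on most POSIX systems), so it is not guaranteed to be UTF-16.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: interpret `bytes` under the named locale using
// mbstowcs. Returns false if the locale is unavailable or the bytes are
// not valid in that locale's multibyte encoding.
bool convert_in_locale(const char* locale_name,
                       const std::string& bytes,
                       std::wstring& out) {
    // Remember the current LC_CTYPE setting so it can be restored.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    if (!std::setlocale(LC_CTYPE, locale_name))
        return false;  // locale not installed on this system

    // One wide character per input byte is always enough room.
    std::vector<wchar_t> buf(bytes.size() + 1);
    const std::size_t n = std::mbstowcs(buf.data(), bytes.c_str(), buf.size());

    std::setlocale(LC_CTYPE, saved.c_str());

    if (n == static_cast<std::size_t>(-1))
        return false;  // invalid multibyte sequence for this locale
    out.assign(buf.data(), n);
    return true;
}
```

Calling this with the same bytes, say `"\xC3\xA9"`, under different locales can give different answers or fail outright. Locale names such as `"en_US.UTF-8"` are themselves platform-specific, so which ones are available is an assumption about the target system.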
