19

I'm looking for a portable, easy-to-use string library for C/C++ that helps me work with Unicode input and output. Ideally, it would store its strings in memory as UTF-8 and let me convert strings from ASCII to UTF-8/UTF-16 and back. I don't need much beyond that (OK, a liberal license wouldn't hurt). I've seen that C++ comes with a <locale> header, but it seems to work on wchar_t only, which may or may not be UTF-16 encoded, and I'm not sure how good it actually is.

Use cases include, for example: on Windows, the Unicode APIs expect UTF-16 strings, so I need to convert ASCII or UTF-8 strings before passing them to the API. The same goes for XML parsing: the input may arrive as UTF-16, but I only want to process UTF-8 internally (or, for that matter, if I switch to UTF-16 internally, I'll need a conversion to that anyway).

So far, I've looked at ICU, which is quite huge. Moreover, it wants to be built using its own project files, while I'd prefer a library that either has a CMake project or is easy to build (something like: compile all these .c files, link, and good to go), rather than shipping something as large as ICU with my application.

Do you know of such a library that is also actively maintained? After all, this seems to be a pretty basic problem.

Anteru

3 Answers

22

UTF8-CPP seems to be exactly what you want.
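
To give a feel for what the UTF-8/UTF-16 conversion from the question involves, here is a minimal sketch that uses only the standard library. It is not UTF8-CPP's API: the names `utf8_to_utf16`, `utf16_to_utf8`, and `round_trip_check` are made up for illustration, and a maintained library will likely cover more edge cases than this does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of any library): decode UTF-8 into UTF-16.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in) {
    static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)               { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06)  { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E)  { cp = lead & 0x0F; extra = 2; }
        else if ((lead >> 3) == 0x1E)  { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        // Reject overlong forms, surrogate code points, and values past U+10FFFF.
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {  // encode as a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: encode UTF-16 as UTF-8.
// Throws std::invalid_argument on unpaired surrogates.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a low one
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Usage sketch: a round trip should give back the original bytes.
void round_trip_check() {
    const std::string text = "na\xC3\xAFve";  // "naïve" encoded as UTF-8
    assert(utf16_to_utf8(utf8_to_utf16(text)) == text);
}
```

On Windows, the resulting `std::u16string` holds the UTF-16 code units that the wide-character APIs expect, though it would still need to be passed as `wchar_t` data.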

Nemanja Trifunovic
  • Any idea how good that is? I've just taken a look at it, and it seems to be really simple, but I'd like to hear some opinions on it. – Anteru Jan 11 '09 at 18:18
  • Well, you won't hear any impartial opinions from me because I am the author :) However, I haven't had any open bugs for more than a year, and people are actually using it (250-300 downloads a month), so I believe it is not that bad :) – Nemanja Trifunovic Jan 12 '09 at 01:20
  • +1 for UTF8-CPP. I use it everywhere I have to deal with UTF-8 strings in my C++ code (and sometimes UTF-16). Very easy to use, and a very nice C++'ish API. – Mārtiņš Možeiko Jan 21 '12 at 08:39
  • It is extremely annoying to spend hours looking for which license a piece of software uses. That's the kind of thing that should be mentioned somewhere coders look right away, so they don't have to waste time on it. If the library is coded in the same spirit, it doesn't make me want to try it... – Virus721 Sep 07 '13 at 12:52
  • Hello, I'd like to know whether your library is available on GitHub. Thanks. – Xam Feb 14 '18 at 03:02
  • @Xam Yes, and the link is literally the first word in my answer. – Nemanja Trifunovic Feb 14 '18 at 13:30
3

I'd recommend that you look at the GNU iconv library.
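
As a rough sketch of how that could look from C++, the wrapper below drives the standard `iconv_open` / `iconv` / `iconv_close` calls. The function name `convert_encoding` is made up for illustration, and the code assumes the glibc-style prototype where the input pointer is `char**` (some platforms declare it `const char**`) and that encoding names such as "UTF-8" and "UTF-16LE" are recognized by your iconv.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: convert `input` from encoding `from` to encoding `to`.
// The result is returned as raw bytes in a std::string.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the order: target first
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed: unsupported conversion");

    std::string in_copy = input;        // iconv wants a non-const char**
    char* in_ptr = &in_copy[0];
    std::size_t in_left = in_copy.size();

    std::string output;
    char buffer[256];
    while (in_left > 0) {
        char* out_ptr = buffer;
        std::size_t out_left = sizeof buffer;
        const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        const int err = errno;          // save before anything else can change it
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the chunk buffer filled up, so loop again.
        // Anything else (EILSEQ, EINVAL) is treated as malformed input.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }
    // Stateful target encodings would also need a final flush call here;
    // UTF-8 and UTF-16LE are stateless, so this sketch skips it.
    iconv_close(cd);
    return output;
}

// Usage sketch: UTF-8 "é" becomes the two bytes E9 00 in UTF-16LE.
void iconv_usage_check() {
    const std::string utf16 = convert_encoding("\xC3\xA9", "UTF-8", "UTF-16LE");
    assert(utf16 == std::string("\xE9\x00", 2));
}
```

"UTF-16LE" is used rather than plain "UTF-16" so that the byte order is fixed and no byte order mark is expected in the output. Where iconv lives in a separate libiconv rather than in libc, you will likely need to link with `-liconv`.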

Alnitak
  • iconv only gives you the ability to convert between different encodings. You don't get things like len() functions, case conversion, etc. – Steve Folly Jan 28 '09 at 09:52
0

There is another portable C library for string conversion between UTF-8, UTF-16, UTF-32, and wchar: the mdz_unicode library.

maxdz