I'm having trouble wrapping my head around something. I am trying to write a C++ function to convert UTF-8 to wide characters. I started googling and found Boost and ICU (both of which look way too large). Then, via a thread on here, I found the utf8-cpp header library, which looked good.

Then I read that thread and found https://stackoverflow.com/a/6155524

But how do those two functions turn a UTF-32 string into wide chars? They just seem to convert UTF-32 to UTF-8. I could not find any mention of wide characters in the utf8-cpp documentation...

Anyway, is there any library to convert UTF-8/16/32 to wide and back? I was looking at http://src.chromium.org/svn/trunk/src/base/utf_string_conversions.cc which seems to use ICU, but it also pulls in about 18 header files.

Any help? Maybe it's just my broken head today.

Edit: After rereading this, it is really two questions. What I want to know is whether there is a nice, smallish library (like the utf8-cpp header) to handle wide characters and Unicode.

Steven
  • What is a "wide char"? – Nicol Bolas Apr 17 '13 at 02:06
  • http://en.wikipedia.org/wiki/Wide_character but then again I probably know about as much as you. All I know is the library I am dealing with requires input in Wide (wstring), yet most of the app is either UTF8 or UTF16. So I need to do some conversion. – Steven Apr 17 '13 at 02:07
  • What library are you using that requires "wide strings"? – Nicol Bolas Apr 17 '13 at 02:20
  • My own answer is here: http://stackoverflow.com/a/148766/5987 some other good stuff in that thread too. – Mark Ransom Apr 17 '13 at 02:35
  • P.S. If you're using Windows as you mention in one of your comments, that's UTF-16 and not UCS32. – Mark Ransom Apr 17 '13 at 03:05
  • utf8everywhere.org recommends Boost.Nowide's boost::nowide::narrow(). It's a small header-only portable library by Artyom, still not in the release AFAIK but you can use it. – Pavel Radzivilovsky Apr 24 '13 at 21:21

1 Answer

If by "wide char" you are referring to wchar_t, then you have to take into account that it is 16-bit (using UCS-2 or UTF-16) on some platforms, but 32-bit (using UTF-32) on other platforms. So before asking how to convert to/from "wide char", you first have to define what "wide char" actually means. Proper 16-bit/32-bit data types (such as char16_t and char32_t in C++11) need to be used when dealing with UTF-16/32.
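To make the platform difference concrete, here is a minimal sketch of how one code point could be stored in a std::wstring. The helper name append_code_point is made up for illustration, and the code assumes the input is already a valid Unicode scalar value (no surrogates, at most 0x10FFFF) and that a 16-bit wchar_t holds UTF-16.

```cpp
#include <string>

// Hypothetical helper (not from any library): appends one Unicode code point
// to a std::wstring. Assumes cp is a valid scalar value.
inline void append_code_point(std::wstring& out, char32_t cp) {
    if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
        // 16-bit wchar_t (e.g. Windows): code points outside the BMP
        // need a UTF-16 surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
    } else {
        // Either the code point fits in one UTF-16 unit, or wchar_t is
        // 32-bit (e.g. Linux) and holds the UTF-32 value directly.
        out.push_back(static_cast<wchar_t>(cp));
    }
}
```

With this sketch, U+1F600 takes two wchar_t units where wchar_t is 16-bit and one where it is 32-bit, which is why "wide" alone does not pin down an encoding.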

Pretty much any Unicode library, including utf8-cpp and ICU, has functions for converting between UTF8<->UTF16 and UTF8<->UTF32 using appropriate data types and not relying on wchar_t.
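As a rough illustration of what such a conversion does with fixed-width types, the following is a hand-rolled sketch of UTF-8 to UTF-32 decoding into a std::u32string. It is not the utf8-cpp or ICU API; the function name utf8_to_utf32 and the choice to throw std::invalid_argument on bad input are assumptions made for this example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes a UTF-8 byte string into UTF-32 code points.
// Throws std::invalid_argument on malformed input.
inline std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong encodings, surrogates, and out-of-range values.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Because the result is a std::u32string, its meaning does not change between platforms the way a std::wstring does.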

Remy Lebeau
  • I am dealing with strictly Windows. Linux and Mac are dealt with separately. – Steven Apr 17 '13 at 02:22
  • On Windows, you don't need a library, unless you are writing portable code. Windows uses UTF-16, and `wchar_t` is 2 bytes. You can use the Win32 API `MultiByteToWideChar()` and `WideCharToMultiByte()` functions to convert between UTF-8 and UTF-16. – Remy Lebeau Apr 17 '13 at 03:46
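
For the Windows-only case described in the last comment, a wrapper around the two Win32 calls might look roughly like the sketch below. The wrapper names utf8_to_wide and wide_to_utf8 are hypothetical, the error handling is an assumption, and inputs longer than INT_MAX bytes are not handled. The `#ifdef _WIN32` guard is there only so the snippet is skipped on other platforms.

```cpp
#ifdef _WIN32  // Windows-only: relies on the Win32 API
#include <windows.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: UTF-8 std::string -> UTF-16 std::wstring.
inline std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int in_len = static_cast<int>(utf8.size());
    // First call: ask how many wchar_t units the result needs.
    const int out_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), in_len, nullptr, 0);
    if (out_len == 0) throw std::runtime_error("MultiByteToWideChar failed");
    std::wstring wide(static_cast<std::size_t>(out_len), L'\0');
    // Second call: convert into the correctly sized buffer.
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8.data(), in_len, &wide[0], out_len) == 0)
        throw std::runtime_error("MultiByteToWideChar failed");
    return wide;
}

// Hypothetical wrapper: UTF-16 std::wstring -> UTF-8 std::string.
inline std::string wide_to_utf8(const std::wstring& wide) {
    if (wide.empty()) return std::string();
    const int in_len = static_cast<int>(wide.size());
    // For CP_UTF8 the last two arguments must be null.
    const int out_len = WideCharToMultiByte(CP_UTF8, 0, wide.data(), in_len,
                                            nullptr, 0, nullptr, nullptr);
    if (out_len == 0) throw std::runtime_error("WideCharToMultiByte failed");
    std::string utf8(static_cast<std::size_t>(out_len), '\0');
    if (WideCharToMultiByte(CP_UTF8, 0, wide.data(), in_len,
                            &utf8[0], out_len, nullptr, nullptr) == 0)
        throw std::runtime_error("WideCharToMultiByte failed");
    return utf8;
}
#endif  // _WIN32
```

Both functions use the usual two-call pattern: the first call with a zero-sized output buffer returns the required length, and the second call fills the buffer.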