4

Up until now I have been using std::string in my C++ applications for embedded systems (routers, switches, telco gear, etc.).

For the next project, I am considering switching from std::string to std::wstring for Unicode support. This would, for example, allow end-users to use Chinese characters in the command line interface (CLI).

What complications / headaches / surprises should I expect? For example, what happens if I use a third-party library that still uses std::string?

Since support for international strings isn't that strong of a requirement for the type of embedded systems that I work on, I would only do it if it isn't going to cause major headaches.

Benoit
Bruno Rijsman

3 Answers

1

Note that many communications protocols require 8-bit characters (or 7-bit characters, or other varieties), so you will often need to translate between your internal wchar_t/wstring data and external encodings.

UTF-8 encoding is useful when you need to have an 8-bit representation of Unicode characters. (See How Do You Write Code That Is Safe for UTF-8? for some more info.) But note that you may need to support other encodings.
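To illustrate the kind of translation involved, here is a minimal sketch of converting a std::wstring to UTF-8 bytes by hand. The helper names `append_utf8` and `wide_to_utf8` are hypothetical, and the code assumes wchar_t holds UTF-16 when it is 16 bits wide and UTF-32 otherwise, which is common but not guaranteed. It does not validate its input.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out` as 1 to 4 UTF-8 bytes.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert a wide string to UTF-8.
// Assumption: 16-bit wchar_t means UTF-16, anything wider means UTF-32.
std::string wide_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF &&
            i + 1 < in.size()) {
            std::uint32_t low = static_cast<std::uint32_t>(in[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

A CLI layer could call something like `wide_to_utf8` just before handing text to a protocol or library that expects 8-bit strings. A production version would also want to reject unpaired surrogates and out-of-range values.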

More and more third-party libraries are supporting Unicode, but there are still plenty that don't.

I can't really tell you whether it is worth the headaches. It depends on what your requirements are. If you are starting from scratch, it will be easier to start with std::wstring than to convert from std::string to std::wstring later.

Community
Kristopher Johnson
  • Right. You can use string for UTF-8, and English will be represented exactly in the same way as in ASCII. – Lev Oct 02 '08 at 19:04
1

std::wstring is a good choice for holding Unicode strings on Windows, but not on most other platforms, and certainly not for portable code. It is better to stick with std::string and UTF-8.
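As a rough sketch of what working with UTF-8 in a std::string looks like: ASCII text is stored byte for byte as before, while each non-ASCII character takes several bytes, so `size()` counts bytes rather than characters. The helper `utf8_length` below is a hypothetical name, and it assumes the string already holds valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a string assumed to contain valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx; every other byte
// starts a new code point.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a pure ASCII command such as "show version", `utf8_length` and `size()` agree. For a string holding two Chinese characters encoded as three bytes each, `size()` is 6 while `utf8_length` is 2. Code that only copies, compares, or concatenates strings usually does not need to care about the difference.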

Nemanja Trifunovic
  • Really? Could you elaborate? I thought the STL library was very portable. -- Cayle. – Bruno Rijsman Oct 02 '08 at 19:49
  • 1
    STL is portable, but C++ in general is Unicode-agnostic at this point, and only on certain platforms can you assume that wstring contains UTF-16 encoded strings (Windows). On other platforms it may be UTF-32 (Linux) or even dependent on environment settings (Solaris). – Nemanja Trifunovic Oct 02 '08 at 20:07
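A quick way to see the platform difference described in the comment above is to check the width of wchar_t at compile time. This is only a sketch; the function names are hypothetical, and the width alone does not tell you which encoding the platform actually stores in a wstring.

```cpp
#include <climits>
#include <cstddef>

// Number of bits in wchar_t on the current platform.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True when one wchar_t is wide enough for any Unicode code point
// (the highest is U+10FFFF, which needs 21 bits).
constexpr bool wchar_holds_any_code_point() {
    return wchar_bits() >= 21;
}
```

When `wchar_holds_any_code_point()` is false, characters outside the Basic Multilingual Plane need two wchar_t units each, so indexing a wstring by position is no longer the same as indexing by character.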
1

You might run into some headaches because the C++ standard requires wide streams to convert wide characters to a byte-oriented (narrow) encoding when writing to a file, and how this conversion is done is implementation-dependent.

Reunanen
  • This is with the default locale. This behavior can be modified by setting an appropriate property in the locale associated with the stream. – Martin York Oct 04 '08 at 12:12
  • +1 Quoted answer in related question: http://stackoverflow.com/questions/390977/how-to-readstore-unicode-with-stl-strings-and-streams#391052 – stukelly Dec 24 '08 at 09:03
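The conversion discussed in this answer is performed by the `std::codecvt` facet of the locale attached to the stream. The sketch below calls that facet directly to show the mechanism; `narrow_with_locale` is a hypothetical helper, and what it produces for non-ASCII input depends on the locale and on the standard library implementation.

```cpp
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: narrow `in` using the codecvt facet of `loc`,
// the same kind of facet a wide file stream consults when writing.
// Returns false if the facet reports anything other than success.
bool narrow_with_locale(const std::wstring& in, const std::locale& loc,
                        std::string& out) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    if (in.empty()) {
        out.clear();
        return true;
    }
    const Facet& cvt = std::use_facet<Facet>(loc);
    std::mbstate_t state = std::mbstate_t();
    // Reserve the worst-case number of bytes for this locale.
    out.assign(in.size() * cvt.max_length(), '\0');
    const wchar_t* from_next = 0;
    char* to_next = 0;
    Facet::result r = cvt.out(state,
                              in.data(), in.data() + in.size(), from_next,
                              &out[0], &out[0] + out.size(), to_next);
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return r == Facet::ok;
}
```

With the classic "C" locale, plain ASCII text should pass through unchanged. For anything else, the result depends on which locale is passed in, which is why imbuing the stream with a suitable locale changes what ends up in the file.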