
I have some server-side C# code that serializes (among other things) some Unicode strings using UTF-8 encoding.

On the client side, I would like to deserialize all these strings. I was able to deserialize them and store them as wstrings.

But then I heard that Unicode strings can be stored in regular strings as well. I've also read that wstrings are not portable and should therefore be avoided. So I am wondering what the benefits and drawbacks of using string vs. wstring are in my situation.

Also, I still don't understand how it is possible to store Unicode strings inside regular string variables. That sounds strange, given that a string is essentially a vector of chars. How can an arbitrary character be stored inside a char (an 8-bit type)? Would string::length() return the number of characters or the number of bytes? What about string::size()? What would the indexing operator return?

mk33
  • Why not spend some time googling around "UTF-8" and see what you learn? – Dúthomhas Nov 16 '15 at 02:48
  • UTF-8 encoding is identical to ASCII for codepoints between 0 and 127; higher codepoints are encoded in multiple bytes. So non-ASCII characters will encode as 2 to 4 bytes. String length counts bytes, not number of characters. – MarkU Nov 16 '15 at 02:58
  • May be worth searching [tag:unicode] [tag:c++] std::string: http://stackoverflow.com/search?q=%5Bunicode%5D+%5Bc%2B%2B%5D+std%3A%3Astring+ – MarkU Nov 16 '15 at 03:30
  • Sad but true: Unicode support sucks in almost every language. – user253751 Nov 16 '15 at 03:40
  • If you have UTF-8 you can just use `std::string`, provided of course that you do not need to access a specific character by index – fghj Nov 16 '15 at 05:11
  • Ok, it looks like I can use a std::string, but none of its methods, such as length(), substr(), or the indexing operator, will return the desired result, so I don't gain much by using std::string other than having a container to store those bytes. I guess wstring is the better option then, since at least it provides access to and operations on individual characters. – mk33 Nov 16 '15 at 06:28
  • We have an excellent question on this topic already. If there's anything missing in there, (and assuming it's not answered in another C++/Unicode question either), feel free to ask a new question. If you do, do reference the existing questions you already checked. – MSalters Nov 16 '15 at 09:26
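
To make the points from the comments concrete, here is a minimal sketch of how UTF-8 text behaves inside a std::string. The helper names `count_code_points` and `decode_utf8` are made up for illustration and are not part of the standard library. Both assume the input is already valid UTF-8 and do no error handling, so treat this as a sketch rather than a production decoder.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points by skipping UTF-8
// continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte: starts a new code point
        }
    }
    return n;
}

// Hypothetical helper: decodes valid UTF-8 into UTF-32 so that each
// element of the result is exactly one code point.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80) {                  // 0xxxxxxx: 1 byte (ASCII)
            cp = lead;
            len = 1;
        } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx: 2 bytes
            cp = lead & 0x1F;
            len = 2;
        } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: 3 bytes
            cp = lead & 0x0F;
            len = 3;
        } else {                            // 11110xxx: 4 bytes
            cp = lead & 0x07;
            len = 4;
        }
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For the string "héllo €" written as the bytes "h\xC3\xA9llo \xE2\x82\xAC", size() and length() both return 10 (they are synonyms and count bytes), while `count_code_points` returns 7. The indexing operator returns a single byte, so s[1] is only the first half of 'é'. After `decode_utf8`, element 1 of the result is U+00E9 and element 6 is U+20AC. Note that wchar_t is 16 bits on Windows and 32 bits on most Unix-like systems, which is why wstring is usually described as non-portable, and that even a code point is not always a full user-perceived character (combining marks, for example).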

0 Answers