Length in the Unicode world is always fun... Which length do you need? For example:
string str = "\U00020000"; // e.g. U+20000, a single CJK ideograph outside the BMP (one surrogate pair)
// Length in UTF-16 code units
int len = str.Length; // 2
// Length in bytes, if encoded in UTF16, as done by .NET
int len2 = str.Length * 2; // 4
// Length in bytes, if encoded in UTF8
int len3 = Encoding.UTF8.GetByteCount(str); // 4
// Length in unicode code points
int len4 = Encoding.UTF32.GetByteCount(str) / 4; // 1
Note that there is a fifth length: the number of grapheme clusters, which is even more complex to calculate because some code points can "merge" together. And a sixth: the number of glyphs.
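As a rough illustration of that fifth length, here is a minimal sketch using `System.Globalization.StringInfo`. The sample string is my own choice, and how closely "text elements" match full grapheme clusters depends on the .NET version, so treat it as an approximation rather than a definitive count.

```csharp
using System;
using System.Globalization;

// "e" followed by U+0301 COMBINING ACUTE ACCENT:
// two code points that render as one user-perceived character.
string combined = "e\u0301";

// Length in UTF-16 code units
int codeUnits = combined.Length;

// Length in text elements (the closest built-in notion to grapheme clusters)
int textElements = new StringInfo(combined).LengthInTextElements;

Console.WriteLine($"{codeUnits} code units, {textElements} text element(s)");
// prints "2 code units, 1 text element(s)"
```

Counting glyphs (the sixth length) depends on the font and the text shaping engine, so it cannot be computed from the string alone.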
Now, your string has len equal to 9, len2 equal to 18, len3 (so the length in bytes if converted to UTF8) equal to 13, and len4 equal to 9.
Nearly all Chinese characters are in the Basic Multilingual Plane of the Unicode standard, so each one takes a single UTF-16 code unit (2 bytes), and each one takes 3 bytes in UTF8.
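To see how those numbers can add up, here is a small sketch with a hypothetical stand-in string (not your actual string): two Chinese characters from the BMP plus seven ASCII characters. It happens to give the same four values.

```csharp
using System;
using System.Text;

// Hypothetical stand-in: "你好, world" = 2 BMP Chinese characters + 7 ASCII characters.
// Written with escapes so the source file encoding does not matter.
string sample = "\u4F60\u597D, world";

int units = sample.Length;                                 // UTF-16 code units
int utf16Bytes = Encoding.Unicode.GetByteCount(sample);    // bytes in UTF-16
int utf8Bytes = Encoding.UTF8.GetByteCount(sample);        // bytes in UTF-8: 2 * 3 + 7 * 1
int codePoints = Encoding.UTF32.GetByteCount(sample) / 4;  // Unicode code points

Console.WriteLine($"{units} {utf16Bytes} {utf8Bytes} {codePoints}");
// prints "9 18 13 9"
```

Because every character here is in the BMP, the number of code units and the number of code points are the same. They only diverge once a character needs a surrogate pair.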
An interesting reference: What's the difference between a character, a code point, a glyph and a grapheme?
Ah... and please forget about Encoding.ASCII. Live like it doesn't exist. It probably isn't what you think it is. Even if you lived in the old MS-DOS world with its funny characters, that wasn't ASCII.
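One way to see why, as a minimal sketch with a sample word of my own choosing: ASCII only covers 7-bit characters, and with its default settings `Encoding.ASCII` silently replaces everything else with `?` instead of failing.

```csharp
using System;
using System.Text;

string text = "caf\u00E9"; // "café"; U+00E9 is outside the 7-bit ASCII range

// Encoding.ASCII does not throw here: the unmappable character becomes '?'
byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
string asciiRoundTrip = Encoding.ASCII.GetString(asciiBytes);
Console.WriteLine(asciiRoundTrip); // prints "caf?"

// UTF-8 keeps the character, at the cost of one extra byte for it
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
string utf8RoundTrip = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(utf8RoundTrip == text); // prints "True"
```

The data loss is silent, which is what makes it dangerous. You usually notice it only when the question marks show up somewhere downstream.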