Windows CE / UTF-16 / Chinese

Question

I've read that Windows CE uses the "UTF-16 version of Unicode" (I'm new to encodings).

What happens when a string contains a character that requires more than 2 bytes, like Chinese characters? Does it take 3? If I have a string containing Chinese characters, does that mean accessing the N-th pair of bytes won't necessarily give me the N-th visible symbol?

Also, what about performance? If I understand correctly, encodings that use a variable number of bytes per visible symbol require scanning the string from the beginning to access the N-th visible symbol, right? If so, is that also true for UTF-16?

Thank you.

See 1) [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](http://www.joelonsoftware.com/articles/Unicode.html), 2) [What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text](http://kunststube.net/encoding/), and 3) [UTF-16](https://en.wikipedia.org/wiki/UTF-16). — Remy Lebeau, Mar 02 '15 at 02:36

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

What happens when a string contains a character that requires more than 2 bytes, like Chinese characters? Does it take 3?

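No, four: such a character is stored as two 16-bit code units. Note, though, that most Chinese characters in everyday use (the CJK Unified Ideographs block, U+4E00 to U+9FFF) lie in the Basic Multilingual Plane and take only two bytes; only characters beyond U+FFFF, such as those in CJK Extension B, need four.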

Wikipedia: UTF-16:

In UTF-16, code points greater than or equal to 2¹⁶ are encoded using two 16-bit code units.
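To make the two cases concrete, here is a minimal Python sketch (Python is used only for illustration; the function name `encode_utf16_code_point` is hypothetical). It shows that a common Chinese character such as 中 (U+4E2D) needs one 16-bit code unit, while a character beyond U+FFFF, such as U+20000 from CJK Extension B, is split into a surrogate pair of two code units. Each result is cross-checked against Python's built-in `utf-16-le` codec.

```python
def encode_utf16_code_point(cp: int) -> list[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # Basic Multilingual Plane: a single 16-bit code unit (2 bytes).
        return [cp]
    # Supplementary planes: subtract 0x10000 to get a 20-bit value, then
    # split it into a high and a low 10-bit half (the surrogate pair).
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)
    low = 0xDC00 + (v & 0x3FF)
    return [high, low]


for ch in ["A", "\u4e2d", "\U00020000"]:  # 'A', 中 (U+4E2D), 𠀀 (U+20000)
    units = encode_utf16_code_point(ord(ch))
    print(f"U+{ord(ch):04X}: {[f'{u:04X}' for u in units]} -> {2 * len(units)} bytes")
    # Cross-check against Python's UTF-16 little-endian codec (no BOM).
    assert ch.encode("utf-16-le") == b"".join(u.to_bytes(2, "little") for u in units)

# U+0041: ['0041'] -> 2 bytes
# U+4E2D: ['4E2D'] -> 2 bytes
# U+20000: ['D840', 'DC00'] -> 4 bytes
```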

If I understand correctly, encodings that use a variable number of bytes per visible symbol require scanning the string from the beginning to access the N-th visible symbol, right?

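Yes. Because a code point can occupy either one or two 16-bit code units, finding the N-th code point in UTF-16 means walking the string from the start. See, for example, "Why use multibyte string functions in PHP?".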
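A rough sketch of why the scan is needed, again in Python for illustration only (the helper `code_point_at` is hypothetical, not a Windows CE API). It treats the string as a list of 16-bit code units, as a UTF-16 buffer would store it, and compares naive indexing with walking the units from the start.

```python
def code_point_at(units, n):
    """Return the n-th code point (0-based) of a UTF-16 code-unit sequence.

    A code point occupies one or two code units, so the only way to locate
    the n-th one is to walk the sequence from the start: O(n) time.
    """
    i = 0      # index into the code units
    count = 0  # number of code points passed so far
    while i < len(units):
        u = units[i]
        is_pair = (
            0xD800 <= u <= 0xDBFF
            and i + 1 < len(units)
            and 0xDC00 <= units[i + 1] <= 0xDFFF
        )
        if count == n:
            if is_pair:
                # Recombine the high and low surrogates into one code point.
                return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            return u  # BMP character (or an unpaired surrogate, returned as-is)
        i += 2 if is_pair else 1
        count += 1
    raise IndexError("code point index out of range")


text = "A\U00020000\u4e2d"  # 'A', 𠀀 (U+20000), 中 (U+4E2D)
raw = text.encode("utf-16-le")
units = [int.from_bytes(raw[k:k + 2], "little") for k in range(0, len(raw), 2)]

print(len(units))                    # 4 code units for 3 code points
print(hex(units[2]))                 # 0xdc00: naive indexing hits half a surrogate pair
print(hex(code_point_at(units, 2)))  # 0x4e2d: the third code point, found by scanning
```

This counts code points. A single visible symbol can also be built from several code points (for example a base letter followed by a combining mark), which this sketch does not attempt to handle.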
