Section 5.2.1.2, paragraph 1, of the C11 standard states:

A byte with all bits zero shall be interpreted as a null character independent of shift state. Such a byte shall not occur as part of any other multibyte character.

I think this makes it clear that a multibyte character cannot have any of its bytes equal to zero. If so, the example below should not work (because the characters contain zero bytes), but it does.

char16_t wc = u'\u1200';
char16_t wcs[] = u"\u1200\u1300";
printf("%#x, %#x, %#x\n", wc, wcs[0], wcs[1]);

Why? What am I missing here?

cmutex
  • It looks to me like you have a single int there, not a multibyte char. Multibyte char probably refers to stuff like UTF-8 and Shift-JIS. – ikegami Jan 01 '20 at 06:53
  • Wide characters are different from multi-byte characters. Wide characters may have null bytes but only one null (zero) wide character. – Jonathan Leffler Jan 01 '20 at 06:55

1 Answer

I think you're misunderstanding the term multibyte character (which is, possibly, an ambiguous term). For example, from this page:

The term “multibyte character” is defined by ISO C to denote a byte sequence that encodes an ideogram, no matter what encoding scheme is employed. All multibyte characters are members of the “extended character set.” A regular single-byte character is just a special case of a multibyte character. The only requirement placed on the encoding is that no multibyte character can use a null character as part of its encoding.

Thus, the condition you quoted refers to character strings that are arrays of single-byte (char) elements, but which can contain characters that need more than one of those char elements for their representation. It is within such a sequence that no byte other than the terminator may be zero.

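The char16_t type you use is a wide character type, in which each character is stored as one or more 16-bit code units rather than as a sequence of bytes - even 'simple' stuff like an ASCII 'A', which would be 0x0041. A zero byte inside such a value (the 0x00 in 0x1200, say) is not a null character; only a whole element equal to zero is.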
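
To make the difference concrete, here is a minimal sketch rather than anything taken from the standard. It assumes UTF-8 as the multibyte encoding, 8-bit bytes and a 16-bit char16_t; the helper names count_zero_bytes and utf8_encode_bmp are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* Count the zero-valued bytes among the first n bytes of an object. */
size_t count_zero_bytes(const void *obj, size_t n)
{
    const unsigned char *p = obj;
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i] == 0)
            zeros++;
    }
    return zeros;
}

/* Hypothetical helper: encode one code point in U+0001..U+FFFF
 * (surrogates excluded) as a UTF-8 multibyte character.
 * Returns the number of bytes written to out (1 to 3). */
size_t utf8_encode_bmp(unsigned cp, unsigned char out[3])
{
    assert(cp != 0 && cp <= 0xFFFF);
    assert(!(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x80) {                       /* one byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For U+1200, utf8_encode_bmp writes the three bytes E1 88 80, so count_zero_bytes reports 0 for the multibyte form. Under the assumptions above, applying count_zero_bytes to a char16_t object holding 0x1200 reports 1, which is allowed because the rule you quoted constrains multibyte characters, not wide ones.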

EDIT: I realize that what I have written above is confusing! However, while searching for a basis for some clarification, I came across this Stack Overflow post: "What is a multibyte character set?" I can't really improve on the answers given there, so maybe it could be used as a 'duplicate'.

Adrian Mole
  • thanks, can you please explain/simplify that text you just quoted? I read it and it's really confusing. – cmutex Jan 01 '20 at 06:59
  • @user3124390 see edit and the SO link given there. (As I've already posted an answer, I'll leave it for others to VTC this as a duplicate, should they see fit to do so!) – Adrian Mole Jan 01 '20 at 07:12
  • See C11 [§3 Terms, definitions, and symbols](https://port70.net/~nsz/c/c11/n1570.html#3.7) on 'character', 'multibyte character' and 'wide character'. Though, in all honesty, those definitions are not dreadfully helpful. – Jonathan Leffler Jan 01 '20 at 07:32