Consider a C++11 compiler whose execution character set is UTF-8 and which complies with the x86-64 ABI (which requires that the char type be a signed 8-bit byte).
The letter Ä (A with umlaut) has the Unicode code point U+00C4 and a two-code-unit UTF-8 representation, {0xC3, 0x84}.
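As a sanity check on that encoding (not part of the question's premise), C++11 guarantees that u8 string literals are UTF-8, so the following sketch should hold on any conforming compiler:

```cpp
// Ä (U+00C4) encodes as the two UTF-8 code units 0xC3 0x84, plus the terminating NUL.
static_assert(sizeof(u8"\u00C4") == 3, "two UTF-8 code units plus the terminating NUL");
```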
The compiler assigns the character literal '\xC4' the type int and the value 0xC4.
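To make the observed behavior concrete, here is a rough sketch of compile-time checks that encode the scenario above. They describe what this particular compiler does, not what a compiler must do (a typical x86-64 GCC/Clang toolchain instead gives the literal type char and, typically, the value -60):

```cpp
#include <limits>
#include <type_traits>

// The ABI part of the premise: char is a signed 8-bit byte.
static_assert(std::numeric_limits<char>::is_signed, "char is signed per the x86-64 ABI");
static_assert(std::numeric_limits<char>::digits == 7, "char is an 8-bit signed type");

// The behavior described above: '\xC4' gets type int and value 0xC4.
static_assert(std::is_same<decltype('\xC4'), int>::value, "literal has type int (observed)");
static_assert('\xC4' == 0xC4, "literal has value 0xC4 (observed)");
```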
Is the compiler standard-compliant and ABI-compliant? What is your reasoning?
Relevant quotes from the C++11 standard:
§2.14.3/1 [lex.ccon]:
An ordinary character literal that contains a single c-char has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.
§2.14.3/4 [lex.ccon]:
The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char.
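A small arithmetic aside on why the second quote matters here (assuming the signed 8-bit char from the premise): the escape \xC4 specifies the value 0xC4 == 196, which is outside char's range of [-128, 127], so the implementation-defined clause applies.

```cpp
#include <climits>

// Assuming the premise: char is a signed 8-bit byte, so its range is [-128, 127].
static_assert(CHAR_MIN == -128 && CHAR_MAX == 127, "signed 8-bit char assumed");
static_assert(0xC4 > CHAR_MAX, "0xC4 (196) does not fit in char");
```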