Let's assume there's only C99 Standard paper and printf
library function needs to be implemented according to this standard to work with UTF-16 encoding, could you please clarify the expected behavior for s
conversion with precision specified?
C99 Standard (7.19.6.1) for s
conversion says:
If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are written up to (but not including) the terminating null character. If the precision is specified, no more than that many bytes are written. If the precision is not specified or is greater than the size of the array, the array shall contain a null character.
If an l length modifier is present, the argument shall be a pointer to the initial element of an array of wchar_t type. Wide characters from the array are converted to multibyte characters (each as if by a call to the wcrtomb function, with the conversion state described by an mbstate_t object initialized to zero before the first wide character is converted) up to and including a terminating null wide character. The resulting multibyte characters are written up to (but not including) the terminating null character (byte). If no precision is specified, the array shall contain a null wide character. If a precision is specified, no more than that many bytes are written (including shift sequences, if any), and the array shall contain a null wide character if, to equal the multibyte character sequence length given by the precision, the function would need to access a wide character one past the end of the array. In no case is a partial multibyte character written.
I don't quite understand this paragraph in general and the statement "If a precision is specified, no more than that many bytes are written" in particular.
For example, let's take UTF-16 string "TEST" (byte sequence: 0x54, 0x00, 0x45, 0x00, 0x53, 0x00, 0x54, 0x00).
What is expected to be written to the output buffer in the following cases:
- If precision is 3
- If precision is 9 (one byte more than string length)
- If precision is 12 (several bytes more than string length)
Then there's also "Wide characters from the array are converted to multibyte characters". Does it mean UTF-16 should be converted to UTF-8 first? This is pretty strange in case I expect to work with UTF-16 only.