In C11, support for the portable wide character types char16_t and char32_t was added for UTF-16 and UTF-32 respectively.
However, in the technical report, there is no mention of endianness for these two types.
For example, the following snippet, compiled with gcc-4.8.4 using -std=c11 on my x86_64 computer:
#include <stdio.h>
#include <uchar.h>

int main(void)
{
    char16_t utf16_str[] = u"十六"; // U+5341 U+516D
    unsigned char *chars = (unsigned char *) utf16_str; // view the code units byte by byte
    printf("Bytes: %X %X %X %X\n", chars[0], chars[1], chars[2], chars[3]);
    return 0;
}
will produce
Bytes: 41 53 6D 51
which means that the code units are stored in little-endian byte order.
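To separate "little-endian because of the host" from "little-endian because of char16_t", one could compare the bytes of a char16_t code unit with the bytes of the same value stored in a plain integer. The sketch below is only a probe of one implementation, not a statement about every compiler. It relies on C11 declaring char16_t in <uchar.h> as the same type as uint_least16_t, and it assumes the implementation encodes u"" literals as UTF-16 (signalled by __STDC_UTF_16__). The function names are hypothetical helpers made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <uchar.h>

/* C11 <uchar.h> declares char16_t as the same type as uint_least16_t,
   so the two sizes are expected to agree. */
static_assert(sizeof(char16_t) == sizeof(uint_least16_t),
              "char16_t and uint_least16_t differ in size");

/* Hypothetical helper: returns 1 if the host stores the low-order byte
   of a 16-bit integer value at the lowest address. */
int host_is_little_endian(void)
{
    const uint_least16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* read the byte at the lowest address */
    return first == 0x02;
}

/* Hypothetical helper: returns 1 if the first code unit of the literal has
   exactly the same in-memory bytes as the integer 0x5341, i.e. the code
   unit follows the host's byte order for integers.
   Assumption: u"" literals are UTF-16 (__STDC_UTF_16__ is defined). */
int char16_follows_host_order(void)
{
    const char16_t text[] = u"\u5341\u516D";   /* same characters as the snippet above */
    const uint_least16_t expected = 0x5341;
    return memcmp(&text[0], &expected, sizeof expected) == 0;
}
```

Called from main, char16_follows_host_order() returning 1 together with host_is_little_endian() returning 1 would be consistent with the 41 53 6D 51 output shown above.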
But is this behaviour platform- or implementation-dependent? Does the byte order always follow the platform's endianness, or may an implementation choose to always store char16_t and char32_t in big-endian order?