Omitting u8 prefix for string literals that contain universal character names

Question

What will happen if I omit the u8 prefix for string literals that contain universal character names?

So instead of:

u8"\u00a7some-text"

I write this:

"\u00a7some-text"

score 3 · Accepted Answer · edited May 23 '17 at 11:45

Without the u8 prefix, the string is encoded in your platform's execution character set. That is often UTF-8 (the default on several platforms), but you cannot assume it always is (see this answer).

If the execution character set cannot encode a universal character name (or any other value in the string literal), the result is implementation-defined: the compiler might, for example, reject the code or substitute some sentinel value. Consider the code:

const char* c = "\u00a7";

When compiled using GCC 5.3 with -fexec-charset=ascii, it fails with the error:

error: converting UCN to execution character set: Invalid or incomplete multibyte or wide character

This is because U+00A7 cannot be encoded in ASCII. However, using the u8 prefix:

const char* c = u8"\u00A7";

This compiles, and c points to the bytes 0xC2 0xA7 0x00: the two-byte UTF-8 encoding of U+00A7 followed by the terminating null.

If you use the u8 prefix, your string is guaranteed to be UTF-8 encoded, regardless of the platform's configuration.
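To check this on your own compiler, you can dump the bytes of a literal. The following is only a minimal sketch, assuming a C++11 or later compiler; literal_bytes and to_hex are hypothetical helper names, not part of the standard library. The helper is templated on the character type because, since C++20, u8 literals have type const char8_t[N] rather than const char[N].

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies the bytes of a string literal, excluding the
// terminating null. Accepts const char[N] (plain and pre-C++20 u8 literals)
// as well as const char8_t[N] (u8 literals since C++20).
template <typename CharT, std::size_t N>
std::vector<unsigned char> literal_bytes(const CharT (&lit)[N]) {
    static_assert(sizeof(CharT) == 1, "expects a narrow or UTF-8 literal");
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&lit[0]);
    return std::vector<unsigned char>(p, p + (N - 1));
}

// Hypothetical helper: formats bytes as space-separated hex, e.g. "0xC2 0xA7".
std::string to_hex(const std::vector<unsigned char>& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) out += ' ';
        out += "0x";
        out += digits[bytes[i] >> 4];    // high nibble
        out += digits[bytes[i] & 0x0F];  // low nibble
    }
    return out;
}
```

Calling to_hex(literal_bytes(u8"\u00a7")) should give "0xC2 0xA7" whatever the execution character set is. Calling it with the unprefixed "\u00a7" gives whatever your compiler's execution character set produces, which matches only if that set is UTF-8.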
