What will happen if I omit the u8 prefix for string literals that contain universal character names?

So instead of:

u8"\u00a7some-text"

I write this:

"\u00a7some-text"
一二三
FrozenHeart

1 Answer

Without the u8 prefix, the string is encoded in the execution character set of your implementation. That character set may be UTF-8 (it is the default on several platforms), but you cannot assume it always is (see this answer).

If the execution character set cannot encode a universal character name (or any other character in the string literal), the result is implementation-defined: the compiler might reject the code or substitute some sentinel value. For example, consider the code:

const char* c = "\u00a7";

When compiled using GCC 5.3 with -fexec-charset=ascii, it fails with the error:

error: converting UCN to execution character set: Invalid or incomplete multibyte or wide character

This is because U+00A7 (the section sign, §) cannot be encoded in ASCII. With the u8 prefix, however:

const char* c = u8"\u00A7";

Compilation succeeds, and c points to the bytes 0xC2 0xA7 0x00: the two-byte UTF-8 encoding of U+00A7 followed by the terminating null.

If you use the u8 prefix, your string is guaranteed to be UTF-8 encoded, regardless of the platform's configuration.
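To check this on your own compiler, here is a minimal sketch that verifies the bytes of the u8 literal at compile time and offers a way to dump the bytes of any literal at run time. hex_bytes is a hypothetical helper written for this illustration, not a standard function. It takes a const void* on purpose: since C++20 a u8 literal has type const char8_t[N] rather than const char[N], so the assignment to const char* shown above only compiles before C++20, while the pointer-to-void conversion works either way.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// A u8 literal must be UTF-8 no matter what -fexec-charset says,
// so these checks can be done at compile time.
// sizeof counts the terminating null as well.
static_assert(sizeof(u8"\u00a7") == 3, "two UTF-8 code units plus the terminator");
static_assert(static_cast<unsigned char>(u8"\u00a7"[0]) == 0xC2, "first UTF-8 byte");
static_assert(static_cast<unsigned char>(u8"\u00a7"[1]) == 0xA7, "second UTF-8 byte");

// Hypothetical helper (not part of any library): formats `size` bytes
// starting at `data` as space-separated uppercase hex, e.g. "C2 A7 00".
std::string hex_bytes(const void* data, std::size_t size) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    char buf[3];  // two hex digits plus the null written by snprintf
    for (std::size_t i = 0; i < size; ++i) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling hex_bytes(u8"\u00a7", sizeof(u8"\u00a7")) gives "C2 A7 00" on any conforming compiler. Calling hex_bytes("\u00a7", sizeof("\u00a7")) instead shows whatever your execution character set produces, which is where the results can differ between platforms and compiler flags.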

Community
一二三