P1423 (char8_t backward compatibility remediation) documents a number of approaches that can be used to remediate the backward compatibility impact due to the adoption of char8_t via P0482 (char8_t: A type for UTF-8 characters and strings).
Because char8_t is a non-aliasing type, it is undefined behavior to use reinterpret_cast to, for example, convert a pointer to char into a char8_t pointer and then read the char data through it, as in reinterpret_cast<const char8_t*>(data.c_str()). However, because char and unsigned char are allowed to alias any type, it is permissible to use reinterpret_cast in the other direction, e.g., reinterpret_cast<const char*>(u8"text").
None of the remediation approaches documented in P1423 are silver bullets. You'll need to evaluate what works best for your use cases. You might also appreciate the answers in C++20 with u8, char8_t and std::string.
With regard to char8_t not being a UTF-8 character and u8string not being a UTF-8 string, that is correct in that char8_t is a code unit type (not a code point type) and u8string does not enforce well-formed UTF-8 sequences. However, the intent is very much that these types only be used for UTF-8 data.