What is the encoding of unprefixed string literals in C++? For example, string literals are parsed and stored as UTF-16 in Java and as UTF-8 in Python 3. I guess C++ u8"" literals are UTF-8 as well, but I'm not clear about plain literals like "".
What should the output of the following code be?
#include <iostream>
#include <iomanip>

int main() {
    auto c = "Hello, World!";
    while (*c) {
        // print each byte of the literal in hex
        std::cout << std::hex << (static_cast<unsigned int>(*c++) & 0xffu) << " ";
    }
}
When I run this on my machine, it gives the following output:
48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 21
But is this guaranteed? The cppreference page for string literals says that the characters inside ordinary string literals come from the translation character set, and the page on the translation character set states:
The translation character set consists of the following elements:
- each character named by ISO/IEC 10646, as identified by its unique UCS scalar value, and
- a distinct character for each UCS scalar value where no named character is assigned.
From this definition, it seems the translation character set refers to Unicode (or a superset of it). Is there then no difference between "" and u8"" except for explicitness?
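To make the question concrete, here is a minimal sketch, assuming a C++20 compiler (where u8"" literals are arrays of char8_t), that prints the bytes of the same non-ASCII character written once as an ordinary literal and once as a u8 literal. On a UTF-8 platform I'd expect both loops to print c3 a9, but as far as I can tell only the u8 version is required to be UTF-8:

#include <iomanip>
#include <iostream>

int main() {
    // ordinary literal: encoded in the implementation-defined ordinary
    // literal encoding (may fail or use a replacement if that encoding
    // cannot represent the character)
    const char* plain = "\u00e9";
    // u8 literal: always UTF-8, element type char8_t since C++20
    const char8_t* utf8 = u8"\u00e9";

    std::cout << std::hex;
    for (const char* p = plain; *p; ++p)
        std::cout << (static_cast<unsigned int>(*p) & 0xffu) << ' ';
    std::cout << '\n';
    for (const char8_t* p = utf8; *p; ++p)
        std::cout << static_cast<unsigned int>(*p) << ' ';
    std::cout << '\n';
}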
Suppose I want my string to be in EBCDIC encoding (just as an exercise): what is the correct way to achieve that in C++?
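For the EBCDIC part, the only portable approach I can think of is to convert explicitly at run time (or to switch the compiler's execution character set, e.g. GCC's -fexec-charset, where available). Here is a minimal sketch of an explicit conversion, using a hypothetical ascii_to_ebcdic helper that covers only letters, digits and space (code page 1047 values):

#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>

// Hypothetical helper: maps a char assumed to hold an ASCII value to its
// EBCDIC (code page 1047) value; only letters, digits and space are handled,
// everything else becomes the EBCDIC substitute character 0x3F.
std::uint8_t ascii_to_ebcdic(char c) {
    if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
    if (c >= 'j' && c <= 'r') return 0x91 + (c - 'j');
    if (c >= 's' && c <= 'z') return 0xA2 + (c - 's');
    if (c >= 'A' && c <= 'I') return 0xC1 + (c - 'A');
    if (c >= 'J' && c <= 'R') return 0xD1 + (c - 'J');
    if (c >= 'S' && c <= 'Z') return 0xE2 + (c - 'S');
    if (c >= '0' && c <= '9') return 0xF0 + (c - '0');
    if (c == ' ')             return 0x40;
    return 0x3F;
}

int main() {
    const char* src = "Hello World";
    std::string ebcdic;
    for (const char* p = src; *p; ++p)
        ebcdic.push_back(static_cast<char>(ascii_to_ebcdic(*p)));

    // dump the converted bytes in hex
    for (unsigned char byte : ebcdic)
        std::cout << std::hex << static_cast<unsigned int>(byte) << ' ';
    std::cout << '\n';
}

Is there a more standard way than hand-rolling a conversion table like this?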
EDIT: The linked cppreference page for string literals does say that the encoding is implementation-defined. Does that mean I should avoid using ordinary literals?