I have been exploring C++11's new Unicode functionality, and while other C++11 encoding questions have been very helpful, I have a question about the following code snippet from cppreference. The code writes and then immediately reads a text file saved with UTF-8 encoding.
// Write
std::ofstream("text.txt") << u8"z\u6c34\U0001d10b";
// Read
std::wifstream file1("text.txt");
file1.imbue(std::locale("en_US.UTF8"));
std::cout << "Normal read from file (using default UTF-8/UTF-32 codecvt)\n";
for (wchar_t c; file1 >> c; ) // ?
    std::cout << std::hex << std::showbase << c << '\n';
My question is, quite simply: why is a wchar_t needed in the for loop? A u8 string literal can be held in a plain array of char (pointed to by a const char *), and the lead byte of each UTF-8 sequence should tell the system how many bytes the character occupies. It appears that some automatic conversion from UTF-8 to UTF-32 is taking place (hence the wchar_t), but if that is the case, why is the conversion necessary?