I am trying to iterate through a UTF-8 string. The problem, as I understand it, is that UTF-8 characters have variable length (one to four bytes each), so I can't just iterate char by char; I need some kind of conversion. I am sure modern C++ has a function for this, but I don't know what it is.
    #include <iostream>
    #include <string>

    int main()
    {
        std::string text = u8"řabcdě";

        std::cout << text << std::endl; // Prints fine
        std::cout << "First letter is: " << text.at(0) << text.at(1) << std::endl; // Again fine. So 'ř' is a 2 byte letter?

        for (auto it = text.begin(); it != text.end(); ++it)
        {
            // Obviously wrong. Outputs only ascii part of the text (a, b, c, d) correctly
            std::cout << "Iterating: " << *it << std::endl;
        }
    }
Compiled with clang++ -std=c++11 -stdlib=libc++ test.cpp
From what I've read, wchar_t and wstring should not be used.