The simple question again: given a std::string, determine which of its characters are digits, symbols, whitespace, etc., with respect to the user's language and regional settings (locale).
I managed to split the string into a sequence of characters using Boost.Locale's boundary analysis:
#include <boost/locale.hpp>
#include <string>

// Note: since C++20, u8 literals are char8_t, so this line compiles only up to C++17.
std::string text = u8"生きるか死ぬか";
boost::locale::boundary::segment_index<std::string::const_iterator> characters(
    boost::locale::boundary::character,
    text.begin(), text.end(),
    boost::locale::generator()("ja_JP.UTF-8"));
for (const auto& ch : characters) {
    // each 'ch' is a segment covering a single Japanese character
}
However, from there I do not see any way to determine whether ch is a digit, a symbol, or anything else.
There are Boost string classification algorithms, but these don't seem to work with whatever type dereferencing a segment_index::iterator yields.
Nor can I apply std::isalpha(std::locale), because I'm unsure whether it is possible to convert the Boost segment into a char or wchar_t.
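To make the conversion problem concrete, here is a minimal sketch of one possible direction, not a confirmed solution. It assumes the segment's UTF-8 bytes have already been copied into a std::string (Boost.Locale segments expose str() for that), decodes only the first code point, and hands it to the locale-aware std::isdigit/std::isalpha overloads as a wchar_t. The names first_code_point, CharClass and classify are hypothetical helpers invented for this illustration. A "character" segment may hold several code points (for example a base letter plus combining marks), and the decoder does not reject overlong encodings, so this is only an approximation.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: decode the first UTF-8 code point of `s`.
// Returns U+FFFD for empty, truncated, or malformed input.
// Overlong encodings are not rejected in this sketch.
char32_t first_code_point(const std::string& s) {
    if (s.empty()) return 0xFFFD;
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    std::size_t len = 0;
    char32_t cp = 0;
    if (b0 < 0x80)                { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return 0xFFFD;           // stray continuation or invalid lead byte
    if (s.size() < len) return 0xFFFD;
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return 0xFFFD;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}

// Hypothetical coarse categories for this illustration.
enum class CharClass { Digit, Alpha, Space, Punct, Other };

// Classify one code point with the ctype<wchar_t> facet of `loc`.
// Assumption: wchar_t can hold the code point. On platforms with a
// 16-bit wchar_t, code points above WCHAR_MAX are reported as Other.
CharClass classify(char32_t cp, const std::locale& loc) {
    if (static_cast<unsigned long>(cp) > static_cast<unsigned long>(WCHAR_MAX))
        return CharClass::Other;
    const wchar_t wc = static_cast<wchar_t>(cp);
    if (std::isdigit(wc, loc)) return CharClass::Digit;
    if (std::isalpha(wc, loc)) return CharClass::Alpha;
    if (std::isspace(wc, loc)) return CharClass::Space;
    if (std::ispunct(wc, loc)) return CharClass::Punct;
    return CharClass::Other;
}
```

Inside the loop above this would be called as classify(first_code_point(ch.str()), loc), where loc is a std::locale such as std::locale("ja_JP.UTF-8"). Constructing a named locale throws if it is not installed on the system, and how non-ASCII characters are classified depends on the platform's locale data.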
Is there any neat way to classify symbols?