C++17 deprecated a number of convenience facilities for processing UTF: the `<codecvt>` header's `std::codecvt_utf8` and related facets, as well as `std::wstring_convert`. Unfortunately, the last remaining ones are deprecated in C++20 (*). That said, `std::codecvt` itself is still valid. From C++11 to C++17 you can use a `std::codecvt<char32_t, char, std::mbstate_t>`; starting with C++20 it is `std::codecvt<char32_t, char8_t, std::mbstate_t>`.
Here is some code converting a code point (up to 0x10FFFF) to UTF-8:
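```cpp
#include <cstddef>
#include <cwchar>   // std::mbstate_t
#include <locale>

// codepoint is the code point to convert
// buf is a char array of size sz (should be at least 4 to convert any code point)
// on return sz is the used size of buf for the UTF-8 converted string
// the return value is the return value of std::codecvt::out (std::codecvt_base::ok, i.e. 0, on success)
std::codecvt_base::result to_utf8(char32_t codepoint, char *buf, std::size_t& sz) {
    // The char32_t <-> char facet always converts UTF-32 <-> UTF-8 and is present
    // in every locale, so the classic locale is enough; std::locale("") would
    // throw if the environment names a locale the system does not know.
    const auto &cvt =
        std::use_facet<std::codecvt<char32_t, char, std::mbstate_t>>(std::locale::classic());
    std::mbstate_t state{};  // value-initialized: the initial conversion state
    const char32_t *last_in;
    char *last_out;
    std::codecvt_base::result res = cvt.out(state, &codepoint, &codepoint + 1, last_in,
                                            buf, buf + sz, last_out);
    sz = last_out - buf;
    return res;
}
```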
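If a `std::string` result is more convenient than a caller-supplied buffer, a variant along these lines may help. This is a sketch, and `code_point_to_utf8` is a hypothetical name: it sizes its buffer from the facet's `max_length()` (the largest number of `char`s a single `char32_t` can produce) and reports success only when `out` returns `std::codecvt_base::ok`. It uses the C++11 to C++17 `char` facet, which still compiles under C++20 but is deprecated there.

```cpp
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical helper: encodes one code point as UTF-8 into out.
// Returns false (leaving out unchanged) if out() does not report ok,
// e.g. for values above 0x10FFFF.
bool code_point_to_utf8(char32_t codepoint, std::string& out) {
    using facet = std::codecvt<char32_t, char, std::mbstate_t>;
    const facet& cvt = std::use_facet<facet>(std::locale::classic());
    // max_length(): largest number of chars one char32_t can produce.
    std::string buf(static_cast<std::string::size_type>(cvt.max_length()), '\0');
    std::mbstate_t state{};
    const char32_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result res = cvt.out(state, &codepoint, &codepoint + 1, from_next,
                                            &buf[0], &buf[0] + buf.size(), to_next);
    if (res != std::codecvt_base::ok) return false;
    out.assign(&buf[0], to_next);  // keep only the bytes actually written
    return true;
}
```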
(*) `std::codecvt` itself still exists in C++20. Only the standard instantiations change: `std::codecvt<char16_t, char, std::mbstate_t>` and `std::codecvt<char32_t, char, std::mbstate_t>` are deprecated in favor of `std::codecvt<char16_t, char8_t, std::mbstate_t>` and `std::codecvt<char32_t, char8_t, std::mbstate_t>` (note `char8_t` instead of `char`).
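For code that must build both before and after C++20, one option (a sketch, not the only approach) is to select the code unit type with the standard `__cpp_lib_char8_t` feature-test macro, which `<locale>` defines when the library supports `char8_t`. The names `utf8_unit`, `utf32_to_utf8_facet` and `utf32_to_utf8` are hypothetical; the function converts a whole `std::u32string` and copies the result into a `std::string` so callers see the same interface in both modes.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

// Hypothetical alias: the UTF-8 code unit type of the standard char32_t facet.
#if defined(__cpp_lib_char8_t)
using utf8_unit = char8_t;  // C++20: std::codecvt<char32_t, char8_t, std::mbstate_t>
#else
using utf8_unit = char;     // C++11 to C++17: std::codecvt<char32_t, char, std::mbstate_t>
#endif

using utf32_to_utf8_facet = std::codecvt<char32_t, utf8_unit, std::mbstate_t>;

// Hypothetical helper: converts a whole UTF-32 string to UTF-8 bytes in out.
// Returns false (with out empty) if the conversion does not complete,
// e.g. for a value above 0x10FFFF.
bool utf32_to_utf8(const std::u32string& in, std::string& out) {
    out.clear();
    if (in.empty()) return true;
    const auto& cvt = std::use_facet<utf32_to_utf8_facet>(std::locale::classic());
    std::mbstate_t state{};
    // At most 4 UTF-8 code units per code point.
    std::basic_string<utf8_unit> buf(in.size() * 4, utf8_unit{});
    const char32_t* from_next = nullptr;
    utf8_unit* to_next = nullptr;
    std::codecvt_base::result res =
        cvt.out(state, in.data(), in.data() + in.size(), from_next,
                &buf[0], &buf[0] + buf.size(), to_next);
    if (res != std::codecvt_base::ok) return false;
    // Element-wise copy; char8_t converts implicitly to char.
    out.assign(buf.begin(), buf.begin() + (to_next - &buf[0]));
    return true;
}
```

In a C++20-only code base, returning the `std::u8string` directly would avoid the final copy.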