This can be done using the Standard Library, but the functionality is neither obvious nor easy to use. It is further complicated by the fact that the Standard Library changed the way this works between the C++11 and C++20 standards.
Here are two functions that use the Standard Library to convert a Unicode code point (char32_t) to a UTF-8 string, one for each version of the C++ Standard.
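    #include <codecvt>   // std::codecvt_utf8 (deprecated since C++17)
    #include <cwchar>    // std::mbstate_t
    #include <stdexcept> // std::runtime_error
    #include <string>

    inline
    std::string cpp11_codepoint_to_utf8(char32_t cp) // C++11 Standard
    {
        char utf8[4]; // a single code point needs at most 4 UTF-8 bytes
        char* end_of_utf8;
        char32_t const* from = &cp;
        std::mbstate_t mbs{}; // zero-initialized: the initial conversion state
        std::codecvt_utf8<char32_t> ccv;
        // out() returns std::codecvt_base::ok (zero) on success
        if(ccv.out(mbs, from, from + 1, from, utf8, utf8 + 4, end_of_utf8))
            throw std::runtime_error("bad conversion");
        return {utf8, end_of_utf8};
    }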
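    #include <cstddef>   // std::size_t
    #include <cwchar>    // std::mbstate_t
    #include <locale>    // std::codecvt
    #include <stdexcept> // std::runtime_error
    #include <string>

    inline
    std::string cpp20_codepoint_to_utf8(char32_t cp) // C++20 Standard
    {
        using codecvt_32_8_type = std::codecvt<char32_t, char8_t, std::mbstate_t>;
        // std::codecvt has a protected destructor, so derive from it to create an instance
        struct codecvt_utf8
        : public codecvt_32_8_type
            { codecvt_utf8(std::size_t refs = 0): codecvt_32_8_type(refs) {} };
        char8_t utf8[4]; // a single code point needs at most 4 UTF-8 bytes
        char8_t* end_of_utf8;
        char32_t const* from = &cp;
        std::mbstate_t mbs{}; // zero-initialized: the initial conversion state
        codecvt_utf8 ccv;
        // out() returns std::codecvt_base::ok (zero) on success
        if(ccv.out(mbs, from, from + 1, from, utf8, utf8 + 4, end_of_utf8))
            throw std::runtime_error("bad conversion");
        return {reinterpret_cast<char*>(utf8), reinterpret_cast<char*>(end_of_utf8)};
    }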
Neither function has been heavily tested.
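If you would rather not depend on the codecvt facets at all, the encoding can also be written by hand. The following is only a minimal sketch of the standard UTF-8 bit layout, not what the library does internally; the name manual_codepoint_to_utf8 is made up for this example, and it treats surrogates and values above U+10FFFF as errors.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the Standard Library):
// encodes a single Unicode code point as UTF-8 by hand.
inline
std::string manual_codepoint_to_utf8(char32_t cp)
{
    // Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if(cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::runtime_error("bad conversion");

    std::string out;

    if(cp < 0x80) // 1 byte: 0xxxxxxx
    {
        out += static_cast<char>(cp);
    }
    else if(cp < 0x800) // 2 bytes: 110xxxxx 10xxxxxx
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if(cp < 0x10000) // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }

    return out;
}
```

For U+2460 this takes the three-byte branch and yields the bytes E2 91 A0. It compiles as C++11 and later, which sidesteps the deprecation of the codecvt header, but like the functions above it has not been heavily tested.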
As for converting the string representation of the Unicode code point value "2460" into the integer to store in a char32_t, there are many ways to do this; just remember that the number is in hexadecimal (base 16).
You can use something like this for example:
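    std::string tmp = "2460";
    // parse as base 16; std::stoul returns unsigned long, so cast explicitly
    char32_t u32 = static_cast<char32_t>(std::stoul(tmp, nullptr, 16));
    tmp = cpp11_codepoint_to_utf8(u32);
    std::cout << tmp << '\n';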
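If the hex string comes from untrusted input, it may be worth checking it more carefully than the snippet above does. Here is one possible sketch; the function name parse_hex_codepoint and the choice of exceptions are assumptions made for illustration. It rejects trailing garbage and values above U+10FFFF, and relies on std::stoul itself throwing std::invalid_argument when no digits can be parsed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses a hexadecimal code point such as "2460".
inline
char32_t parse_hex_codepoint(std::string const& text)
{
    std::size_t used = 0;

    // Throws std::invalid_argument if no conversion could be performed,
    // std::out_of_range if the value does not fit in unsigned long.
    unsigned long value = std::stoul(text, &used, 16);

    // std::stoul stops at the first non-hex character; treat leftovers as an error.
    if(used != text.size())
        throw std::invalid_argument("trailing characters after hex digits");

    // Unicode code points end at U+10FFFF.
    if(value > 0x10FFFF)
        throw std::out_of_range("code point above U+10FFFF");

    return static_cast<char32_t>(value);
}
```

Note that with base 16 std::stoul also accepts an optional "0x" prefix and leading whitespace, so "0x2460" parses to the same value as "2460". The result can be passed straight to either of the conversion functions above.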