In addition to the encodings mandated by the standard, C++ also supports an implementation-defined list of encodings via locales:
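#include <locale>
#include <codecvt>
#include <string>

// codecvt_byname has a protected destructor; deriving from it yields a facet
// that wstring_convert can own and destroy.
template <typename Facet>
struct usable_facet : Facet {
    using Facet::Facet;
};

using codecvt = usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

int main() {
    // Locale names are platform specific; ".1252" is Windows code page 1252.
    std::wstring_convert<codecvt> convert(new codecvt(".1252"));
    // Byte 0xC0 is U+00C0 in code page 1252. A "\u00C0" literal would instead be
    // encoded in the execution character set, which need not be code page 1252.
    std::wstring w = convert.from_bytes("\xC0");
}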
Unfortunately, the standard only requires that wchar_t use a fixed-width encoding in every locale; it does not require that the encoding be the same across locales. So you can't portably convert to wchar_t using one locale and then convert that back to char using a different locale.
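To make that concrete, here is a rough sketch of the non-portable round trip. The helper name transcode_via_wchar is made up for illustration, and the locale names you pass are platform specific. It only produces sensible results on implementations where wchar_t happens to use the same encoding in both locales. Note also that std::wstring_convert is deprecated as of C++17.

```cpp
#include <cwchar>
#include <locale>
#include <string>

// codecvt_byname has a protected destructor; this wrapper makes it ownable.
template <typename Facet>
struct usable_facet : Facet {
    using Facet::Facet;
};

using byname_codecvt =
    usable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

// Hypothetical helper: re-encode `bytes` from the encoding of locale `from`
// to the encoding of locale `to`, going through wchar_t.
// ASSUMPTION: wchar_t uses the same encoding in both locales, which the
// standard does not guarantee.
std::string transcode_via_wchar(const std::string& bytes,
                                const char* from, const char* to) {
    // The constructors throw std::runtime_error for unknown locale names.
    std::wstring_convert<byname_codecvt> decode(new byname_codecvt(from));
    std::wstring_convert<byname_codecvt> encode(new byname_codecvt(to));
    // Both conversions throw std::range_error on failure.
    return encode.to_bytes(decode.from_bytes(bytes));
}
```

On Windows you might call it as transcode_via_wchar(input, ".1252", ".1251"); elsewhere the names would look different, for example "en_US.ISO-8859-1".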
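There is potentially some portable support for such conversions using std::mbrtoc32 and related functions such as std::c32rtomb (declared in <cuchar>), which convert between the current locale's multibyte encoding and char32_t, but these are not yet widely implemented.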
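Where those functions are available, a conversion through char32_t could look roughly like the following. This is only a sketch: the helper names are mine, it assumes the implementation stores UTF-32 in char32_t, and it switches the global C locale with std::setlocale without restoring it, which is neither thread safe nor polite to the rest of the program.

```cpp
#include <cassert>
#include <climits>
#include <clocale>
#include <cuchar>
#include <cwchar>
#include <stdexcept>
#include <string>

// Decode bytes in the current C locale's multibyte encoding to char32_t.
std::u32string to_utf32(const std::string& bytes) {
    std::u32string out;
    std::mbstate_t state{};
    const char* p = bytes.data();
    std::size_t left = bytes.size();
    while (left > 0) {
        char32_t c;
        std::size_t n = std::mbrtoc32(&c, p, left, &state);
        if (n == static_cast<std::size_t>(-1) || n == static_cast<std::size_t>(-2))
            throw std::runtime_error("invalid or incomplete multibyte sequence");
        out.push_back(c);
        // -3 means a char32_t was produced without consuming input;
        // not expected for char32_t output, but handled for completeness.
        if (n == static_cast<std::size_t>(-3)) continue;
        if (n == 0) n = 1;  // a null character was decoded; assume one byte
        p += n;
        left -= n;
    }
    return out;
}

// Encode char32_t text in the current C locale's multibyte encoding.
std::string from_utf32(const std::u32string& text) {
    std::string out;
    std::mbstate_t state{};
    char buf[MB_LEN_MAX];
    for (char32_t c : text) {
        std::size_t n = std::c32rtomb(buf, c, &state);
        if (n == static_cast<std::size_t>(-1))
            throw std::runtime_error("character not representable in this locale");
        assert(n <= MB_LEN_MAX);
        out.append(buf, n);
    }
    return out;
}

// Hypothetical helper: convert between the encodings of two named locales.
std::string transcode_via_utf32(const std::string& bytes,
                                const char* from, const char* to) {
    if (!std::setlocale(LC_CTYPE, from))
        throw std::runtime_error("unknown source locale");
    std::u32string text = to_utf32(bytes);
    if (!std::setlocale(LC_CTYPE, to))
        throw std::runtime_error("unknown target locale");
    return from_utf32(text);
}
```

Unlike the wchar_t route, the intermediate form here does not depend on the locale, so the two halves can be run under different locales.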
> I understand that this can be done with a library such as iconv, but I am curious whether it can be done using only the C++ standard library. I ask this question not because I don't want to use iconv, but because I don't really understand how locales work in C++.
The locale library's design doesn't really lend itself to modern usage. C and C++ are themselves confused about encodings vs. character sets, and locales conflate lexical and orthographic issues with computational aspects such as encoding.
How locales work is a topic a bit broader than is suitable for a Stack Overflow answer, but there are books on the topic. You'd probably also need to read platform-specific materials, because the standard doesn't really give any context for much of the functionality. For example, the locale library supports message catalogues, but doesn't tell you what they are or how you'd actually make one, because that functionality is not standardized by C++.
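To illustrate the message catalogue point, this is about all the standard interface gives you: the std::messages facet with open, get and close. The helper name and the catalogue name used with it are made up. What a catalogue name refers to, and how the set and message numbers are interpreted, is up to the implementation. As far as I know, libstdc++ on GNU systems treats the name as a gettext domain and uses the default string as the lookup key, but check your platform's documentation.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: look up one message and fall back to `fallback`
// when the catalogue cannot be opened or has no matching entry.
std::string lookup_message(const std::string& catalogue_name, int set, int id,
                           const std::string& fallback,
                           const std::locale& loc = std::locale::classic()) {
    const auto& facet = std::use_facet<std::messages<char>>(loc);
    // open() returns a negative value if the catalogue cannot be opened.
    std::messages<char>::catalog cat = facet.open(catalogue_name, loc);
    if (cat < 0) return fallback;
    // How `set` and `id` select a message is implementation-defined.
    std::string text = facet.get(cat, set, id, fallback);
    facet.close(cat);
    return text;
}
```

Something like lookup_message("myapp", 0, 0, "Hello") compiles everywhere, but whether it ever returns anything other than "Hello" depends on catalogue files you have to build with platform tools the standard never mentions.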