Is wstring able to do utf-8?
C++ has standard facilities (wstring_convert) that can convert between wstring and UTF-8 strings. There are also standard functions in both C and C++ (wcstombs, mbstowcs) that may be able to do the same with C wide strings if your system has an appropriate locale. Most POSIX-ish systems do; Windows-based ones normally don't (they have non-standard facilities for that). That's about all wstring and UTF-8 have to do with each other.
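To make the wstring_convert route concrete, here is a minimal sketch. The helper names to_utf8 and from_utf8 are made up for illustration. Note that wstring_convert and the <codecvt> header have been deprecated since C++17, so expect compiler warnings, and that codecvt_utf8<wchar_t> only covers the full Unicode range where wchar_t is 32 bits wide.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: wide string -> UTF-8 bytes.
// Assumes a 32-bit wchar_t (typical on Linux); with a 16-bit wchar_t
// this facet handles UCS-2 only, not surrogate pairs.
std::string to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.to_bytes(wide);
}

// Hypothetical helper: UTF-8 bytes -> wide string.
// from_bytes throws std::range_error on malformed input.
std::wstring from_utf8(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    return conv.from_bytes(utf8);
}
```

Calling to_utf8(L"\u00e9") should give the two bytes C3 A9, and from_utf8 reverses that.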
Is utf-8 even the thing I should be concerned with?
It depends. If you are living in 1980, or don't do any programming, then probably not. If you don't do any character-level processing, and only shuffle entire strings, you should also be fine. Just use char-based strings and don't worry about any fancy characters. It should all work more or less automatically.
If you do need character-level stuff (substrings, search, ...) you probably do need to be aware of UTF-8. It's probably wise to do all internal processing with either wchar_t or char32_t, and convert from or to UTF-8 upon I/O. (I would just say "use wchar_t" but alas, on Windows wchar_t is broken. You may still be able to get away with it, but no promises.)
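As an illustration of "convert on input, process in char32_t", here is a rough hand-rolled decoder. The name decode_utf8 is hypothetical, and this is a sketch rather than a validating decoder: malformed sequences become U+FFFD, but overlong forms and encoded surrogates are not rejected. A real project would more likely use a library for this.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Malformed input is replaced by U+FFFD; overlong encodings and
// surrogates are NOT detected.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= in.size()) { ok = false; break; }
            unsigned char c = static_cast<unsigned char>(in[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : static_cast<char32_t>(0xFFFD));
    }
    return out;
}
```

Once the text is a std::u32string, size(), substr() and find() work in units of code points instead of bytes. That is still not the same thing as user-perceived characters (combining marks exist), but it is enough for a lot of everyday processing.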
If I have a u8"string" or L"string" which contains characters from multiple languages, how would I write this to file using only the C standard IO library?
You cannot do much about u8"string" in C. In C++, they are normal char-based strings and can be written like any other string, and will do the right thing. (You may have to jump through some hoops on Windows; see the _setmode and _O_U8TEXT docs.) This is, however, of minor importance. You normally don't need any fancy characters in string literals. All user-facing strings should be loaded from files.
With wchar_t based strings, you may or may not be able to output UTF-8 directly, depending on your OS and compiler. You can always convert to UTF-8 and output that.
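The "convert to UTF-8 and output that" route can be done with nothing but fopen and fwrite. The sketch below is one way to do it by hand; append_utf8, wide_to_utf8 and write_utf8_file are made-up names, and unpaired surrogates are passed through unchecked.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: append one code point as UTF-8 bytes.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: works for 32-bit wchar_t and, by joining
// surrogate pairs, for 16-bit wchar_t as on Windows.
std::string wide_to_utf8(const std::wstring& w) {
    std::string out;
    for (std::size_t i = 0; i < w.size(); ++i) {
        char32_t cp = static_cast<char32_t>(w[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < w.size()) {
            char32_t lo = static_cast<char32_t>(w[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // consumed the low surrogate as well
            }
        }
        append_utf8(out, cp);
    }
    return out;
}

// Write the text as UTF-8 using only C stdio. Binary mode keeps
// the runtime from touching the bytes.
bool write_utf8_file(const char* path, const std::wstring& text) {
    std::string bytes = wide_to_utf8(text);
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    bool ok = std::fwrite(bytes.data(), 1, bytes.size(), f) == bytes.size();
    return std::fclose(f) == 0 && ok;
}
```

Opening the file in binary mode is deliberate: the bytes are already encoded, so no locale or text-mode translation should get involved.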
If you are willing to use third-party libraries, consider using http://utfcpp.sourceforge.net/
Also read:
http://utf8everywhere.org
http://www.joelonsoftware.com/articles/Unicode.html