In my application I have to constantly convert string between std::string
and std::wstring
due different APIs (boost, win32, ffmpeg etc..). Especially with ffmpeg the strings end up utf8->utf16->utf8->utf16, just to open a file.
Since UTF8 is backwards compatible with ASCII I thought that I consistently store all my strings UTF-8 std::string
and only convert to std::wstring
when I have to call certain unusual functions.
This worked kind of well, I implemented to_lower, to_upper, iequals for utf8. However then I met several dead-ends std::regex, and regular string comparisons. To make this usable I would need to implement a custom ustring
class based on std::string with re-implementation of all corresponding algorithms (including regex).
Basically my conclusion is that utf8 is not very good for general usage. And the current std::string/std::wstring
is mess.
However, my question is why the default std::string
and ""
are not simply changed to use UTF8? Especially as UTF8 is backward compatible? Is there possibly some compiler flag which can do this? Of course the stl implemention would need to be automatically adapted.
I've looked at ICU, but it is not very compatible with apis assuming basic_string, e.g. no begin/end/c_str etc...