I'm in the process of fixing a large open source cross-platform application such that it can handle file paths containing non-ANSI characters on Windows.
Update:
Based on the answers and comments I've received so far (thanks!), I feel I should clarify some points:
- I cannot modify the code of dozens of third-party libraries to use `std::wchar_t`. This is just not an option. The solution has to work with plain ol' `std::fopen()`, `std::ifstream`, etc.
- The solution I outline below works at 99%, at least on the system I'm developing on (Windows 10 version 1909, build 18363.535). I haven't tested on any other system yet.
- The only remaining issue, at least on my system, is basically number formatting, and I'm hopeful that replacing the `std::numpunct` facet does the trick (but I haven't succeeded yet).
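For reference, replacing the `std::numpunct` facet could look roughly like the sketch below. It is only an illustration and has not been checked against the Windows setup described here: `ClassicNumpunct`, `with_classic_numpunct` and `format_number` are hypothetical names, and only the narrow-character facet is replaced (wide streams would need a `std::numpunct<wchar_t>` counterpart).

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that reports the same punctuation as the classic "C"
// locale: '.' as decimal point and no digit grouping.
class ClassicNumpunct : public std::numpunct<char> {
protected:
    char do_decimal_point() const override { return '.'; }
    char do_thousands_sep() const override { return ','; }   // unused while grouping is empty
    std::string do_grouping() const override { return ""; }  // empty string = no grouping
};

// Returns a copy of `base` in which only the std::numpunct<char> facet is
// replaced; every other facet (ctype, codecvt, ...) still comes from `base`.
inline std::locale with_classic_numpunct(const std::locale& base) {
    // The locale takes ownership of the facet and deletes it when unused.
    return std::locale(base, new ClassicNumpunct);
}

// Hypothetical helper: formats `value` through a stream imbued with `loc`.
inline std::string format_number(const std::locale& loc, double value) {
    std::ostringstream out;
    out.imbue(loc);
    out << value;
    return out.str();
}
```

Under these assumptions, the result of `with_classic_numpunct(std::locale("en_US.UTF-8"))` could be passed to `std::locale::global()`. Note that a locale built with this constructor has no name, so the effect of `std::locale::global()` on the C locale is implementation-defined in that case.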
My current solution involves:
1. Setting the C locale to `.UTF-8` for the `LC_CTYPE` category on Windows (all other categories are set to the `C` locale as required by the application):

        // Required by the application.
        std::setlocale(LC_ALL, "C");

        // On Windows, we want std::fopen() and other functions dealing with strings
        // and file paths to accept narrow-character strings encoded in UTF-8.
        #ifdef _WIN32
        {
        #ifndef NDEBUG
            char* new_ctype_locale =
        #endif
                std::setlocale(LC_CTYPE, ".UTF-8");
            assert(new_ctype_locale != nullptr);
        }
        #endif

2. Configuring `boost::filesystem::path` to use the `en_US.UTF-8` locale so that it too can deal with paths containing non-ANSI characters:

        boost::filesystem::path::imbue(std::locale("en_US.UTF-8"));
The last missing bit is to fix file I/O using C++ streams such as

    std::ifstream istream(filename);

The simplest solution is probably to set the global C++ locale at the beginning of the application:

    std::locale::global(std::locale("en_US.UTF-8"));
However, that messes up the formatting of numbers: for example, 1234.56 gets formatted as 1,234.56.
Is there a locale that just specifies the encoding to be UTF-8 without messing with number formatting (or other things)?
Basically I'm looking for the `C.UTF-8` locale, but that doesn't seem to exist on Windows.
Update: I suppose one solution would be to reset some (most? all?) of the facets of the locale, but I'm having a hard time finding information on how to do that.