On Windows, you MUST use 8bit ANSI (and it must match the user's locale) or UTF-16 for filenames, there is no other option available. You can keep using string
and UTF-8 in your main code, but you will have to convert UTF-8 filenames to UTF-16 when you are opening files. Less efficient, but that is what you need to do.
Fortunately, VC++'s implementation of std::ifstream
and std::ofstream
have non-standard overloads of their constructors and open()
methods to accept wchar_t*
strings for UTF-16 filenames.
explicit basic_ifstream(
const wchar_t *_Filename,
ios_base::openmode _Mode = ios_base::in,
int _Prot = (int)ios_base::_Openprot
);
void open(
const wchar_t *_Filename,
ios_base::openmode _Mode = ios_base::in,
int _Prot = (int)ios_base::_Openprot
);
void open(
const wchar_t *_Filename,
ios_base::openmode _Mode
);
explicit basic_ofstream(
const wchar_t *_Filename,
ios_base::openmode _Mode = ios_base::out,
int _Prot = (int)ios_base::_Openprot
);
void open(
const wchar_t *_Filename,
ios_base::openmode _Mode = ios_base::out,
int _Prot = (int)ios_base::_Openprot
);
void open(
const wchar_t *_Filename,
ios_base::openmode _Mode
);
You will have to use an #ifdef
to detect Windows compilation (unfortunately, different C++ compilers identify that differently) and temporarily convert your UTF-8 string to UTF-16 when opening a file.
#ifdef _MSC_VER
std::wstring ToUtf16(std::string str)
{
std::wstring ret;
int len = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), str.length(), NULL, 0);
if (len > 0)
{
ret.resize(len);
MultiByteToWideChar(CP_UTF8, 0, str.c_str(), str.length(), &ret[0], len);
}
return ret;
}
#endif
int main()
{
std::string utf8path = ...;
std::ifstream iFileStream(
#ifdef _MSC_VER
ToUtf16(utf8path).c_str()
#else
utf8path.c_str()
#endif
, std::ifstream::in | std::ifstream::binary);
...
return 0;
}
Note that this is only guaranteed to work in VC++. Other C++ compilers for Windows are not guaranteed to provide similar extensions.
UPDATE: as of Windows 10 Insider Preview Build 17035, Microsoft now supports UTF-8 as a system-wide encoding that users can set their locale to. And as of Windows 10 Version 1903 (build 18362), applications can now opt in via their app manifest to use UTF-8 as a process-wide codepage, even if the user locale is not set to UTF-8. These features allow ANSI-based APIs (like CreateFileA()
, which std::ifstream
/std::ofstream
use internally) to work with UTF-8 strings. So, in theory, with this feature turned on, you might be able to pass a UTF-8 encoded string to std::ifstream
/std::ofstream
and it would "just work". I can't confirm that, as it very much depends on the implementation. It would be safer to stick with passing in UTF-16 filenames, since that is Windows' native encoding, which the ANSI APIs will simply convert to internally.