Is there a standard way to do an fopen with a Unicode string file path?

4 Answers
No, there's no standard way. There are some differences between operating systems. Here's how different OSs handle non-ASCII filenames.
Linux
Under Linux, a filename is simply a byte string. The convention on most modern distributions is to use UTF-8 for non-ASCII filenames, but early on it was common to encode filenames as ISO-8859-1. It's basically up to each application to choose an encoding, so you can even have different encodings used on the same filesystem. The LANG environment variable can give you a hint as to what the preferred encoding is. But these days, you can probably assume UTF-8 everywhere.
This is not without problems, though, because a filename containing an invalid UTF-8 sequence is perfectly valid on most Linux filesystems. How would you specify such a filename if you only support UTF-8? Ideally, you should support both UTF-8 and binary filenames.
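One way to support both cases is to check whether a name is well-formed UTF-8 before treating it as text, and fall back to handling it as raw bytes otherwise. The following is only a sketch; `is_valid_utf8` is a hypothetical helper name, not a library function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if the NUL-terminated byte string
   is well-formed UTF-8, false otherwise. */
bool is_valid_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        unsigned char c = *p;
        size_t n;           /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (c < 0x80) { p++; continue; }                 /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else return false;  /* stray continuation byte or invalid lead byte */

        p++;
        for (size_t i = 0; i < n; i++, p++) {
            /* A NUL here means the sequence was truncated. */
            if ((*p & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (*p & 0x3F);
        }

        if (cp < min) return false;                      /* overlong encoding */
        if (cp > 0x10FFFF) return false;                 /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  /* surrogate */
    }
    return true;
}
```

A name that fails this check (for example one written in ISO-8859-1) can still be passed to fopen unchanged on Linux; it just should not be displayed or converted as if it were UTF-8.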
OS X
The HFS+ filesystem on OS X uses Unicode (UTF-16) filenames internally. Most C (and POSIX) library functions like fopen accept UTF-8 strings (since they're 8-bit compatible) and convert them internally.
Windows
The Windows API uses UTF-16 for filenames, but fopen uses the current code page, whatever that is (UTF-8 only recently became an option). Many C library functions have a non-standard equivalent that accepts UTF-16 (wchar_t on Windows), for example _wfopen instead of fopen.

In *nix, you simply use the standard fopen (see more information in the reply from TokeMacGuy, or in this forum).
In Windows, you can use _wfopen and pass it a Unicode string (for more information, see MSDN).
As there is no real common way, I would wrap this call in a macro, together with all other system-dependent functions.
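A minimal sketch of such a wrapper, written as a function rather than a macro, and assuming the caller always supplies the path as UTF-8. The names `fopen_utf8` and `utf8_to_wide` are hypothetical; `MultiByteToWideChar` and `_wfopen` are the Windows calls used for the conversion and the open.

```c
#include <stdio.h>

#ifdef _WIN32
#include <stdlib.h>
#include <windows.h>

/* Convert a NUL-terminated UTF-8 string to a newly allocated UTF-16 string.
   Returns NULL on failure; the caller frees the result. */
static wchar_t *utf8_to_wide(const char *s)
{
    int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, NULL, 0);
    if (n <= 0) return NULL;

    wchar_t *w = malloc((size_t)n * sizeof *w);
    if (w && MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, s, -1, w, n) <= 0) {
        free(w);
        w = NULL;
    }
    return w;
}
#endif

/* Hypothetical wrapper: open a file whose path is given as UTF-8. */
FILE *fopen_utf8(const char *path, const char *mode)
{
#ifdef _WIN32
    /* Windows: convert to UTF-16 and use the wide-character variant. */
    wchar_t *wpath = utf8_to_wide(path);
    wchar_t *wmode = utf8_to_wide(mode);
    FILE *f = (wpath && wmode) ? _wfopen(wpath, wmode) : NULL;
    free(wpath);
    free(wmode);
    return f;
#else
    /* Elsewhere: the bytes are passed through to fopen unchanged. */
    return fopen(path, mode);
#endif
}
```

Callers then use `fopen_utf8(path, "r")` everywhere and the platform difference stays in one place.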
This is a matter of your current locale. On my system, which is Unicode-enabled, file paths will be in Unicode. I'm able to detect this by means of the locale command:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
The encoding of file paths is normally set system wide, so if your file path is not in the system's locale, you will need to convert it, perhaps by means of the iconv library.
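As a rough illustration of that conversion, the sketch below uses the POSIX iconv interface to turn a path in some other encoding into UTF-8. The helper name `path_to_utf8` and its fixed-size output buffer are assumptions made for the example, and the set of encoding names accepted by `iconv_open` depends on the system.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in`, encoded in
   `from_enc`, to UTF-8. Writes at most outsize-1 bytes plus a terminating
   NUL into `out`. Returns 0 on success, -1 on failure. */
int path_to_utf8(const char *from_enc, const char *in, char *out, size_t outsize)
{
    if (outsize == 0) return -1;

    iconv_t cd = iconv_open("UTF-8", from_enc);
    if (cd == (iconv_t)-1) return -1;   /* conversion not supported */

    char *inp = (char *)in;             /* iconv takes a non-const pointer */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outsize - 1;       /* reserve room for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1) return -1;    /* invalid input or buffer too small */

    *outp = '\0';
    return 0;
}
```

For example, `path_to_utf8("ISO-8859-1", name, buf, sizeof buf)` would convert a Latin-1 filename before it is handed to fopen on a UTF-8 system.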

Almost all POSIX platforms use UTF-8 nowadays, and modern Windows also supports UTF-8 as the locale, so you can just use UTF-8 everywhere and open any file without using wide strings on Windows. fopen just works portably:
setlocale(LC_ALL, "en_us.utf8"); // need some setup before calling this
fopen(R"(C:\filê\wíth\Ünicode\name.txt)", "w+");
Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use ".UTF8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".UTF8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page....
To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.
