I'm contributing to a C library. It has a function that takes a char* parameter for a file path name. The authors are mostly UNIX developers, and this works fine on unixes where char* mostly means UTF-8. (At least in GCC, the character set is configurable and UTF-8 is the default.)
However, char* means ANSI on Windows, which implies that it is currently impossible to use Unicode path names with this library on Windows, where wchar_t* should be used and only UTF-16 is supported. (A quick search on StackOverflow reveals that the ANSI Windows API functions cannot be used with UTF-8.)
The question is, what is the right way to deal with this? We've come up with various ways to do it, but none of us is a Windows expert, so we can't really decide how to do it properly. Our goal is that users of the library should be able to write cross-platform code that works on unixes as well as on Windows.
Under the hood, the library has #ifdefs in place to differentiate between operating systems so that it can use POSIX functions on UNIXes and Win32 APIs on Windows.
So far, we've come up with the following possibilities:
- Offer a separate Windows-only function that accepts a wchar_t*.
- Require UTF-16 on Windows and #ifdef the library header in such a way that the function would accept wchar_t* on Windows.
- Add a flag that would tell the function to cast the given char* to wchar_t* and call the widechar Windows APIs.
- Create a variant of the function that takes a file descriptor (or file handle on Windows) instead of a file path.
- Always require UTF-8 (even on Windows), and then inside the function, convert UTF-8 to UTF-16 and call the widechar Windows APIs.
The problem with options 1-4 is that they would require the user to consciously take care of portability themselves. Option 5 sounds good, but I'm not sure if this is the right way to go.
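To make option 5 concrete, here is a rough sketch of what I have in mind. The name lib_fopen_utf8 is made up for illustration and is not the library's actual function; I'm also assuming that MultiByteToWideChar with CP_UTF8 followed by _wfopen is an acceptable way to reach the widechar APIs, which is exactly the part I'd like confirmed.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#endif

/* Hypothetical helper (not part of the real library): the caller always
 * passes a UTF-8 encoded path, on every platform. */
FILE *lib_fopen_utf8(const char *utf8_path, const char *mode)
{
#ifdef _WIN32
    wchar_t *wpath;
    wchar_t wmode[16];
    FILE *fp;
    int wlen;

    /* First call only asks for the required size in wchar_t units,
     * including the terminating NUL (because the input length is -1). */
    wlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                               utf8_path, -1, NULL, 0);
    if (wlen == 0)
        return NULL; /* invalid UTF-8 or another conversion failure */

    wpath = malloc((size_t)wlen * sizeof *wpath);
    if (wpath == NULL)
        return NULL;

    /* Convert the path and the (short, ASCII) mode string to UTF-16. */
    if (MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            utf8_path, -1, wpath, wlen) == 0 ||
        MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                            mode, -1, wmode, 16) == 0) {
        free(wpath);
        return NULL;
    }

    fp = _wfopen(wpath, wmode); /* widechar variant of fopen */
    free(wpath);
    return fp;
#else
    /* On POSIX systems the bytes are passed through unchanged. */
    return fopen(utf8_path, mode);
#endif
}
```

With something like this, user code would call lib_fopen_utf8("t\xc3\xa9st.txt", "r") identically on both platforms, and the conversion cost would only be paid on Windows.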
I'm also open to other suggestions or ideas that can solve this. :)