
In a C++ project, I want to open a file with fstream::open(), which turns out to be a major problem: the Windows build of my program fails miserably.

  • File "ä" (UTF-8 0xC3 0xA4)

    std::string s = ...;
    //Convert s
    std::fstream f;
    f.open(s.c_str(), std::ios::binary | std::ios::in); //Works (f.is_open() == true)
    f.close();
    f.open(s.c_str(), std::ios::binary | std::ios::in | std::ios::out); //Doesn't work
    

    The string s is initially UTF-8 encoded and is then converted to the local 8-bit encoding (Latin1 here, so "ä" becomes 0xE4). I'm using Qt, so the conversion is QString::fromUtf8(s.c_str()).toLocal8Bit().constData().

    Why can I open the file for reading, but not for writing?

  • File "и" (UTF-8 0xD0 0xB8)

    Same code, doesn't work at all.

It seems this character doesn't fit in the Windows-1252 charset. How can I open such a file with an fstream? (I'm not using MSVC, so there is no fstream::open(const wchar_t*, ios_base::openmode).)

basic6
  • I think filenames in Windows need to be UTF-16 encoded, and you need to use the special Windows file handling functions (`_wfopen`, etc.) to access files via their long name. Alternatively, you could use the short name. – Kerrek SB May 12 '12 at 22:40
  • What compiler and C library are you using? If you're using, say, MinGW, you can still use functions from the MS CRT such as [`_wfopen`](http://msdn.microsoft.com/en-us/library/yeby3zcb%28v=vs.80%29.aspx). If you're using a different C runtime (such as Cygwin GCC's libc), then you're at the mercy of that runtime library's Unicode support. – Adam Rosenfield May 14 '12 at 04:35
  • Your C and C++ standard libraries need to support Unicode (i.e. they have to convert their UTF-8 input strings to UTF-16 and then call `CreateFileW`). If they don't, you're out of luck: you then probably need to call `CreateFileW` directly. – Philipp May 17 '12 at 14:05
  • @Adam Rosenfield I'm using mingw32-g++-4.6.2. _wfopen() returns a FILE* pointer, how do I open an fstream object that way? – basic6 May 17 '12 at 21:53
  • @basic6: I unfortunately don't know if there's a way to do that. There is the [`std::wfstream`](http://msdn.microsoft.com/en-us/library/zyz8f7af%28v=vs.80%29.aspx) class, but its `open` method also only takes a `const char*` for the filename. If you want to be able to open Unicode filenames, you'll need to either use C's stdio library, or fully-buffer the file data by reading it all into memory and using a `std::stringstream` to parse the data. – Adam Rosenfield May 18 '12 at 03:55
  • Long answer to short comment: see http://utf8everywhere.org/ about how to do it right. – Pavel Radzivilovsky Nov 28 '14 at 23:08
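
On the question in the comments of how to get from the `FILE*` returned by `_wfopen()` to a stream object: with GCC's libstdc++ (which MinGW uses) one possible route is the non-standard `__gnu_cxx::stdio_filebuf` extension. The following is only a sketch under that assumption. The names `CFileStream` and `open_rw` are hypothetical helpers made up for illustration, and whether `_wfopen` is declared may depend on the MinGW headers and compiler flags in use.

```cpp
#include <cassert>
#include <cstdio>
#include <iostream>
#include <string>
#include <wchar.h>               // MS CRT: expected to declare _wfopen on Windows
#include <ext/stdio_filebuf.h>   // libstdc++ extension (GCC/MinGW), not standard C++

// Hypothetical helper: owns an open C FILE* and exposes it as a std::iostream.
class CFileStream {
public:
    CFileStream(std::FILE* fp, std::ios_base::openmode mode)
        : fp_(fp), buf_(fp, mode), stream_(&buf_)
    {
        assert(fp_ != nullptr && "caller must pass an open FILE*");
    }

    ~CFileStream()
    {
        stream_.flush();
        buf_.close();                 // flushes and detaches; does not close the FILE*
        if (fp_) std::fclose(fp_);    // so the FILE* is closed here
    }

    CFileStream(const CFileStream&) = delete;
    CFileStream& operator=(const CFileStream&) = delete;

    std::iostream& stream() { return stream_; }

private:
    std::FILE* fp_;
    __gnu_cxx::stdio_filebuf<char> buf_;
    std::iostream stream_;
};

#ifdef _WIN32
// Windows: the name must already be UTF-16; opens an existing file for read/write.
inline std::FILE* open_rw(const std::wstring& utf16_name)
{
    return _wfopen(utf16_name.c_str(), L"r+b");
}
#else
// Elsewhere the narrow name is passed through unchanged.
inline std::FILE* open_rw(const std::string& name)
{
    return std::fopen(name.c_str(), "r+b");
}
#endif
```

Usage would be along the lines of checking the result of `open_rw(...)` for a null pointer, constructing a `CFileStream` from it with `std::ios::in | std::ios::out | std::ios::binary`, and then reading and writing through `stream()`. This ties the code to libstdc++, so it is not an option for other standard library implementations.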

2 Answers


Using the standard APIs (such as std::fstream) on Windows, you can only open a file if the filename can be encoded using the currently set "ANSI Codepage" (CP_ACP).

This means that there can be files which simply cannot be opened using these APIs on Windows. Unless Microsoft implements support for setting CP_ACP to CP_UTF8, this cannot be done using Microsoft's CRT or C++ standard library implementation.

(Windows has had a feature called "short" filenames where, when enabled, every file on the drive has an ASCII filename that can be used via the standard APIs. However, this feature is going away, so it does not represent a viable solution.)

Update: Windows 10 has added support for setting the codepage to UTF-8.

bames53

In Microsoft's implementation of the C++ standard library, there's a non-standard extension (an overload) that adds Unicode support by accepting UTF-16 encoded wide strings.

Just pass a UTF-16 encoded std::wstring to fstream::open(). This is the only way to make it work with fstream.

You can read more about what I find to be the easiest way to support Unicode on Windows here: http://utf8everywhere.org/
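
To illustrate the step of getting from a UTF-8 `std::string` to the UTF-16 string that the wide overload expects, here is a minimal hand-written sketch. The function names `utf8_to_utf16` and `open_utf8` are hypothetical and not part of any library. The converter is simplified (it does not reject overlong encodings or encoded surrogates), and the `open_utf8` part assumes MSVC's non-standard wide `open()` overload, so it is only compiled there. On Windows, `MultiByteToWideChar` or Qt's `QString` conversions would be the more usual choice.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#ifdef _MSC_VER
#include <fstream>
#endif

// Decode a UTF-8 byte string and re-encode it as UTF-16 code units.
// Throws std::invalid_argument on truncated or malformed input.
// Simplification: overlong encodings and encoded surrogates are not rejected.
std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // number of continuation bytes that must follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i - 1 < extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        assert(i <= in.size());

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

#ifdef _MSC_VER
// MSVC only: hand the UTF-16 name to the non-standard wide open() overload.
inline void open_utf8(std::fstream& f, const std::string& utf8_name,
                      std::ios_base::openmode mode)
{
    const std::u16string u16 = utf8_to_utf16(utf8_name);
    // wchar_t holds one UTF-16 code unit on Windows.
    const std::wstring wide(u16.begin(), u16.end());
    f.open(wide.c_str(), mode);
}
#endif
```

With this, the two filenames from the question, "ä" (0xC3 0xA4) and "и" (0xD0 0xB8), become the single UTF-16 code units U+00E4 and U+0438 respectively, neither of which depends on the ANSI codepage.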

Pavel Radzivilovsky
  • That's probably the appropriate solution, but AFAIK this overload is only available with MSVC, not with MinGW ("no matching function for call to..."). And I don't use Microsoft's compiler, because I haven't ported the code I'm currently working on to Microsoft's C++ (in other words, the code won't compile and I haven't bothered yet to find out why). – basic6 May 17 '12 at 21:59
  • @basic6: please read the [conversion functions](http://utf8everywhere.org/#how.cvt) section. AFAIK the nowide library should work on MinGW too. – Yakov Galka May 18 '12 at 07:19