
Simple task: I want to read a file that has a non-ASCII file name.

On Linux and macOS, I simply pass the file name as a UTF-8 encoded string to the fstream constructor. On Windows this fails.

As I learned from this question, Windows simply does not support UTF-8 file names. However, it provides its own non-standard open method that takes a UTF-16 wchar_t*. Thus, I could simply convert my string to a UTF-16 wstring and be fine. However, in the MinGW standard library, that wchar_t* open method of fstream simply does not exist.

So, how can I open a file with a non-ASCII name on MinGW?

gexicide
  • It appears this may not be possible: [http://stackoverflow.com/questions/10567893/fstreamopen-unicode-or-non-ascii-characters-dont-work-with-stdiosout?rq=1](http://stackoverflow.com/questions/10567893/fstreamopen-unicode-or-non-ascii-characters-dont-work-with-stdiosout?rq=1) – owacoder Sep 20 '16 at 12:50
  • @owacoder: I cannot read a non-ascii file on MinGW? That would be hilarious. That would be a super harsh restriction which basically makes MinGW useless for countries in which non-ascii characters in names are common (basically more than half of all countries on earth). Thus, there simply has to be a way. – gexicide Sep 20 '16 at 12:51
  • I meant using fstream directly. There are surely many workarounds using other methods (direct system-specific calls, for sure). – owacoder Sep 20 '16 at 12:57

2 Answers


I struggled with the same issue before. Unfortunately, until you can use std::filesystem::path, you need to work around this in some way, e.g. by wrapping everything, e.g. like I did here, which makes "user code" look like this:

// file_name is UTF-8; open_ifstream converts it to UTF-16 on Windows, as in the code linked above
auto stream_ptr = open_ifstream(file_name);
auto& stream = *stream_ptr;
if(!stream)
    throw error("Failed to open file: '" + file_name + "'.");

Ugly, yes; somewhat portable, yes. Note that this does not work with libc++ on Windows, although that combination does not currently work anyway, so it doesn't matter much.
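For readers who cannot follow the link, here is a minimal sketch of what such a wrapper could look like. It is not the linked code: the helper names `utf8_to_utf16` and `fd_istream` are hypothetical, and the Windows branch assumes MinGW with libstdc++, where the GNU extension `__gnu_cxx::stdio_filebuf` can wrap a descriptor obtained from `_wopen`. Other platforms simply pass the UTF-8 name through.

```cpp
#include <fstream>
#include <istream>
#include <memory>
#include <string>

#if defined(_WIN32) && defined(__GLIBCXX__)
// Assumption: MinGW with libstdc++. Open the file through the wide-character
// C runtime, then wrap the descriptor in the GNU-specific stdio_filebuf.
#include <windows.h>
#include <fcntl.h>
#include <io.h>
#include <ext/stdio_filebuf.h>

namespace detail {

// Hypothetical helper: UTF-8 to UTF-16 via the Win32 API.
// Returns an empty string if the input is empty or invalid.
inline std::wstring utf8_to_utf16(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int len = static_cast<int>(utf8.size());
    const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      utf8.data(), len, nullptr, 0);
    if (n <= 0) return std::wstring();
    std::wstring wide(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), len, &wide[0], n);
    return wide;
}

// Hypothetical helper: an istream that owns a stdio_filebuf built from a
// file descriptor. The filebuf closes the descriptor when it is destroyed.
class fd_istream : public std::istream {
public:
    explicit fd_istream(int fd)
        : std::istream(nullptr), buf_(fd, std::ios::in | std::ios::binary) {
        rdbuf(&buf_);
        if (!buf_.is_open()) setstate(std::ios::failbit);  // e.g. fd == -1
    }
private:
    __gnu_cxx::stdio_filebuf<char> buf_;
};

} // namespace detail

inline std::unique_ptr<std::istream> open_ifstream(const std::string& utf8_name) {
    const std::wstring wide = detail::utf8_to_utf16(utf8_name);
    const int fd = _wopen(wide.c_str(), _O_RDONLY | _O_BINARY);
    return std::unique_ptr<std::istream>(new detail::fd_istream(fd));
}

#else
// Everywhere else the narrow name is passed through unchanged. This is what
// Linux and macOS expect; MSVC would need its wide-character overload instead.
inline std::unique_ptr<std::istream> open_ifstream(const std::string& utf8_name) {
    return std::unique_ptr<std::istream>(
        new std::ifstream(utf8_name, std::ios::in | std::ios::binary));
}
#endif
```

Returning a `std::unique_ptr<std::istream>` is what forces the `auto& stream = *stream_ptr;` line in the user code above: the concrete stream type differs per platform, so only the common base class can be exposed.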

rubenvb
  • What would a solution with `std::filesystem::path` look like? Isn't that class already available as an experimental extension? – gexicide Sep 20 '16 at 13:00
  • It is, in the newest builds. I believe MSVC also has support of some form of filesystem, no idea if they're compatible. I can't find anything except the [technical specification](https://isocpp.org/files/papers/p0218r0.html) and it seems to imply Windows will stay quite broken if I'm reading it correctly :/. – rubenvb Sep 20 '16 at 13:11
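As a rough illustration of the `std::filesystem::path` route discussed in these comments, the sketch below assumes a C++17 toolchain whose standard library ships a working `<filesystem>` and the `path` overloads of the fstream constructors. The function name `open_utf8` is hypothetical. `std::filesystem::u8path` interprets its argument as UTF-8 and converts it to the platform's native path encoding; note that it is deprecated as of C++20.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Hypothetical helper: open a file whose name is given as UTF-8.
// The path object carries the name in the native encoding of the platform,
// so no manual UTF-16 conversion is needed in user code.
inline std::ifstream open_utf8(const std::string& utf8_name) {
    return std::ifstream(std::filesystem::u8path(utf8_name),
                         std::ios::in | std::ios::binary);
}
```

Whether this works on a given MinGW build depends on how complete its filesystem implementation is, so it is worth testing against the exact toolchain in use.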

You could give Boost.Nowide a try. It has an fstream wrapper that converts your string to UTF-16 automatically. It is not yet in Boost, but it is already on the review schedule (and hopefully soon part of Boost). I have never tried it with MinGW, but I played around with it in Visual Studio and found it quite neat.

user1810087
  • If I introduced such a not-yet-stable dependency to our project, my manager would kill me ;). – gexicide Sep 21 '16 at 08:54
  • @gexicide I see... then probably a more professional library: [icu](http://site.icu-project.org/). But I never tried it, so I cannot say if it meets your requirements. – user1810087 Sep 21 '16 at 08:56
  • Does ICU have a way to open an fstream from a UTF-16 object? We already use ICU, but just converting to UTF-16 is not enough, as I am missing the fstream constructor that takes it. – gexicide Sep 21 '16 at 09:23
  • @gexicide Hmmm... that's odd. I thought converting the string to UTF-16, putting it into a wide-char string variant, and using the wide-char variant [std::wifstream](http://en.cppreference.com/w/cpp/io/basic_ifstream) could work. But, as I said, I never tried it, since I try to avoid any wide-char/UTF-16 stuff and use only UTF-8. – user1810087 Sep 21 '16 at 09:49
  • Nope, it doesn't :). I don't want a `wifstream`. A `wifstream` assumes the *file contents* are wide characters. But that is not the case; only the *file name* is Unicode. The constructor of `wifstream` therefore also takes a `const char*`, not a `const wchar_t*`. There is no `wchar_t` constructor, at least not on MinGW or Linux. – gexicide Sep 21 '16 at 15:17
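The distinction in the last comment can be checked with a small sketch. The function name `read_first_word` is hypothetical. The point is that the `w` in `wifstream` only changes the character type of the data read from the stream, while the file name argument is still a narrow string.

```cpp
#include <fstream>
#include <string>
#include <type_traits>

// The wide stream is still constructible from a narrow file name.
static_assert(std::is_constructible<std::wifstream, const char*>::value,
              "wifstream takes a narrow (char) file name");

// Hypothetical helper: the name is narrow, only the extracted data is wide.
inline std::wstring read_first_word(const char* narrow_name) {
    std::wifstream in(narrow_name);
    std::wstring word;
    in >> word;  // bytes are widened to wchar_t by the stream's locale
    return word;
}
```

So switching to `wifstream` changes how the contents are decoded, but it does nothing for a file name that the narrow API cannot represent.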