
In my program, the user can provide a filename either on the command line or through a QFileDialog. In the first case I have a char* without any encoding information; in the second I have a QString.

To store the filename for later use (Recent Files), I need it as a QString. But to open the file with std::ifstream, I need a std::string.

Now the fun starts. I can do:

filename = QString::fromLocal8Bit(argv[1]);

Later on, I can do:

std::string fn = filename.toLocal8Bit().constData();

This works for most characters, but not all. For example, the word Раи́са looks the same after going through this conversion but in fact contains different characters. So while I can have a file Раи́са.txt, and the name is displayed as Раи́са.txt, the file is not found in the filesystem. Most letters work, but и́ doesn't. (Note that it does work correctly when the file was chosen in the QFileDialog. It does not when the name originated from the command line.)

Is there a better way to preserve the filename? Right now I obtain it in whatever the native encoding is and can pass it on in the same encoding without knowing what it is. At least, so I thought.

Jonathan Mee
ypnos
  • Why use ifstream to open the file, instead of QFile? – TheDarkKnight Nov 04 '14 at 17:03
  • The file is opened in a different module that has no Qt dependencies. It is decoupled from the GUI. – ypnos Nov 04 '14 at 17:05
  • Why are you using QString::fromLocal8Bit? The documentation for this function says: "QTextCodec::codecForLocale() is used to perform the conversion from Unicode. If the locale encoding could not be determined, this function does the same as toLatin1()." It looks like toLatin1() ends up being called in your case, and when you later call toLocal8Bit() the following happens: "If this string contains any characters that cannot be encoded in the locale, the returned byte array is undefined. Those characters may be suppressed or replaced by another." – Max Go Nov 04 '14 at 18:24
  • if you are on Qt5, have you tried just QString filename(argv[1]); and later just std::string fn = QByteArray(filename).constData(); – Max Go Nov 04 '14 at 18:28
  • @N1ghtLight I thought the same thing, but he _is_ seeing a correct string in `QString filename` so his `codecForLocale` must be set correctly, cause if `toLatin1` gets called special characters are: "Suppressed or replaced with a question mark." http://qt-project.org/doc/qt-5/qstring.html#toLatin1 – Jonathan Mee Nov 04 '14 at 18:37
  • Yes, with previous conversions I saw ????????? in the output :). So it is doing *something* right. – ypnos Nov 05 '14 at 07:52
  • @ypnos, please let me know if my comments helped, so I can add this as answer :) – Max Go Nov 05 '14 at 08:33

1 Answer


'и́' is not an ASCII character, that is to say, it has no single-char ASCII representation. How it is represented in argv[1] is therefore OS dependent, but it is not represented in just one char.
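To make the "not just one char" point concrete, here is a minimal standard-library sketch. It assumes the name reaches the program as UTF-8 and that the accent is stored as the combining character U+0301 after the base letter U+0438; `countCodePoints` and `decomposedI` are illustrative names, not part of Qt or of the question's code.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 string by skipping continuation
// bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t countCodePoints(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Assumption: the letter is stored as U+0438 followed by the combining
// acute accent U+0301, which is D0 B8 CC 81 in UTF-8.
const std::string decomposedI = "\xD0\xB8\xCC\x81";
```

Under these assumptions, `decomposedI.size()` is 4 and `countCodePoints(decomposedI)` is 2, so what is displayed as a single letter occupies four `char` values. Any conversion step that alters or drops one of them yields a name that may look similar but no longer matches the file on disk.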

fromLocal8Bit uses the same QTextCodec::codecForLocale as toLocal8Bit. And as you say, your std::string will hold "Раи́са.txt", so that's not the problem.

Depending on how your platform implements std::ifstream, though, it may treat each char as a character of its own and not go through the OS's translation. I expect that you are on Windows, since you are seeing this problem. In that case you should use the std::wstring overload of std::fstream, which is Microsoft-specific: http://msdn.microsoft.com/en-us/library/4dx08bh4.aspx

You can get a std::wstring from QString by using: toStdWString

See here for more info: fstream::open() Unicode or Non-Ascii characters don't work (with std::ios::out) on Windows

EDIT:

A good cross-platform option for projects with access to it is Boost::Filesystem. ypnos mentions its File-streams section as specifically pertinent.
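As a rough illustration of the path-based idea, the sketch below uses `std::filesystem` from C++17 rather than Boost::Filesystem, which is a substitution on my part and assumes a C++17 compiler. `readFirstLine` is a hypothetical helper. The point is that the Qt-free module takes a `path` object instead of a `char*`, so the platform's native name representation is carried along with it.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper for the Qt-free module: it receives a path object
// rather than raw bytes, so it never has to guess the encoding.
// Returns an empty string if the file cannot be opened.
std::string readFirstLine(const fs::path& file) {
    std::ifstream in(file);  // C++17 overload taking a filesystem path
    std::string line;
    if (in) {
        std::getline(in, line);
    }
    return line;
}
```

On the GUI side, one option would be to build the path from the wide string, for example `fs::path(filename.toStdWString())`. How a wide string is converted to the native narrow form on non-Windows systems is implementation-defined, so that part would need testing on each target platform.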

Jonathan Mee
  • Actually it is cross-platform and I tested on Linux. That's why I abstain from wstring right now.. – ypnos Nov 05 '14 at 07:54
  • @ypnos I'm not a Linux pro, but I understand that it uses UTF-8 for the filesystem. So you may want to go with an `#ifdef`, using a `std::wstring` on Windows and `QString::toUtf8` on Linux. A cleaner alternative would be to use [`Boost:Filesystem`](http://www.boost.org/doc/libs/1_47_0/libs/filesystem/v3/doc/v3.html), but I understand shying away from that. – Jonathan Mee Nov 05 '14 at 12:02
  • It is a good suggestion. On Linux you mostly get UTF-8 nowadays, but only mostly. And then there is also OS X. Boost:Filesystem is a good idea. From your suggestion I found http://www.boost.org/doc/libs/1_47_0/libs/filesystem/v3/doc/reference.html#File-streams – ypnos Nov 05 '14 at 16:34
  • @ypnos Yeah, the great thing about that is that I believe all that `path`-based stuff is in TR2, so it should go into the next big C++ update. So rather than learning some stuff you'll never use again, you're learning the future of C++! – Jonathan Mee Nov 05 '14 at 16:44
  • Haha, that's a good Boost advert. I even have a Boost:Filesystem dependency already in my project, so I am seriously considering it. – ypnos Nov 05 '14 at 18:31
  • @ypnos Oh yeah in that case you should absolutely do that, Boost is very cross platform friendly. I'll add the comment to the answer for posterity. – Jonathan Mee Nov 05 '14 at 19:57
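
The `#ifdef` approach suggested in the comments could look roughly like the sketch below. It assumes the GUI hands over the name as UTF-8 (for instance via `QString::toUtf8`), that the non-Windows filesystem really does use UTF-8 names, and that the Windows build uses Microsoft's standard library, whose stream constructors accept `const wchar_t*`. `openUtf8` is a hypothetical helper name.

```cpp
#include <fstream>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: opens a file whose name is given as UTF-8 bytes.
std::ifstream openUtf8(const std::string& utf8Name) {
#ifdef _WIN32
    if (utf8Name.empty()) {
        return std::ifstream();
    }
    // First call asks for the required length in wchar_t units.
    const int srcLen = static_cast<int>(utf8Name.size());
    const int len = MultiByteToWideChar(CP_UTF8, 0, utf8Name.data(), srcLen,
                                        nullptr, 0);
    if (len <= 0) {
        return std::ifstream();
    }
    std::wstring wide(static_cast<std::size_t>(len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8Name.data(), srcLen, &wide[0], len);
    // Microsoft-specific overload taking a wide-character filename.
    return std::ifstream(wide.c_str());
#else
    // Assumption: the filesystem stores names as UTF-8, so the bytes
    // can be passed through unchanged.
    return std::ifstream(utf8Name.c_str());
#endif
}
```

As noted in the comments, Linux is only mostly UTF-8 and OS X is a further case, so the `#else` branch is the fragile part of this sketch.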