
I'm using QFileDialog::getOpenFileName() to have the user select a file, but I need the result to be a C string, since I have to pass it to something written in C which uses fopen(). I cannot change this.

The problem I'm finding is that, on Windows/MinGW, using toStdString() on the resulting QString doesn't work well with Unicode/non-ASCII filenames. Trying to open the file based on the std::string fails, because some character set conversion seems to be occurring. Sometimes using toLocal8Bit() to convert works, but sometimes it doesn't.
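To make the byte-level difference concrete, here is a small standalone sketch (no Qt; the helper names `encodeUtf8` and `toHex` are made up for illustration). `toStdString()` returns the string encoded as UTF-8, so the sketch shows which bytes that produces for the three characters used in the test filenames further down. A single-byte code page such as Windows-1252 (assumed here purely as an example of an ANSI code page) would store ä as the single byte E4 instead, which is one way the two conversions can disagree.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Assumes a valid scalar value (surrogates are not rejected here).
std::string encodeUtf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: render bytes as space-separated uppercase hex.
std::string toHex(const std::string &bytes)
{
    std::string out;
    char buf[4];
    for (unsigned char b : bytes) {
        if (!out.empty())
            out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```

For example, `toHex(encodeUtf8(0x00E4))` gives `C3 A4` for ä, `toHex(encodeUtf8(0x2622))` gives `E2 98 A2` for ☢, and `toHex(encodeUtf8(0x0200))` gives `C8 80` for Ȁ.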

Consider the following (MinGW) program:

#include <cstdio>
#include <iostream>

#include <QApplication>
#include <QFileDialog>
#include <QFile>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);
    auto filename = QFileDialog::getOpenFileName();
    QFile f(filename);

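    // toStdString() encodes the QString as UTF-8 before it reaches fopen().
    std::cout << "fopen: " << (std::fopen(filename.toStdString().c_str(), "r") != nullptr) << std::endl;
    // toLocal8Bit() encodes the QString in the local 8-bit encoding instead.
    std::cout << "fopen (local8bit): " << (std::fopen(filename.toLocal8Bit().data(), "r") != nullptr) << std::endl;
    // QFile takes the QString directly, so no narrow string is involved here.
    std::cout << "Qt can open: " << f.open(QIODevice::ReadOnly) << std::endl;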
}
  • For a file called ☢.txt, toStdString() works, toLocal8Bit() doesn't.
  • For a file called ä.txt, toStdString() doesn't work, toLocal8Bit() does.
  • For a file called Ȁ.txt, neither works.

In all cases, though, QFile is able to open the file. I suppose it's probably using the Unicode Windows functions while the C code is using fopen(), which, to my understanding, is a so-called ANSI function on Windows. But is there any way to get a “bag of bytes”, so to speak, from a QString? I don't care about the encoding of the filename; I just want something that can be passed to fopen() to open the file.

I've found that using GetShortPathName to get a short filename from filename.toWCharArray() seems to work, but that's very cumbersome, and my understanding is that NTFS filesystems can be told not to support short names, so it's not a viable solution in general anyway.

Chris
  • Forget about `toLocal8Bit()`, it's a thing of the past. That said, what do you mean by "doesn't work"? – Ulrich Eckhardt Feb 17 '22 at 17:00
  • [toStdWString](https://doc.qt.io/qt-5/qstring.html#toStdWString) passed to [_wfopen](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfopen?view=msvc-170) is probably the most reliable approach on Windows – Alan Birtles Feb 17 '22 at 17:08
  • If you just want direct access to the data, try calling [constData](https://doc.qt.io/qt-5/qstring.html#constData). – JarMan Feb 17 '22 at 17:11
  • "doesn't work" is described in the question as showing that fopen() fails to open the file, even while QFile succeeds (so the file exists). – Chris Feb 17 '22 at 17:20
  • Unfortunately for _wfopen(), I don't have control over the call to open the file. This is getting passed elsewhere to a module expecting a char* to pass to fopen(). – Chris Feb 17 '22 at 17:21
  • Then you're basically stuffed: `fopen` on Windows can only handle files whose names can be encoded in your local code page. If you want to open arbitrary files, you need to use the Unicode APIs. – Alan Birtles Feb 17 '22 at 17:23
  • I looked at constData(), but that returns a pointer to QChar, which I can't see a way to convert, without charset issues, to a C string. You can't just cast the QChar* because it will contain zeros for ASCII/ASCII-range values, interpreted as null bytes ending the string. – Chris Feb 17 '22 at 17:24
  • Maybe set [UTF-8 locale](https://stackoverflow.com/a/63454192/1983398) for your program? – ssbssa Feb 18 '22 at 09:11
  • @ssbssa That may well be the best approach, I'll have to investigate it. Thanks! – Chris Feb 19 '22 at 00:17

1 Answer


File paths passed to the non-Unicode API of Windows are interpreted either in the current ANSI (Microsoft codec) code page or in the OEM code page (see also https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fopen-wfope). ANSI is the default.

So your question translates to: How can I convert a UTF-8 or UTF-16 string to ANSI or OEM?

There's an answer for the ANSI conversion: How to convert from UTF-8 to ANSI using standard c++

Anyhow, it's important to realize that not all UTF strings can be represented in these more narrow codecs...
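To illustrate what "cannot be represented" means, here is a minimal standalone sketch. It uses ISO-8859-1 (Latin-1) as a simplified stand-in for an ANSI code page, which is an assumption made only to keep the example portable: Windows-1252 differs from Latin-1 in the 0x80 to 0x9F range, and other systems use entirely different code pages. The function names `fitsLatin1` and `toLatin1` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if every UTF-16 code unit is at most 0xFF,
// i.e. the name can be stored one byte per character in ISO-8859-1.
// This is a simplification, NOT the real Windows ANSI conversion.
bool fitsLatin1(const std::u16string &name)
{
    for (char16_t unit : name) {
        if (unit > 0xFF)
            return false;
    }
    return true;
}

// Hypothetical helper: narrow a UTF-16 name to Latin-1 bytes.
// The caller is expected to have checked fitsLatin1() first.
std::string toLatin1(const std::u16string &name)
{
    assert(fitsLatin1(name));
    std::string out;
    out.reserve(name.size());
    for (char16_t unit : name)
        out += static_cast<char>(unit);
    return out;
}
```

With this stand-in, `fitsLatin1(u"\u00e4.txt")` is true (ä is U+00E4), while `fitsLatin1(u"\u2622.txt")` and `fitsLatin1(u"\u0200.txt")` are false. In a Qt program, a comparable check that uses the actual system encoding would probably be to compare `QString::fromLocal8Bit(name.toLocal8Bit())` against the original string and treat a mismatch as "not representable", though that has not been tested here.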

kkoehne
  • This is effectively the path I'm taking, I think, using Qt's toLocal8Bit() method, which converts a Unicode QString to a C string in the current locale. As you note, it won't work for all characters, but it's much better than converting to UTF-8 in places where UTF-8 isn't supported at all! – Chris Feb 19 '22 at 00:18
  • You're right that QString::toLocal8Bit() will by default convert to ANSI, using WideCharToMultiByte(CP_ACP...) internally. So yes, that should work for strings with characters that can actually be represented in the Windows encoding... – kkoehne Feb 21 '22 at 14:46