PART ONE

I clearly lack knowledge about character encoding, and the more I look into it, the deeper the rabbit hole goes. I’m hoping to find the secret of the ancients in here.

Long story short, I’m working on a C++ app in which several functions pass filenames as const char* arguments. Basically, operations like opening a file need to carry a const char* through multiple loops. That said, changing the argument type from const char* to, let’s say, const wchar_t* is logistically quite challenging, and my attempts ended in a deadlock several times.

Most documentation I have found about character encoding conversion assumes that the input encoding is known. But in this case, the app takes YouTube video titles as filenames, and it never knows what it is going to get… or does it?

Simply put, I got something like this:

wchar_t *ZStr::UTF8ToWChar(const char *in_char)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> l_converter;
    std::wstring l_path = l_converter.from_bytes(in_char);
    Int64 l_size = l_path.length() + 1;
    wchar_t *l_buffer = new wchar_t[l_size];
    wcsncpy(l_buffer, l_path.c_str(), l_size);
    return l_buffer;
}

bool ZStream::Open(const char *in_path, const char *in_mode)
{
    return m_File = _wfopen(ZStr::UTF8ToWChar(in_path), ZStr::UTF8ToWChar(in_mode));
}

It seems to work so far with filenames such as "360° Eiffel Tower.mp4" or "Tropical Break.mp4" but I’m not quite sure I’m doing it right.

My question is, is there a way to cover all possible OS filenames and keep const char* arguments throughout my app?

Note: My development environment is Windows 10, but I’d really like to find a solution that is portable. In the worst case, I’d go for an OS-dependent one. (I don’t mind whatever overhead it might require.)

PART TWO

Thank you all for your comments, they are very much appreciated. To complicate things a little more, I use the Qt open file dialog, which returns a QString (which seems to be a wchar_t*), to open files. QString offers a great deal of conversions right out of the box, but I’m still banging my head trying to find the one solution that works for all filenames. So far, Tropical Break.mp4 is reported as invalid by _waccess:

Bool ZSystem::FileAccess(const Char *in_path, Int in_val)
{
    return (_waccess(ZStr::UTF8ToWChar(in_path).c_str(), in_val) == 0);
}

bool QT_Editor::Open(const QString &in_path)
{
    if (ZSystem::FileAccess(in_path.toUtf8().constData(), 0)) /// constData() = const char*
    {
        ///...
    }
}

I tried several combinations without success. I realize that this is not very efficient, but I’m stuck with choices I made a long time ago. I’m not even sure what the type is, but its integrity seems to be lost in the loops. Any insights would be greatly welcome.

neosettler
  • https://stackoverflow.com/questions/8032080/how-to-convert-char-to-wchar-t – Francis Cugler Mar 16 '18 at 05:41
  • Your usage of `UTF8ToWChar()` is causing memory leaks, as you don't `delete[]` the memory you `new[]`. You really should be returning `std::wstring` instead, and then use its `c_str()` method if you need a `wchar_t*`. In any case, if you ensure that your `const char*` values are always UTF-8 encoded, and you convert them to platform encodings when making platform calls, you will be fine, and portable. – Remy Lebeau Mar 16 '18 at 07:35
  • When you are in Rome then it helps to act like a Roman. Windows uses utf-16 encoding consistently. That's wchar_t in a program. You can certainly ignore that, write the glue and apply it where necessary. But it is busy-work and you picked C++ because you like it to be fast. Do what works and keeps you happy. – Hans Passant Mar 16 '18 at 08:30
  • What are you trying to do? Are you giving the full file path, or just the file name? Try using the full file path. And are you passing a valid value in `in_val` (4 is for read permission)? – Paweł Iwaneczko Mar 18 '18 at 09:32
  • Hello Pawel, thank you for your input. I’m using the full path indeed. I should have mentioned it earlier, my bad. I use _waccess to filter the full path to prevent hard crashes down the pipes. Do you think that slashes and/or backslashes could have an impact on the conversion? I’ve edited the example to fit the argument passed to _waccess. (Zero seems to be for existence only, which is the one I’ve been using all along.) – neosettler Mar 18 '18 at 18:32
  • Debug print from the open file dialog gives C: /Videos/filename.mp4. To my understanding, and for simplicity’s sake, here’s what I think needs to be done: (1) convert the QString (wchar_t*) from the file open dialog to (const char*) = in_path.toUtf8().constData(); (2) convert the (const char*) to (const wchar_t *) for _waccess = ZStr::UTF8ToWChar(in_path).c_str(). The file path prints correctly but fails under the _waccess hood. Could it be a null-termination issue? – neosettler Mar 18 '18 at 18:32
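
The approach suggested in the comments (keep `const char*` as UTF-8 everywhere and convert only when making a platform call) could be sketched roughly as follows. This is a minimal sketch, not the asker's code: `Utf8ToWide` and `OpenUtf8` are hypothetical names, the Windows branch uses the Win32 `MultiByteToWideChar` call instead of `std::wstring_convert`, and the non-Windows branch assumes the file system accepts UTF-8 byte strings as they are.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <wchar.h>
#include <windows.h>

// Hypothetical helper: UTF-8 -> UTF-16 through the Win32 API.
// Returns an empty string if the input is empty or is not valid UTF-8.
static std::wstring Utf8ToWide(const char *in_utf8)
{
    // First call only asks for the required length (terminating null included).
    const int l_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                          in_utf8, -1, nullptr, 0);
    if (l_len <= 0)
        return std::wstring();

    std::wstring l_wide(static_cast<std::size_t>(l_len), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        in_utf8, -1, &l_wide[0], l_len);
    l_wide.resize(static_cast<std::size_t>(l_len) - 1); // drop the counted null
    return l_wide;
}
#endif

// Hypothetical wrapper: the only place that knows about the platform encoding.
// Callers keep passing UTF-8 encoded const char* strings.
std::FILE *OpenUtf8(const char *in_path, const char *in_mode)
{
    assert(in_path != nullptr && in_mode != nullptr);
#ifdef _WIN32
    return _wfopen(Utf8ToWide(in_path).c_str(), Utf8ToWide(in_mode).c_str());
#else
    // Assumption: paths are plain byte strings here and UTF-8 passes through.
    return std::fopen(in_path, in_mode);
#endif
}
```

With a wrapper like this, a call such as `OpenUtf8("360\xC2\xB0 Eiffel Tower.mp4", "rb")` keeps the UTF-8 bytes intact until the last moment, and an `_waccess` wrapper could follow the same pattern. One detail that may matter for arbitrary titles: when `wchar_t` is 16 bits wide, `std::codecvt_utf8<wchar_t>` converts to UCS-2 rather than UTF-16, so characters outside the Basic Multilingual Plane need `std::codecvt_utf8_utf16<wchar_t>` or a platform call like the one above.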

1 Answer

Just use std::wstring as the return type. Try editing your function like below:

std::wstring ZStr::UTF8ToWChar(const char *in_char)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> l_converter;
    return l_converter.from_bytes(in_char);
}

bool ZStream::Open(const char *in_path, const char *in_mode)
{
    return m_File = _wfopen(ZStr::UTF8ToWChar(in_path).c_str(), ZStr::UTF8ToWChar(in_mode).c_str());
}
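
Since the titles come from an outside source, it may also be worth checking that the bytes really are well-formed UTF-8 before converting them: `from_bytes` throws `std::range_error` when the conversion fails. The following is only a sketch of such a check, with `IsValidUtf8` being a hypothetical helper that is not part of the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if the null-terminated input is well-formed UTF-8.
bool IsValidUtf8(const char *in_char)
{
    assert(in_char != nullptr);
    const unsigned char *p = reinterpret_cast<const unsigned char *>(in_char);
    const std::size_t n = std::strlen(in_char);
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
    static const unsigned long l_min[4] = {0x0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char c = p[i];
        std::size_t extra = 0; // continuation bytes expected after the lead byte
        unsigned long cp = 0;  // code point being assembled

        if (c < 0x80)                { extra = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false; // stray continuation byte or invalid lead byte

        if (extra > n - i - 1)
            return false; // sequence is cut off by the end of the string

        for (std::size_t k = 1; k <= extra; ++k)
        {
            if ((p[i + k] & 0xC0) != 0x80)
                return false; // not a continuation byte
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if (cp < l_min[extra]) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;

        i += extra + 1;
    }
    return true;
}
```

For example, `IsValidUtf8("360\xC2\xB0 Eiffel Tower.mp4")` accepts the degree sign encoded as UTF-8, while the same title with a lone Latin-1 byte (`"360\xB0 Eiffel Tower.mp4"`) is rejected, which gives the caller a chance to fall back to another encoding or report the problem instead of catching an exception.
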
Paweł Iwaneczko