
I have a multibyte Windows project in which I need to access a file whose name may contain any characters that modern Windows allows. It fails whenever the file name contains non-ASCII characters (Japanese, Swedish, Russian, etc.).

For example:

const char * filename_ = "C:\\testÖ.txt";
struct _finddata_t fd;
intptr_t fh = _findfirst(filename_, &fd);   // _findfirst() returns intptr_t, not long

At this point _findfirst() fails.

What would be the best solution here to support all possible file names? I read that _findfirst() depends on the system locale that was set when the program started. I can change the locale to a specific one, but how can I determine which locale a given file name needs?

The project has to remain multibyte.

Has anyone solved this problem before?

I also tried converting the name to wide characters, but with no luck either. Code example below:

debug_prnt("DEBUG: Checking existance of a file: %s\n", filename_);
struct _wfinddata_t ff;
size_t requiredSize = mbstowcs(NULL, filename_, 0);
if (requiredSize == (size_t)(-1))
{
    debug_prnt("ERROR: Couldn't convert string--invalid multibyte character.\n");
    return FALSE;
}
wchar_t * filename = (wchar_t *)malloc((requiredSize + 1) * sizeof(wchar_t));
if (!filename)
{
    debug_prnt("ERROR: Memory allocation failed\n");
    return FALSE;
}
size_t size = mbstowcs(filename, filename_, requiredSize + 1);
if (size == (size_t)(-1))
{
    debug_prnt("ERROR: Couldn't convert string--invalid multibyte character.\n");
    free(filename);   // do not leak the buffer on the error path
    return FALSE;
}

// _wfindfirst() returns intptr_t, and -1 (not 0) signals failure.
intptr_t fh = _wfindfirst(filename, &ff);
if (fh != -1)
{
    debug_prnt("DEBUG: File exists\n");
    _findclose(fh);   // release the search handle
}
else
    debug_prnt("DEBUG: File does not exist %ls\n", filename);
free(filename);
Richard Chambers
Artur Korobeynyk
  • The docs on `_findfirst()` and its variants are at https://msdn.microsoft.com/en-us/library/zyzxfzac.aspx and it looks to me like you should be using `_wfindfirst()`. In general with Windows programs these days I stick with UNICODE and wide characters since the Windows API expects it. Why are you using `strlen()`? This implies your original `filename_` contains `char` text and not `wchar_t` text so that may be where your problem is. – Richard Chambers Jun 10 '16 at 10:57
  • I made a mistake with strlen. I already found a correct length calculation on the IBM forums and updated the code here, but I still fail to find the file. I am also using _wfindfirst but no luck so far. – Artur Korobeynyk Jun 10 '16 at 11:07
  • is this the actual code you are using? The example here, http://www.cplusplus.com/reference/cstdlib/mblen/, for `mblen()` with `mbtowc()` shows a reset on both functions and doing it differently than you are doing. – Richard Chambers Jun 10 '16 at 11:18
  • And I think you want to use `mbstowcs()` instead. http://www.cplusplus.com/reference/cstdlib/mbstowcs/ – Richard Chambers Jun 10 '16 at 11:21
  • I was using IBM example https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rtref/mbtowc.htm – Artur Korobeynyk Jun 10 '16 at 11:23
  • `mbtowc()` converts a character and I think you want to convert an entire string so `mbstowcs()` would be more appropriate. Isn't an entire string your goal? – Richard Chambers Jun 10 '16 at 11:25
  • Truly it is. The idea is to convert the whole string. I have switched to `_mbstowcs` but that does not help either. Updated the code. – Artur Korobeynyk Jun 10 '16 at 11:34
  • Looking at the updated question, there are a couple of things you need to do. First of all you need to clarify what you mean by it does not work. What errors and behavior are you seeing? What is the return value of `mbstowcs()`? Are you able to use a debugger to see what happens as you step through each line of code. The second thing is that you should follow the example as provided by this Microsoft doc https://msdn.microsoft.com/en-us/library/k1f9b8cy.aspx which shows converting back and forth between wide and multibyte which is using the function `mbstowcs()` differently than you are. – Richard Chambers Jun 10 '16 at 11:47
  • Did it exactly like Microsoft requests. For now it still does not find the file. The part I am fighting with is at `long fh = _wfindfirst(filename, &ff);` So at this point fh == -1, which goes for file not found while it exists. – Artur Korobeynyk Jun 10 '16 at 12:24
  • By the way, the filename seems to have correct name, at least judging from symbols look. – Artur Korobeynyk Jun 10 '16 at 12:31

1 Answer

Here is a short but complete Windows console application that uses the functions you want to use.

The program creates a file in the current working folder as something to find, and then lists the files in that folder that have a .txt extension.

For the search criteria, I am using a hard-coded pattern (*.txt) that is converted to wide characters with mbstowcs(). In your case you may need to accept the string as a multibyte string, convert it to wide character, and then use it with _wfindfirst().

With my setup there appears to be a text conversion problem in printf(), so a strange character is printed to the console in place of the non-ASCII character. The debugger shows the string correctly, however.

// multibyte_file_search.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <locale.h>
#include <io.h>

int _tmain(int argc, _TCHAR* argv[])
{
    const char * filename_ = "testÖ.txt";
    FILE *fp = fopen (filename_, "w");
    if (fp) fclose(fp);   // fopen() returns NULL on failure; fclose(NULL) is invalid

    // test out mbstowcs()
    wchar_t *wcsFileName_ = new wchar_t[512];
    size_t requiredSize = mbstowcs(NULL,filename_,0);
    size_t xsize = mbstowcs(wcsFileName_,filename_,512);
    printf ("mbstowcs() returned %u (required size %u)\n", (unsigned)xsize, (unsigned)requiredSize);

    // do an actual directory search on the current working directory.
    printf ("\n\n Directory search begins.\n");
    struct _wfinddata_t ff = {0};
    char *csFileName_ = new char[512];
    strcpy (csFileName_, "*.txt");
    xsize = mbstowcs(wcsFileName_,csFileName_,512);  // convert search to wide character.
    intptr_t  fh = _wfindfirst(wcsFileName_, &ff);

    if (fh != -1) {
        do {
            wcstombs (csFileName_, ff.name, 512);
            printf (" ff.name %S and converted name %s \n", ff.name, csFileName_);
            wprintf (L"     ff.name %s and converted name %S \n", ff.name, csFileName_);
        } while (_wfindnext (fh, &ff) == 0);
        _findclose (fh);
    } else {
        printf ("No files in directory.\n");
    }

    delete [] csFileName_;    // release the buffers allocated with new[]
    delete [] wcsFileName_;
    return 0;
}
Richard Chambers
  • But you did not use `wcsFileName_` in `_wfindfirst` after the `mbstowcs`. Try to provide an actual filename instead of masked filename. Well, it fails for me with `DEBUG: Checking existance of a file: C:\testÖ.txt Handle: -1 DEBUG: File does not exist C:\testÖ.txt, errno: 2` Here I printed file `handle` and `errno` right after calling `_wfindfirst` – Artur Korobeynyk Jun 10 '16 at 13:08
  • There you go. Good luck. – Richard Chambers Jun 10 '16 at 13:15
  • Dude, if you already know the file name then you don't need _findfirst(). Just use the file name you already know. _findfirst() is to search a directory using a criteria in order to develop a list of files in the directory that match the criteria. – Richard Chambers Jun 10 '16 at 13:17
  • I know the file name, but I don't know whether the file exists at the moment this function is called. I found the solution. wcstombs is an STL implementation of the conversion and it fails when a UTF-8 string is passed in through a char pointer as the source. The Microsoft function works better, so you need to use https://msdn.microsoft.com/en-us/library/windows/desktop/dd319072(v=vs.85).aspx It works. – Artur Korobeynyk Jun 10 '16 at 14:13
  • Why would you be asking about multi-byte text when it is UTF-8 text? See [Difference between MBCS and UTF-8 on Windows](http://stackoverflow.com/questions/3298569/difference-between-mbcs-and-utf-8-on-windows) which explains the difference and mentions that you must use `MultiByteToWideChar()`. – Richard Chambers Jun 10 '16 at 15:04
  • Imagine that you pass a file name as an argument from a console window, and the name contains both Cyrillic and Swedish characters. The console will fail to determine a code page here because no single one will work, and the STL conversion will fail trying to use its multibyte conversion, but the file still needs to be accessed somehow. That is why I used Microsoft's conversion function, which handles the UTF-8, and after that the file became accessible. – Artur Korobeynyk Jun 10 '16 at 17:34