
I'm trying to make a simple program to enumerate the files on my disks, but I'm stuck at the great UTF frontier.

I'm using boost::recursive_directory_iterator to enumerate the files. That works great, but Windows is set to "French (Canada)" and many files and directories have French characters (like é, è, ç). These filenames are not displayed correctly on the screen (I'm using wcout): I see a '▒' instead of the accented characters. Even boost::filesystem::ifstream is unable to open these files.

I tried to add std::locale::global(std::locale("")), but at first that only threw an exception. I found that when LANG is set to "" while executing the program, that call no longer throws, but it only sets the "C" locale instead of the one used by the OS (which I expect to be "fr_CA.UTF-8" or "fr_CA.ISO8859-1"). Any other value for LANG brings the exception back...

What must be done to have a cygwin gcc program usable in an i18n world?

I wrote this to test various locale IDs:

```cpp
#include <iomanip>
#include <iostream>
#include <locale>
#include <string>

using namespace std;

// Try to build a locale from ID and report whether the runtime accepts it.
void tryLocale(const string& ID)
{
    try {
        cout << "Trying " << std::setw(18) << std::left << "\"" + ID + "\" ";

        std::locale Loc(ID.c_str());
        cout << "OK (" << Loc.name() << ")" << endl;
    } catch (...) {
        cout << "FAIL" << endl;
    }
}

// Null-terminated list of locale names to test.
const char *Locales[] = { "", "fr", "fr_CA", "fr_CA.UTF-8", "fr_CA.ISO8859-1", "C", 0 };

int main()
{
    cout << "Classic = " << std::locale::classic().name() << endl << endl;

    for (int i = 0; Locales[i]; ++i)
        tryLocale(Locales[i]);

    return 0;
}
```

This gives me the following output (with neither LANG nor LC_ALL set):

```
Classic = C

Trying ""                FAIL
Trying "fr"              FAIL
Trying "fr_CA"           FAIL
Trying "fr_CA.UTF-8"     FAIL
Trying "fr_CA.ISO8859-1" FAIL
```

With LANG set to "", the first "Trying" line becomes:

```
Trying ""                OK (C)
```

When the exception is not caught, the program terminates with:

```
terminate called after throwing an instance of 'std::runtime_error'
  what():  locale::facet::_S_create_c_locale name not valid
```
PRouleau
  • I'm still looking for the right reason, but it looks like a combination of things: a) Unicode alone is not enough, a console always needs a codepage/charset; b) wcout does not handle UTF-16, only UTF-8 (I lost the link); c) the good old dir.h functions work just fine with simple char* (aka multibyte, aka UTF-8?). So it looks like I will have to undo the Unicode encoding done by boost::filesystem if I want to use it. – PRouleau Oct 21 '11 at 03:38
  • [Another puzzle part](http://stackoverflow.com/questions/379240/is-there-a-windows-command-shell-that-will-display-unicode-characters) – PRouleau Oct 22 '11 at 02:12
  • For me, the cygwin code seems to always display UTF-8 on the console correctly (i.e. even with the "C" locale). Still, throwing on "" sounds very wrong; "" is supposed to mean the user's preferred locale and should always result in a valid locale object, the same as "C" if no other can be created. But that does not appear to be cygwin-specific; the mingw build (using the cygwin mingw compiler) crashes the same way for me. – Jan Hudec Feb 11 '13 at 10:02
