
At the moment I read the same file twice, because I need two different representations: (a) the raw byte sequence without any conversion, and (b) a text representation in which the bytes have been converted into the current execution character set. Basically, the code looks like this:

```cpp
#include <fstream>
#include <locale>
#include <string>

using namespace std;

int main() {
    const char* fileName = "test.txt";

    // Part 1: Read the file unmodified, byte by byte
    string binContent;
    ifstream file1( fileName, ifstream::binary );
    char c;
    while( file1.get( c ) ) {
        binContent.push_back( c );
    }

    // Part 2: Read the file and convert the character code according to the
    // current locale from the external character set to the execution character set
    wstring textContent;
    wifstream file2;
    file2.imbue( locale("") );   // imbue before opening so the locale's codecvt facet is used from the start
    file2.open( fileName );
    wchar_t wc;
    while( file2.get( wc ) ) {   // get() keeps whitespace; operator>> would skip it
        textContent.push_back( wc );
    }
}
```

Obviously, the code reads the same file twice. I would like to avoid this and convert binContent to textContent directly in memory.

Please note that this is more than a plain char-to-wchar_t conversion, because it may also involve a true character-set conversion if the character encoding of the current locale, locale(""), differs from the execution character encoding. Such a conversion might also be necessary even if textContent were a narrow string.

In the example above, the character conversion in Part 2 happens inside libstdc++'s basic_filebuf (fstream.tcc): on input, template<typename _CharT, typename _Traits> basic_filebuf<_CharT, _Traits>::underflow() calls the in() member of the locale's codecvt facet (_M_convert_to_external( _CharT* __ibuf, streamsize __ilen ) is the corresponding helper on the output path).
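For reference, the step that basic_filebuf performs while reading can be reproduced on an in-memory buffer by calling the facet directly. This is a minimal sketch, assuming the bytes are in the external encoding of the locale passed in; `decode_bytes` is a hypothetical helper name and the 256-element chunk size is arbitrary. It calls `codecvt::in()` in a loop, appending each converted chunk, and treats `error`, or a `partial` result that makes no progress, as a failure.

```cpp
#include <cstddef>     // std::size_t
#include <cwchar>      // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode raw bytes into a wide string with the
// codecvt<wchar_t, char, mbstate_t> facet of `loc`, i.e. the same facet
// a wifstream imbued with `loc` would use while reading.
std::wstring decode_bytes(const std::string& bytes, const std::locale& loc)
{
    using Cvt = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Cvt& cvt = std::use_facet<Cvt>(loc);

    std::mbstate_t state{};                       // initial shift state
    const char* from = bytes.data();
    const char* const from_end = from + bytes.size();

    constexpr std::size_t kChunk = 256;           // arbitrary output chunk size
    wchar_t chunk[kChunk];
    std::wstring out;

    while (from != from_end) {
        const char* from_next = from;
        wchar_t* to_next = chunk;
        const std::codecvt_base::result r =
            cvt.in(state, from, from_end, from_next,
                   chunk, chunk + kChunk, to_next);

        if (r == std::codecvt_base::error)
            throw std::range_error("invalid byte sequence for this locale");
        if (r == std::codecvt_base::noconv)       // not expected for wchar_t/char
            throw std::logic_error("unexpected noconv result");

        out.append(chunk, to_next);               // keep whatever was converted

        // 'partial' without any progress: the input ends in the middle
        // of a multibyte character.
        if (from_next == from && to_next == chunk)
            throw std::range_error("incomplete multibyte sequence at end of input");
        from = from_next;
    }
    return out;
}
```

Called as `decode_bytes(binContent, locale(""))`, this would take the place of Part 2 without reopening the file; it is, however, exactly the manual codecvt handling the question would prefer to avoid.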

I was hoping to construct a wistringstream from binContent instead of using a wifstream, and then imbue the wistringstream with the proper locale. But this does not seem to work: all constructors of wistringstream expect a wide string, and wistringstream does not implement the conversion logic of wifstream (basic_stringbuf, unlike basic_filebuf, never calls the codecvt facet).
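The standard library does offer a streambuf adapter along these lines: `std::wbuffer_convert` (C++11, declared in `<locale>`, deprecated since C++17) wraps a byte-oriented streambuf and converts through a codecvt facet, so a `std::wistream` on top of it reads much like the wifstream in Part 2. A minimal sketch, assuming a library that still ships `wbuffer_convert`: because it takes ownership of the facet and deletes it, the facet cannot be borrowed from an existing `std::locale`, so the common `deletable_facet` workaround (giving the facet a public destructor) is combined with `codecvt_byname`. `read_wide_from_bytes` and `localeName` are hypothetical names.

```cpp
#include <cwchar>      // std::mbstate_t
#include <istream>
#include <locale>
#include <sstream>
#include <string>
#include <utility>

// Workaround: standard facets have protected destructors, but
// wbuffer_convert deletes the facet it owns.
template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() {}
};

using byname_cvt = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

// Hypothetical helper: read `bytes` through a wide stream that decodes
// on the fly using the codecvt facet of the locale named `localeName`.
std::wstring read_wide_from_bytes(const std::string& bytes, const char* localeName)
{
    std::istringstream byteStream(bytes);         // narrow stream over the raw bytes
    std::wbuffer_convert<byname_cvt> convBuf(byteStream.rdbuf(),
                                             new byname_cvt(localeName));
    std::wistream textStream(&convBuf);           // wide view of the same bytes

    std::wstring text;
    wchar_t c;
    while (textStream.get(c))                     // same loop shape as Part 2
        text.push_back(c);
    return text;
}
```

For the question's case, `read_wide_from_bytes(binContent, "")` should decode with the environment's locale on implementations where an empty name selects the user's preferred locale, as `locale("")` does.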

Is there a better way (i.e. more concise and less error-prone) than using codecvt manually?
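If a one-shot conversion of the whole buffer is enough, `std::wstring_convert` (also C++11 and also deprecated since C++17) hides the codecvt call loop behind `from_bytes()`. A minimal sketch under the same assumptions as above, reusing the deletable-facet workaround; `bytes_to_wide` is a hypothetical name. Since no error string is passed to the converter, `from_bytes()` throws `std::range_error` on invalid input.

```cpp
#include <cwchar>      // std::mbstate_t
#include <locale>
#include <string>
#include <utility>

// Workaround: wstring_convert deletes its facet, but standard facets
// have protected destructors.
template <class Facet>
struct deletable_facet : Facet {
    template <class... Args>
    deletable_facet(Args&&... args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() {}
};

using byname_cvt = deletable_facet<std::codecvt_byname<wchar_t, char, std::mbstate_t>>;

// Hypothetical helper: convert bytes in the encoding of the locale named
// `localeName` to a wide string in one call.
std::wstring bytes_to_wide(const std::string& bytes, const char* localeName)
{
    std::wstring_convert<byname_cvt> conv(new byname_cvt(localeName));
    return conv.from_bytes(bytes);                // throws std::range_error on bad input
}
```

The trade-off versus the streambuf adapter is that the whole result is produced at once, which matches the "convert binContent to textContent in memory" goal directly.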

user2690527