At the moment I read the same file twice, because I need two different representations: (a) a raw byte sequence without any conversion, and (b) a text representation in which the bytes are converted into the current execution character set. Basically, the code looks like this:
#include <fstream>
#include <locale>
#include <string>
using namespace std;
const char* fileName = "test.txt";
// Part 1: Read the file unmodified byte-per-byte
string binContent;
ifstream file1( fileName, ifstream::binary );
while( true ) {
char c;
file1.get( c );
if( !file1.good() ) break;
binContent.push_back( c );
}
// Part 2: Read the file and convert the character code according to the
// current locale from external character set to execution character set
wstring textContent;
wifstream file2( fileName );
file2.imbue( locale("") ); // imbue the stream object, not the wifstream type
while( true ) {
wchar_t c;
file2.get( c ); // unlike operator>>, get() does not skip whitespace
if( !file2.good() ) break;
textContent.push_back( c );
}
Obviously, the code reads the same file twice. I would like to avoid this and directly convert binContent to textContent in memory.
Please note that this is more than just a plain char to wchar_t conversion, because it might also involve a true character conversion if the character encoding of the current locale locale("") differs from the execution character encoding. Such a conversion might also be necessary even if textContent were a narrow character string.
In the example above, the magic of character conversion within part 2 happens in template<typename _CharT, typename _Traits> bool basic_filebuf<_CharT, _Traits >::_M_convert_to_external( _CharT* __ibuf, streamsize __ilen ) in fstream.tcc and involves using the codecvt facet of the locale.
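For comparison, using the codecvt facet by hand on an in-memory buffer could look roughly like the sketch below. The helper name widen_with_locale is hypothetical and not part of the code above. The sketch assumes the whole input is available at once and that the conversion never produces more wide characters than there are input bytes.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (illustration only): converts bytes already held in
// memory to wide characters using the codecvt facet of the given locale.
std::wstring widen_with_locale(const std::string& bytes, const std::locale& loc) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Facet;
    const Facet& cvt = std::use_facet<Facet>(loc);

    // Assumption: every wide character consumes at least one input byte,
    // so bytes.size() is an upper bound for the output length.
    std::wstring out(bytes.size(), L'\0');

    std::mbstate_t state = std::mbstate_t();  // initial conversion state
    const char* from = bytes.data();
    const char* from_end = from + bytes.size();
    const char* from_next = from;
    wchar_t* to = &out[0];
    wchar_t* to_end = to + out.size();
    wchar_t* to_next = to;

    std::codecvt_base::result r =
        cvt.in(state, from, from_end, from_next, to, to_end, to_next);

    if (r == std::codecvt_base::noconv) {
        // The facet reported that no conversion is needed: widen bytes as-is.
        return std::wstring(bytes.begin(), bytes.end());
    }
    if (r != std::codecvt_base::ok) {
        // error: invalid byte sequence; partial: input ended mid-character.
        throw std::runtime_error("codecvt conversion failed");
    }

    out.resize(static_cast<std::size_t>(to_next - to));  // trim unused slots
    return out;
}
```

With such a helper, part 2 would reduce to something like textContent = widen_with_locale( binContent, locale("") ), but the buffer sizing and the handling of the result codes are exactly the error-prone details I would like to avoid.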
I was hoping for a way to construct a wistringstream object from the binContent object instead of a wifstream and then imbue the wistringstream with the proper locale. But this does not seem to work, because all constructors of wistringstream already expect wide characters, and wistringstream does not seem to implement the conversion logic of wifstream.
Is there any better way (i.e. more concise and less error-prone) than using codecvt manually?