I have a large file. Its code page is CP1251. I want to parse it with Boost.Spirit. Parsing succeeds until the parser meets non-standard (non-ASCII) characters. The Boost documentation says:
Wide-character versions of the memory-mapped file Devices may be defined as follows, using the template code_converter:
#include <boost/iostreams/code_converter.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
typedef code_converter<mapped_file_source> wmapped_file_source;
typedef code_converter<mapped_file_sink> wmapped_file_sink;
But should I use it? In my code I don't need a sink. My assumption is this: the parser takes iterators from the source, code_converter converts the characters using the code page I give it and passes the converted characters to the parser, which then parses the file.
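To make the expected conversion concrete, here is a minimal sketch of decoding CP1251 bytes to Unicode code points, independent of Boost. The function names `cp1251_to_unicode` and `decode_cp1251` are hypothetical, and the table is deliberately partial: it covers only ASCII, the contiguous Cyrillic range 0xC0..0xFF, and Ё/ё, and maps every other byte to U+FFFD.

```cpp
#include <cassert>
#include <string>

// Partial CP1251 decoder (illustrative only, not a complete table).
inline char32_t cp1251_to_unicode(unsigned char byte) {
    if (byte < 0x80) return byte;                     // ASCII maps to itself
    if (byte >= 0xC0) return 0x0410 + (byte - 0xC0);  // 0xC0..0xFF -> U+0410..U+044F
    if (byte == 0xA8) return 0x0401;                  // capital Yo
    if (byte == 0xB8) return 0x0451;                  // small yo
    return 0xFFFD;                                    // byte not covered by this sketch
}

// Decode a whole CP1251 byte string, one code point per input byte.
inline std::u32string decode_cp1251(const std::string& bytes) {
    std::u32string out;
    out.reserve(bytes.size());
    for (char c : bytes)
        out.push_back(cp1251_to_unicode(static_cast<unsigned char>(c)));
    return out;
}

// Usage: the CP1251 bytes CF F0 E8 decode to U+041F U+0440 U+0438.
inline void example() {
    assert(decode_cp1251("\xCF\xF0\xE8") == U"\u041F\u0440\u0438");
}
```

Because CP1251 is a single-byte encoding, each input byte yields exactly one output character, so no conversion state has to be carried between calls.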
Here is the part of my code that doesn't work:
typedef boost::iostreams::code_converter<boost::iostreams::mapped_file> wmapped_file_source;
boost::locale::generator gen;
std::locale lru = gen("ru_RU.CP1251");
wmapped_file_source mmap;
mmap.imbue(lru);
mmap.open(current_task.filename);
RhAst::RhFile rh_file(this);
bool res = phrase_parse(mmap->begin(), mmap->end(), parser, space - eol, rh_file);
I also tried to write my own codecvt facet for the locale:
class LocaleRus : public std::codecvt<wchar_t, char, std::mbstate_t>
{
public:
    explicit LocaleRus(size_t r = 0) : std::codecvt<wchar_t, char, std::mbstate_t>(r)
    {
    }

protected:
    result do_in(state_type&, const char* from, const char* from_end, const char*& from_next, char* to, char*, char*& to_next) const
    {
        const int size = from_end - from;
        //::OemToCharBuff ( from, to, size );
        from_next = from + size;
        to_next = to + size;
        return ok;
    }

    result do_out(state_type&, const char* from, const char* from_end, const char*& from_next, char* to, char*, char*& to_next) const
    {
        const int size = from_end - from;
        //::CharToOemBuff ( from, to, size );
        from_next = from + size;
        to_next = to + size;
        return ok;
    }

    result do_unshift(state_type&, char*, char*, char*&) const { return ok; }
    int do_encoding() const throw () { return 1; }
    bool do_always_noconv() const throw () { return false; }

    int do_length(state_type& state, const char* from, const char* from_end, size_t max) const
    {
        return std::codecvt<wchar_t, char, std::mbstate_t>::do_length(state, from, from_end, max);
    }

    int do_max_length() const throw ()
    {
        return std::codecvt<wchar_t, char, std::mbstate_t>::do_max_length();
    }
};
and used it in my code like this:
std::locale lru(std::locale(), new LocaleRus());
But its methods are never called. I didn't expect it to be this hard to read a memory-mapped file with a non-standard code page. What am I doing wrong?
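One detail worth checking against the standard interface: `std::codecvt<wchar_t, char, std::mbstate_t>` declares `do_in` with a `wchar_t*` output buffer, while the class above declares it with `char*`, which makes it a separate function rather than an override. Below is a minimal sketch of a facet whose `do_in` matches the base signature and is marked `override`, so the compiler rejects any mismatch. The names `Cp1251Codecvt` and `cp1251_to_wide` are hypothetical, the decoding table is partial (ASCII, 0xC0..0xFF, Ё/ё; everything else becomes U+FFFD), and only the input direction is shown. Whether this alone resolves the behaviour with `code_converter` has not been verified here.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <string>

class Cp1251Codecvt : public std::codecvt<wchar_t, char, std::mbstate_t> {
public:
    explicit Cp1251Codecvt(std::size_t refs = 0)
        : std::codecvt<wchar_t, char, std::mbstate_t>(refs) {}

protected:
    // The output buffer is wchar_t, exactly as the base class declares it.
    result do_in(state_type&, const char* from, const char* from_end,
                 const char*& from_next, wchar_t* to, wchar_t* to_end,
                 wchar_t*& to_next) const override {
        while (from != from_end && to != to_end)
            *to++ = decode(static_cast<unsigned char>(*from++));
        from_next = from;
        to_next = to;
        return from == from_end ? ok : partial;
    }

    bool do_always_noconv() const noexcept override { return false; }

private:
    // Partial CP1251 table (assumption: enough for a demonstration only).
    static wchar_t decode(unsigned char b) {
        if (b < 0x80) return static_cast<wchar_t>(b);
        if (b >= 0xC0) return static_cast<wchar_t>(0x0410 + (b - 0xC0));
        if (b == 0xA8) return L'\u0401';
        if (b == 0xB8) return L'\u0451';
        return L'\uFFFD';
    }
};

// Hypothetical helper: runs the facet through the public codecvt::in() call.
inline std::wstring cp1251_to_wide(const std::string& bytes) {
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const std::locale loc(std::locale::classic(), new Cp1251Codecvt);
    const Facet& cvt = std::use_facet<Facet>(loc);

    std::wstring out(bytes.size(), L'\0');
    if (bytes.empty()) return out;

    std::mbstate_t state{};
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    const std::codecvt_base::result r =
        cvt.in(state, bytes.data(), bytes.data() + bytes.size(), from_next,
               &out[0], &out[0] + out.size(), to_next);
    assert(r == std::codecvt_base::ok);  // output holds one wchar_t per input byte
    (void)r;
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

Calling `cp1251_to_wide("\xCF\xF0\xE8")` goes through `codecvt::in()`, which dispatches to the overridden `do_in`, so it is a quick way to confirm the facet is actually being invoked before wiring it into a stream or device.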