I would like to use boost::algorithm::knuth_morris_pratt over some huge files (several hundred gigabytes). This means I can't read the whole file into memory or mmap it; I need to read it in chunks.
knuth_morris_pratt operates on iterators, so I guess it is possible to make it read input data "lazily" (on demand). It would be a matter of writing a "lazy" iterator for some file access class like ifstream, or better, istream.
I would like to know if there is an existing adapter that adapts istream to Boost's knuth_morris_pratt so that it won't read the whole file at once?
I know there is a boost::spirit::istream_iterator, but it lacks some methods (like operator+), so it would have to be modified to work.
On StackOverflow there's an implementation of bidirectional_iterator here, but it still requires some work before it can be used with knuth_morris_pratt.
Are there any istream iterators that are already written, tested and working?
Update: I can't use mmap, because my software should work on multiple operating systems, on both 32-bit and 64-bit architectures. Also, very often I don't have the files at all; they are generated on the fly, which is why I'm looking for a solution that involves streams.