
My normal procedure is as follows:

.
.
.
// Create a wifstream from the file
std::wifstream wif(L"...");

// Build a wstring from the file content
std::wstring m_wstr((std::istreambuf_iterator<wchar_t>(wif)), std::istreambuf_iterator<wchar_t>());

// Work on the content through the wstring's iterators
boost::algorithm::boyer_moore<std::wstring::const_iterator>(m_wstr);
.
.
.

The problem is that the second step takes a lot of time, since the files are huge. The entire file content is loaded into a wstring, which usually isn't necessary, because boost::algorithm::boyer_moore skips over much of it anyway.

Skipping that step would drastically improve speed, so I need a random-access iterator over a std::wifstream, something like a std::wifstream::const_iterator.

How can I implement one concisely?

Okan Barut
  • Use `std::basic_istream::seekg` for implementing iterators. – 273K Jul 17 '21 at 16:32
  • @S.M. Do you mean through using a combination of seekg and get() ? – Okan Barut Jul 17 '21 at 16:34
  • I wrote a rather crude and buggy example of a binary search of a file here. It may be useful: https://stackoverflow.com/questions/37254910/how-to-read-a-file-backwards-to-find-substring-efficiently/37258908#37258908 – Galik Jul 17 '21 at 17:33
  • The easiest way is surely to use a memory-mapped file, although you can’t implement that on top of ``. – Davis Herring Jul 17 '21 at 23:11
  • @DavisHerring Do you mean something like this? https://github.com/NanXiao/code-for-my-blog/blob/master/2018/04/benchmark-ifstream-and-mmap/test_mmap.cpp – Okan Barut Jul 18 '21 at 16:14
  • @OkanBarut: That does demonstrate `mmap`, yes. – Davis Herring Jul 18 '21 at 18:06
  • @DavisHerring It consumes a lot of time during the first pass, but the next passes are really fast. For this reason, it is actually slower than wifstream. Because I normally only need a single pass. Is this normal for mmap though? – Okan Barut Jul 18 '21 at 23:20
  • @OkanBarut: Certainly repeated reads of the same file should be faster, and more so with `mmap` than with a stream. Why the latter would be faster in your case is a more detailed question that should be asked separately. – Davis Herring Jul 18 '21 at 23:52
  • @DavisHerring I agree. I'm running a linux WSL from a windows machine, so, that definitely is a separate question. But thanks! – Okan Barut Jul 19 '21 at 01:33
  • @OkanBarut: There is native memory-mapped access on Windows, if that matters, but I don’t know what it’s called. – Davis Herring Jul 19 '21 at 01:39
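
The `seekg` suggestion from the comments could be sketched roughly as below. This is only an illustration under several assumptions, not a tested solution for the files in the question: `seek_iterator` and `find_in_stream` are hypothetical names, the stream must use a fixed-width encoding so that character offsets are seekable, and C++17's `std::boyer_moore_searcher` stands in for `boost::algorithm::boyer_moore`. The iterator returns characters by value, so it is random-access only in the loose sense that searchers need.

```cpp
#include <algorithm>   // std::search
#include <cassert>
#include <cstddef>
#include <functional>  // std::boyer_moore_searcher (C++17)
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: a read-only iterator that stores only an offset
// and fetches each character on demand with seekg + get.
template <class CharT>
class seek_iterator {
public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type        = CharT;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const CharT*;
    using reference         = CharT;  // returned by value, nothing is cached

    seek_iterator() = default;
    seek_iterator(std::basic_istream<CharT>& in, difference_type pos)
        : in_(&in), pos_(pos) {}

    // Each access costs one seek and one read on the underlying stream.
    CharT operator[](difference_type n) const {
        assert(in_ != nullptr);
        in_->clear();  // drop a stale eof/fail state before seeking
        in_->seekg(static_cast<std::streamoff>(pos_ + n), std::ios_base::beg);
        return static_cast<CharT>(in_->get());
    }
    CharT operator*() const { return (*this)[0]; }

    seek_iterator& operator+=(difference_type n) { pos_ += n; return *this; }
    seek_iterator& operator-=(difference_type n) { pos_ -= n; return *this; }
    seek_iterator& operator++() { ++pos_; return *this; }
    seek_iterator& operator--() { --pos_; return *this; }
    seek_iterator operator++(int) { auto old = *this; ++pos_; return old; }
    seek_iterator operator--(int) { auto old = *this; --pos_; return old; }

    friend seek_iterator operator+(seek_iterator a, difference_type n) { return a += n; }
    friend seek_iterator operator+(difference_type n, seek_iterator a) { return a += n; }
    friend seek_iterator operator-(seek_iterator a, difference_type n) { return a -= n; }
    friend difference_type operator-(const seek_iterator& a, const seek_iterator& b) {
        return a.pos_ - b.pos_;
    }
    friend bool operator==(const seek_iterator& a, const seek_iterator& b) { return a.pos_ == b.pos_; }
    friend bool operator!=(const seek_iterator& a, const seek_iterator& b) { return a.pos_ != b.pos_; }
    friend bool operator<(const seek_iterator& a, const seek_iterator& b)  { return a.pos_ < b.pos_; }
    friend bool operator>(const seek_iterator& a, const seek_iterator& b)  { return a.pos_ > b.pos_; }
    friend bool operator<=(const seek_iterator& a, const seek_iterator& b) { return a.pos_ <= b.pos_; }
    friend bool operator>=(const seek_iterator& a, const seek_iterator& b) { return a.pos_ >= b.pos_; }

private:
    std::basic_istream<CharT>* in_ = nullptr;
    difference_type pos_ = 0;
};

// Hypothetical helper: character offset of the first match of `pattern`
// in the stream, or -1 if there is none.
template <class CharT>
std::ptrdiff_t find_in_stream(std::basic_istream<CharT>& in,
                              const std::basic_string<CharT>& pattern) {
    in.clear();
    in.seekg(0, std::ios_base::end);
    const auto size = static_cast<std::ptrdiff_t>(std::streamoff(in.tellg()));

    seek_iterator<CharT> first(in, 0), last(in, size);
    auto hit = std::search(first, last,
                           std::boyer_moore_searcher(pattern.begin(), pattern.end()));
    return hit == last ? -1 : hit - first;
}
```

The same helper accepts a `std::wifstream` opened on a file, since it only needs a `std::basic_istream<wchar_t>&`. Whether this beats loading the whole file is not guaranteed: every character access repositions the stream, which may discard the stream buffer, so it is worth measuring against both the `wstring` approach and the memory-mapped file mentioned in the comments.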

0 Answers