
I am writing a program in C++ using VS2010 that reads a text file and extracts certain information from it. I completed the code using file streams and it worked well. However, I have now been asked to map the file into memory and work with that instead of using file operations.

I am a complete newbie when it comes to memory mapping. Part of the code I have written is as follows:

boost::iostreams::mapped_file_source apifile;
apifile.open(LogFileName, LogFileSize);

if (!apifile.is_open())
    return FILE_OPEN_ERROR;

// Get pointer to the data.
PBYTE Buffer = (PBYTE)apifile.data();

while (/* read till end of the file */)
{
    // read a line and check if it contains a specific word
}

With a file stream I would have used eof(), getline() and string::find() for this, but I have no idea how to do the same with a memory-mapped file.

EDIT 1:

int ProcessLogFile(string file_name)
{
    LogFileName = file_name;

    apifile.open(LogFileName); // boost::iostreams::mapped_file_source apifile (declared globally)
    streamReader.open(apifile, std::ios::binary); // boost::iostreams::stream<boost::iostreams::mapped_file_source> streamReader (declared globally)

    streamoff Curr_Offset = 0;
    string read_line;
    int session_id = 0;
    int device_id = 0;

    while (!streamReader.eof())
    {
        // COLLECT OFFSETS OF DIFFERENT SESSIONS
    }

    streamReader.close();
    return 0;
}

This function works, and I get the offsets into the required structure.

After calling this function, I call another function:

int GetSystemDetails()
{
    streamReader.open(apifile, std::ios::binary);

    string read_line;
    getline(streamReader, read_line);
    cout << "LINE : " << read_line;

    streamReader.close();
    return 0;
}

read_line comes back empty. Is the memory mapping only valid within a single function? How can I use the same memory-mapped file across different functions?

Jackzz

1 Answer


I agree with the people questioning the use of mmap if you just read through the file sequentially.

boost::iostreams::mapped_file_source models a Device. There are two ways to use such a Device:

  1. use it raw (via data(), as you are trying to)
  2. wrap it in a stream

1. Using the raw Device source

The mapped_file_source reports its actual size, so the data runs from m.data() to m.data() + m.size().

Here is a sample that counts lines:

#include <boost/iostreams/device/mapped_file.hpp> // for mmap
#include <cstdint>    // for std::uintmax_t
#include <cstring>    // for std::memchr
#include <iostream>   // for std::cout

int main()
{
    boost::iostreams::mapped_file mmap("input.txt", boost::iostreams::mapped_file::readonly);
    auto f = mmap.const_data();
    auto l = f + mmap.size();

    // Count '\n' characters; f becomes null once no further newline is found.
    std::uintmax_t m_numLines = 0;
    while (f && f != l)
        if ((f = static_cast<const char*>(std::memchr(f, '\n', l - f))))
            m_numLines++, f++;

    std::cout << "m_numLines = " << m_numLines << "\n";
}

You could possibly adapt this. I have several more complicated parsing examples based on memory mapped files:


2. Wrapping the source device in an istream

This gives you all the usual operations of C++ standard streams, so you can detect the end of the file as you always would:

#include <boost/iostreams/device/mapped_file.hpp> // for mmap
#include <boost/iostreams/stream.hpp>             // for stream
#include <cstdint>                                // for std::uintmax_t
#include <iostream>                               // for std::cout
#include <string>                                 // for std::string, std::getline

int main()
{
    using boost::iostreams::mapped_file_source;
    using boost::iostreams::stream;
    mapped_file_source mmap("test.cpp");
    stream<mapped_file_source> is(mmap, std::ios::binary);

    std::string line;

    std::uintmax_t m_numLines = 0;
    while (std::getline(is, line))
    {
        m_numLines++;
    }

    std::cout << "m_numLines = " << m_numLines << "\n";
}
sehe
  • I used the second approach and it helped me. Thanks for your answer in such a simple yet informative manner. – Jackzz Oct 09 '14 at 04:54
  • I performed the memory mapping in one function and stored certain offsets, obtained with the `is.tellg()` method, in a structure. Later I have to use these offsets in another function. Will they still be valid then? – Jackzz Oct 09 '14 at 06:18
  • It really seems that you wanted the raw device access. Devices are random access. Streams are (mainly) sequential IO. I'd use streams only for text processing (which is what you showed in the question). Have you looked at actually storing data structures [directly into memory mapped files with Boost Interprocess](http://www.boost.org/doc/libs/1_56_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_file)? There's an example down that page. – sehe Oct 09 '14 at 06:31
  • @sehe You forgot to call mmap.close() in your examples. – Francisco Aguilera Feb 12 '17 at 22:22
  • @FranciscoAguilera It's on the last line. Long live RAII – sehe Feb 12 '17 at 22:56