I have a file of the following format:
1: some_basic_info_in_this_line
2: LOTS_OF_INFO_IN_THIS_LINE_HUNDREDS_OF_CHARS
3: some_basic_info_in_this_line
4: LOTS_OF_INFO_IN_THIS_LINE_HUNDREDS_OF_CHARS
...
That format repeats tens of thousands of times, producing files of 50 GiB or more. I need an efficient way to process only line 2 of each four-line record. I'm open to using C, the C++11 STL, or Boost. I've looked at other SO questions about file streaming, but I think my situation is different because of the file size and because I only need one out of every four lines.
From what I've read, memory-mapping the file seems to be the most efficient approach, but I'm worried that mapping a 50+ GiB file will eat up most computers' RAM (assume the application will be run by "average" users with, say, 4-8 GiB of RAM). I also only need to process one line at a time. Here is how I currently do it (yes, I'm aware this is not efficient; that's why I'm redesigning it):
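    // Returns line 2 of the next four-line record, or an empty string
    // if the stream is no longer readable.
    std::string GL::getRead(std::ifstream& input)
    {
        std::string str;
        std::string toss;
        if (input.good())
        {
            std::getline(input, toss); // line 1: basic info, discarded
            std::getline(input, str);  // line 2: the data I actually need
            std::getline(input, toss); // line 3: basic info, discarded
            std::getline(input, toss); // line 4: large, copied only to be discarded
        }
        return str;
    }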
Is breaking the mmap into blocks the answer for my situation? Is there any way I can take advantage of needing only 1 out of every 4 lines? Thanks for the help.