What I referred to in my comment was that you can index an input stream by remembering only the starting offsets of its lines.
The std::istream::tellg() and std::istream::seekg() functions allow you to navigate to arbitrary positions in an active and ready std::istream.
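As a minimal sketch of that round trip, the helper below remembers the current offset with tellg(), reads a line, and jumps back with seekg(), so the same line can be read again. The name peekLine is hypothetical and only serves the illustration; it also assumes a seekable stream (a string or file stream rather than, say, a pipe).

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the next line, then rewinds the stream to the
// offset where that line started, so the next read sees the same line again.
std::string peekLine(std::istream& is)
{
    const std::istream::pos_type start = is.tellg(); // remember the offset
    std::string line;
    std::getline(is, line);
    is.clear();      // drop eofbit/failbit in case the read hit the end
    is.seekg(start); // jump back to the remembered offset
    return line;
}
```

Calling peekLine twice in a row on the same stream should yield the same line both times, because the read position is restored after each call.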
Here's a working example.
The standard library headers involved:
#include <sstream>
#include <string>
#include <vector>
#include <iostream>
#include <algorithm>
#include <cstddef>
A here document (raw string literal) to establish a std::istream:
static const std::string theInput{R"inp(Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat.

Duis aute irure dolor in reprehenderit in voluptate velit esse cillum
dolore eu fugiat nulla pariatur.
06/02/18

Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum.
)inp"};
The main routine to index the line starting positions, and navigate within them:
int main()
{
    // Sentinel meaning "date line not found"
    const std::size_t not_found = static_cast<std::size_t>(-1);

    std::vector<std::size_t> line_positions;
    std::istringstream input_stream(theInput);
    std::string current_line;
    std::size_t theDatePos = not_found;

    // Collect all line starting positions
    do {
        line_positions.push_back(static_cast<std::size_t>(input_stream.tellg()));
        std::getline(input_stream, current_line);
        if (current_line == "06/02/18") {
            theDatePos = line_positions.back();
        }
    } while (input_stream);
    // Note: because theInput ends with a newline, the last recorded entry is
    // the end-of-stream offset, where the final getline() failed.

    // At this point the istream's eofbit and failbit are set, so to work
    // further with it we need to clear() the state.
    input_stream.clear();

    const std::size_t current_line_number = line_positions.size();
    std::cout << "current_line: " << current_line_number << ". '"
              << current_line << "'" << std::endl;

    if (theDatePos != not_found) {
        // Line numbers are 1-based, vector indices are 0-based
        const auto it = std::find(line_positions.begin(), line_positions.end(),
                                  theDatePos);
        const auto date_line_number = (it - line_positions.begin()) + 1;
        std::cout << "The date '06/02/18' was found at line number "
                  << date_line_number << std::endl;
    }

    // Jump to line 3 and read it to the current line
    input_stream.seekg(line_positions[2]);
    std::getline(input_stream, current_line);
    std::cout << "current_line: 3. '" << current_line << "'" << std::endl;

    // Jump to line 5 and read it to the current line
    input_stream.seekg(line_positions[4]);
    std::getline(input_stream, current_line);
    std::cout << "current_line: 5. '" << current_line << "'" << std::endl;

    // Jump back to line 2 and read it to the current line
    input_stream.seekg(line_positions[1]);
    std::getline(input_stream, current_line);
    std::cout << "current_line: 2. '" << current_line << "'" << std::endl;
}
Output:
current_line: 14. ''
The date '06/02/18' was found at line number 10
current_line: 3. 'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.'
current_line: 5. 'Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut'
current_line: 2. 'consectetur adipiscing elit,'
The technique shown above can help you navigate quickly within big input streams while storing only a minimum of information.
Keeping all the lines as std::string instances might be overkill.
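To reuse the indexing step outside of main(), it could be wrapped in a function along the following lines. This is only a sketch: the name buildLineIndex is hypothetical, and unlike the do/while loop above it records an offset only when a line was actually read, so no entry is stored for the end of the stream. The index then costs one std::size_t per line, regardless of how long the lines are.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the starting offset of every line in `is`.
std::vector<std::size_t> buildLineIndex(std::istream& is)
{
    std::vector<std::size_t> positions;
    std::string line;
    for (;;) {
        const std::istream::pos_type pos = is.tellg();
        if (!std::getline(is, line)) {
            break; // nothing was read, so this offset starts no line
        }
        positions.push_back(static_cast<std::size_t>(pos));
    }
    is.clear(); // reset eofbit/failbit so seekg() can be used afterwards
    return positions;
}
```

Empty lines get their own entry as well, so line numbers derived from the index stay aligned with the text.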
A nice abstraction based on that model is left as an exercise:
provide functions that extract a single line or a range of lines from your line-indexed std::istream:
// Extract a single line based on a given line number (position)
std::string getLineAtPos
    ( std::istream& is
    , const std::vector<std::size_t>& linePositions
    , std::size_t linePos
    );
// Extract a contiguous range of lines based on a given pair of line numbers
// (.first == low, .second == high)
std::vector<std::string> getLineRange
    ( std::istream& is
    , const std::vector<std::size_t>& linePositions
    , const std::pair<std::size_t,std::size_t>& lineRange
    );
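One possible way to solve the exercise is sketched below. It rests on a few assumptions that the declarations do not spell out: line numbers are 1-based, the range is inclusive on both ends, linePositions holds one offset per real line, out-of-range requests throw std::out_of_range, and the range is taken by const reference so that temporaries can be passed.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Extract a single line; linePos is assumed to be a 1-based line number.
std::string getLineAtPos
    ( std::istream& is
    , const std::vector<std::size_t>& linePositions
    , std::size_t linePos
    )
{
    if (linePos < 1 || linePos > linePositions.size()) {
        throw std::out_of_range("getLineAtPos: no such line");
    }
    is.clear(); // an earlier read may have left eofbit/failbit set
    is.seekg(static_cast<std::streamoff>(linePositions[linePos - 1]),
             std::ios_base::beg);
    std::string line;
    std::getline(is, line);
    return line;
}

// Extract lines lineRange.first .. lineRange.second (1-based, inclusive).
std::vector<std::string> getLineRange
    ( std::istream& is
    , const std::vector<std::size_t>& linePositions
    , const std::pair<std::size_t, std::size_t>& lineRange
    )
{
    if (lineRange.first < 1 || lineRange.first > lineRange.second
        || lineRange.second > linePositions.size()) {
        throw std::out_of_range("getLineRange: invalid line range");
    }
    is.clear();
    // The lines are contiguous, so one seek to the first line is enough.
    is.seekg(static_cast<std::streamoff>(linePositions[lineRange.first - 1]),
             std::ios_base::beg);
    std::vector<std::string> lines;
    std::string line;
    for (std::size_t n = lineRange.first;
         n <= lineRange.second && std::getline(is, line); ++n) {
        lines.push_back(line);
    }
    return lines;
}
```

Because the range is contiguous, getLineRange seeks only once and then reads sequentially, instead of calling getLineAtPos for every line.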