You can read the entire file into memory. This can be done with C++ streams, or you may get even better performance from platform-specific APIs, such as memory-mapped files or the platform's own file-reading functions.
Once you have this block of data, for performance you want to avoid any further copies and use it in place. In C++17 you have std::string_view, which is similar to std::string but refers to existing string data, avoiding the copy. Otherwise you might just work with C-style char* strings, either by replacing each newline with a null terminator (\0), or by keeping a pair of pointers (begin/end) or a pointer and a size per line.
Here I used string_view. I also assumed newlines are always \n and that the file ends with one; you may need to adjust the loop if that is not the case. Reserving the vector's capacity up front also gains a little performance; you could estimate the line count from the file length. I also skipped most error handling.
#include <fstream>
#include <memory>
#include <string_view>
#include <vector>

std::ifstream is("data.txt", std::ios::in | std::ios::binary);

// Get the file size, then read the whole file into one buffer.
is.seekg(0, std::ios::end);
size_t data_size = is.tellg();
is.seekg(0, std::ios::beg);
auto data = std::make_unique<char[]>(data_size);
is.read(data.get(), data_size);

std::vector<std::string_view> strings;
strings.reserve(data_size / 40); // If you have some idea of the count, avoid re-allocations, as is general practice with vector etc.
for (size_t i = 0, start = 0; i < data_size; ++i)
{
    if (data[i] == '\n') // End of line, got string
    {
        strings.emplace_back(data.get() + start, i - start);
        start = i + 1;
    }
}
To get a little more performance, you might run the parsing loop in parallel with the file IO. This can be done with threads or with platform-specific asynchronous file IO. In this case, however, the loop is so fast that there would not be much to gain.