
I am trying to find a more efficient way to read an entire file into a vector of lines, defined as std::vector<std::string>.

Currently, I have written the naive:

std::ifstream file{filepath};
std::vector<std::string> lines;

std::string line;
while(std::getline(file, line)) lines.push_back(line);

But I feel that the extra copy in push_back and the repeated vector reallocations as it grows would be detrimental to efficiency, so I am looking for a more modern C++ approach, such as using stream buffer iterators when copying bytes:

std::ifstream file{filepath};
auto filesize = /* get file size */;
std::vector<char> bytes;
bytes.reserve(filesize);
bytes.assign(std::istreambuf_iterator<char>{file}, std::istreambuf_iterator<char>{});

Is there any such way I could read a text file in by lines into a vector?

RamblingMad
  • The `push_back` won't copy if you use `std::move(line)`. – Jonathan Potter Oct 06 '15 at 04:55
  • @JonathanPotter Using an object again after a `move`? Don't like that. I'd use `emplace_back`, though. – RamblingMad Oct 06 '15 at 05:07
  • The stream iterator approach is very slow in my tests. For maximum speed I would try reading the entire file into a single pre-allocated vector and then split it up into individual strings from there. – Galik Oct 06 '15 at 05:11
  • @CoffeeandCode `move` is guaranteed to leave the moved-from object in a consistent state, so I don't see a problem. – Jonathan Potter Oct 06 '15 at 05:22
  • I think `std::move` is fine if the following operation is an *assignment*. The main problem with `std::move` is that it empties the string so it has to re-allocate every line. If you don't move you have a copy but it becomes less and less likely you will need to re-allocate. – Galik Oct 06 '15 at 05:25
  • @Galik: Either you re-alloc the old string or you alloc the new one in the vector, not much you can do about that. At least you're not copying the actual chars. – Jonathan Potter Oct 06 '15 at 05:41
  • @JonathanPotter `std::move` doesn't guarantee anything, it's a glorified cast. And I have no idea what the `basic_string&&` overload of `std::getline` is doing behind the scenes, so I don't wanna risk moving to another platform and my code not working :/ – RamblingMad Oct 06 '15 at 07:45
  • @CoffeeandCode It's the *move assignment operator* and *move constructor* that guarantee to leave the original object in a consistent state. This is a crucial part of move semantics. Think about what would happen on destruction if it didn't. – Jonathan Potter Oct 06 '15 at 09:47

2 Answers


There is a very interesting and relatively new approach: ranges. Eric Niebler has written some very interesting articles on the topic:

Out Parameters, Move Semantics, and Stateful Algorithms, which covers a fast alternative to getline

Input Iterators vs Input Ranges

SergV

Something along the lines of the following code might work.

struct FileContents
{
   // Contents of the file.
   std::string contents;

   // Each line consists of two indices.
   struct line { std::string::size_type begin; std::string::size_type end;};

   // List of lines.
   std::vector<line> lines;
};

void readFileContents(std::string const& file,
                      FileContents& fileContents)
{
   std::ifstream ifs(file);
   if ( !ifs )
   {
      // Deal with error.
      return;
   }

   // Get the size of the file.
   ifs.seekg(0, std::ios::end);
   std::fstream::pos_type size = ifs.tellg();

   // Reserve space to read the file contents.
   fileContents.contents.assign(static_cast<std::string::size_type>(size), '\0');

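   // Read the file contents.
   ifs.seekg(0, std::ios::beg);
   ifs.read(&fileContents.contents[0], size);
   // In text mode fewer characters than `size` may be read (e.g. when "\r\n"
   // is translated to '\n'), so trim the string to what was actually read.
   fileContents.contents.resize(static_cast<std::string::size_type>(ifs.gcount()));
   ifs.close();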

   // Divide the contents of the file into lines.
   FileContents::line line = {0, 0};
   std::string::size_type pos;
   while ( (pos = fileContents.contents.find('\n', line.begin)) != std::string::npos )
   {
      line.end = pos+1;
      fileContents.lines.push_back(line);
      line.begin = line.end;
   }
   line.end = fileContents.contents.size();
   if ( line.begin < fileContents.contents.size() )
   {
      fileContents.lines.push_back(line);
   }
}
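
If a std::vector<std::string> is really required, as in the question, the same splitting loop can build it directly from the in-memory buffer. The following is only a sketch under that assumption: `splitLines` is a hypothetical helper that is not part of the code above, and unlike the index pairs above it drops the trailing '\n' from each line.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (illustrative only): splits an in-memory buffer into
// lines, without the terminating '\n'.
std::vector<std::string> splitLines(const std::string& contents)
{
    std::vector<std::string> lines;
    // One pass to count the newlines lets the vector allocate only once.
    lines.reserve(static_cast<std::vector<std::string>::size_type>(
        std::count(contents.begin(), contents.end(), '\n')) + 1);

    std::string::size_type begin = 0;
    std::string::size_type pos;
    while ((pos = contents.find('\n', begin)) != std::string::npos)
    {
        // Construct the line in place from the range [begin, pos).
        lines.emplace_back(contents, begin, pos - begin);
        begin = pos + 1;
    }
    // A final line without a terminating newline.
    if (begin < contents.size())
        lines.emplace_back(contents, begin, std::string::npos);
    return lines;
}
```

Each line is still allocated and copied once here. Keeping only indices into the buffer, as FileContents does, avoids those per-line allocations altogether.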
R Sahu