I have C++ code, written in Visual Studio 2010, that reads a text file containing tens of thousands of floating-point numbers separated by spaces. The code reads the file contents and stores them in a vector of floats. My problem is that the code takes a lot of time to read the file and copy the values into the vector. Is there a faster way to do this? Ideally something that can be done in Visual Studio C++ (using the Boost libraries or mmap).

vector<float> ReplayBuffer;
ifstream in;
in.open("fileName.txt");
if(in.is_open())
{
    in.setf(ios::fixed);
    in.precision(3);

    in.seekg(0,ios::end);
    fileSizes = in.tellg();

    in.seekg(0,ios::beg);
    while(!in.eof())
    {
        for(float f;in>>f;)
            ReplayBuffer.push_back(f);
    }
    in.close();
}
sarath
  • All that can be replaced with `std::vector<float> ReplayBuffer{std::istream_iterator<float>(in >> std::fixed >> std::setprecision(3)), std::istream_iterator<float>()}`. – David G May 29 '14 at 18:23
  • The precision and the format field are not used on input. And your loop condition in the outer loop may lead to an endless loop (if for example there is a format error). – James Kanze May 29 '14 at 18:25
  • @0x499602D2 This would be the simplest way of reading the file (and will not result in an endless loop on a format error). – James Kanze May 29 '14 at 18:26
  • It would be by far the best, but unfortunately the OP is using VS2010 and did not tag C++11 – quantdev May 29 '14 at 18:27

2 Answers

If your files are very big, consider memory-mapped files: Boost offers an excellent cross-platform library for working with them (you mentioned mmap, which is a POSIX system call, and it looks like you are developing on Windows).

Also, consider reserving space in your vector to avoid dynamic reallocations: `ReplayBuffer.reserve(expected_final_size);`

Note:

  • Do not use `!in.eof()` to check whether you have finished reading the file. It is bad practice, because the EOF flag is set only after a read has already hit the end of the file.
  • If you don't need `fileSizes`, do not compute it.
quantdev

If the file fits in your address space, you can mmap it and then use istrstream on the resulting memory. istrstream is formally deprecated, but it's still there, and is the only standard stream that will work here. Or you can write your own memory streambuf, which might even be faster than istrstream, because you won't have to support seeking, etc. on it (although seeking on an istrstream is also a fairly trivial operation, and shouldn't impact on the rest very much).
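A hand-written memory streambuf can be very small. The sketch below is one possible shape, not the only one: `MemoryBuf` and `parse_floats_from_memory` are hypothetical names, and the `begin`/`end` pointers are assumed to delimit memory that is already readable (for example, a mapped file). It supports sequential reading only.

```cpp
#include <istream>
#include <streambuf>
#include <vector>

// Minimal read-only streambuf over an existing character range.
// No seeking support: the get area is simply the whole buffer.
class MemoryBuf : public std::streambuf
{
public:
    MemoryBuf(const char* begin, const char* end)
    {
        // setg takes non-const pointers; nothing here ever writes
        // through them, since no put area is set up.
        char* b = const_cast<char*>(begin);
        setg(b, b, const_cast<char*>(end));
    }
};

// Extracts floats from the range [begin, end) through an istream.
std::vector<float> parse_floats_from_memory(const char* begin, const char* end)
{
    MemoryBuf buf(begin, end);
    std::istream in(&buf);
    std::vector<float> values;
    for (float f; in >> f; )
        values.push_back(f);
    return values;
}
```

When the get area is exhausted, the default `underflow` reports end of file, so the stream stops cleanly at `end` without needing a terminating null character.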

Beyond that, every layer of abstraction generally costs something, so it will probably be even faster (although not necessarily very much so) if you loop manually, using strtod.
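A manual loop over `strtod` might look like the sketch below. `parse_floats_strtod` is a hypothetical name, and the code assumes the text is null-terminated, which a raw file mapping is not guaranteed to be, so the caller would need to ensure that.

```cpp
#include <cstdlib>
#include <vector>

// text must be null-terminated: strtod stops only at a character
// it cannot parse or at the terminator.
std::vector<float> parse_floats_strtod(const char* text)
{
    std::vector<float> values;
    const char* p = text;
    for (;;) {
        char* end;
        double d = std::strtod(p, &end);  // skips leading whitespace itself
        if (end == p)                     // nothing converted: end of data or bad token
            break;
        values.push_back(static_cast<float>(d));
        p = end;                          // continue right after the parsed number
    }
    return values;
}
```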

In all cases, converting a generic floating point into machine floating point is an expensive operation. If you know something about the values you will be seeing, and their format (e.g. no scientific notation, values in a certain range, with a maximum number of places after the decimal), it's possible to write a conversion routine that would be faster than strtod. This requires some care, but if you know that the total number of decimal digits in the number will always result in a value that will fit in an int, you can do a very rapid int conversion, ignoring the '.', and then scale it by multiplying by the appropriate floating point value (e.g. '.001' if there were 3 digits after the '.').

James Kanze