
I am trying to write a reasonably efficient file read-in routine. My data file is a text file with several "frames". Each frame has two header lines and a number of items, as follows:

<int "nitems">
<float> <float> <float>
<string1> <float> <float> <float>
<string2> <float> <float> <float>
...
<string-nitems> <float> <float> <float>

My current implementation uses fstream to retrieve numbers but seems horribly slow. My test file contains about 200 frames of 10,000 lines each (~75 MB), which takes 2.5 seconds to read!

int loadframe() {
  _file >> _nat;
  _file >> _cell[0] >> _cell[1] >> _cell[2];
  for(int i=0,k=0;i<_nat;i++) {
    _file >> _types[i] >> _pos[k++] >> _pos[k++] >> _pos[k++]; // this line !!!
  }   
  return 0;
}

_file is an ifstream (opened elsewhere), _types is a vector of strings, _cell and _pos are vectors of doubles.

Does anyone have suggestions how to speed this up?

Thanks.

Screenshot of kcachegrind output (from valgrind's callgrind tool)

Update 1

Rewriting this part with fscanf reduced the time from ~2.5 seconds to ~1.8 seconds: about a 30% gain, not bad. _f is now a FILE*, opened with _f = fopen(filename,"r"). The lines below each fscanf do the casting (if needed) but do not take up any significant time, as can be seen by commenting them out.

int loadxyz() {
  char c[16];
  float x0,x1,x2;
  fscanf(_f,"%d",&_nat);
  fscanf(_f,"%f%f%f",&x0,&x1,&x2);
  _cell[0]=x0; _cell[1]=x1; _cell[2]=x2;
  for(int i=0, k=0;i<_nat;i++,k+=3) {
    fscanf(_f,"%15s%f%f%f",c,&x0,&x1,&x2); // width limit avoids overflowing c[16]
    _types[i]=c; _pos[k]=x0; _pos[k+1]=x1; _pos[k+2]=x2;
  }
  return 0;
}

Update 2

Based on suggestions below I wrote a small benchmark program, which shows that Nim's solution is clearly the fastest. Compiler optimization does not have any significant effect in my case. For anyone who wants to try, I added the source below; a recent compiler is needed: g++ -std=c++11 readtest.cpp -o readtest.

Thanks! If anyone has yet other suggestions I would be more than happy to add/benchmark them.

The result (the test file is ~32 MB):

$ ./readtest 
write                 : took 1.97 seconds
check = 549755289600.00
read1 (ifstream)      : took 1.10 seconds
check = 549755289600.00
read2 (fscanf)        : took 0.64 seconds
check = 549755289600.00
read3 (stream+strtod) : took 0.41 seconds

Here is the source of readtest.cpp:

#include <stdio.h>    // printf, fopen, fclose, fprintf, 
#include <stdlib.h>   // strtod 
#include <fstream>    // ifstream
#include <string>     // string
#include <ctime>      // clock

#define N 1048576 // 1024*1024 number of lines

using namespace std;

void write(string name) {
  FILE* f = fopen(name.c_str(),"w");
  for(float i=0;i<N;i++) 
    fprintf(f,"%s %.2f %.2f %.2f\n","x",i,i,i); // write some formatted data 
  fclose(f);
}

void read1(string name) {
  double num,check=0;
  string s;
  ifstream f(name);
  for(int i=0;i<N;i++) {
    f >> s;
    f >> num;
    f >> num;
    f >> num;
    check+=num;
  }
  printf("check = %.2f\n",check);
  f.close();
}

void read2(string name) {
  double num,check=0;
  char c[16];
  string s;
  FILE* f=fopen(name.c_str(),"r");
  while(fscanf(f,"%15s%lf%lf%lf",c,&num,&num,&num)!=EOF) { // width limit avoids overflowing c[16]
    s = c;
    check+=num;
  }
  printf("check = %.2f\n",check);
  fclose(f);
}

void read3(string name) {
  string line, s;
  double num,check=0;
  ifstream f(name);
  while(getline(f,line)) {
    size_t start = line.find_first_not_of(" \t"); // skip leading whitespace
    size_t pos = line.find(' ', start);           // first space after the label
    char* c = &*(line.begin() + pos + 1);
    s = line.substr(start, pos - start);          // the label itself
    num = strtod(c, &c);
    num = strtod(c, &c);
    num = strtod(c, &c);
    check+=num;
  }
  printf("check = %.2f\n",check);
  f.close();
}

int main() {
  clock_t start, end;
  string name("testfile.dat");

  start = clock();
  write(name);
  end   = clock();
  printf("write                 : took %.2f seconds\n",double(end-start)/CLOCKS_PER_SEC);

  start = clock();
  read1(name);
  end   = clock();
  printf("read1 (ifstream)      : took %.2f seconds\n",double(end-start)/CLOCKS_PER_SEC);

  start = clock();
  read2(name);
  end   = clock();
  printf("read2 (fscanf)        : took %.2f seconds\n",double(end-start)/CLOCKS_PER_SEC);

  start = clock();
  read3(name);
  end   = clock();
  printf("read3 (stream+strtod) : took %.2f seconds\n",double(end-start)/CLOCKS_PER_SEC);
}
jaap
  • Do you compile with optimization turned on? – Dietmar Kühl Oct 29 '13 at 23:23
  • `fscanf` is faster if you are harsh about the speed – SwiftMango Oct 29 '13 at 23:23
  • @DietmarKühl, yes I compile with g++ -O3 (in fact this turned out to make a significant difference). – jaap Oct 29 '13 at 23:26
  • @texasbruce: It depends on the implementation but I agree that all commercial implementations I'm aware of have a faster `fscanf()` than formatted input. – Dietmar Kühl Oct 29 '13 at 23:28
  • @texasbruce I would have no trouble "falling back" to a C feature as fscanf, but the casting really seems the bottle-neck here. – jaap Oct 29 '13 at 23:29
  • I just had a more thorough look at the problematic line: It has undefined behavior anyway as `k` gets incremented multiple times without intervening sequence points (probably this isn't the performance problem, though). – Dietmar Kühl Oct 29 '13 at 23:30
  • I doubt there is anything easy you could do here. If you truly wanted to make reading faster, convert the data to a binary representation. – Nim Oct 29 '13 at 23:31
  • @DietmarKühl k is not the counter here, it just permits running through _pos (which is of size 3*n). – jaap Oct 29 '13 at 23:33
  • @jaap: `k` is of type `int` and incremented without intervening sequence point more than once. Doing so results in undefined behavior. The use is morally equivalent to `k += ++k;` or `++k + ++k` and the like. They all result in undefined behavior! – Dietmar Kühl Oct 29 '13 at 23:38
  • @DietmarKühl Ah I see, thanks. Of course I can easily fix it by writing each stream on one line or writing explicitly k,k+1,k+2. And it's not related to the performance issue as you noted. – jaap Oct 30 '13 at 00:00
  • I think the bottleneck may not be ifstream, but may be your hard drive. 75 MB taking 2 seconds is normal for an HDD. – SwiftMango Oct 30 '13 at 00:29
  • if you want to be faster than both ifstream and fscanf/strtod, there is boost.spirit – Cubbi Oct 30 '13 at 00:37
  • @Cubbi, proverbial tank meet fly.. :) My only concern with spirit is its requirement of a `forward_iterator_tag`, as in, it cannot read from the stream directly; instead you have to read into a `string` and then pass those iterators. Their stream iterator implementation is horrendously slow! – Nim Oct 30 '13 at 10:08
  • @Nim istreambuf_iterator isn't all that bad (no virtual calls unless the buffer ends), but yes, to compete against strtod, both have to operate on a memory-mapped file (or file that's read into a string/array). OP: Can you post a test file or a piece of code to build one somewhere? – Cubbi Oct 30 '13 at 10:26
  • @texasbruce it's true that I am approaching some limit. A word-count on the same file (`time wc -l testfile`) takes ~1.0 seconds. However I don't think it's the file read but more the word extraction and casting; grep (`grep x testfile`) is lightning fast as usual and takes as little as 0.02 seconds (5 GB/s??). – jaap Oct 30 '13 at 11:47
  • @Nim my test code should write a typical file. I'll have a look at spirit, if someone has a ready-to-go code would be nice. – jaap Oct 31 '13 at 01:24

2 Answers


You could call strtod directly. First, read the entire line into your string field using getline. Then find the first space and remember its location. From that point, call strtod directly three times, then trim the string down to the first space. Not pretty, but it could be quicker.

Something like this:

auto& s = _types[i];
std::getline(_file, s);
auto pos = s.find(" ");
auto c = &*(s.begin() + pos + 1); // data() returns a const, this workaround should do the trick
_pos.push_back(std::strtod(c, &c));
_pos.push_back(std::strtod(c, &c));
_pos.push_back(std::strtod(c, nullptr)); // last one doesn't need it

BTW: your code is slightly inefficient..

while(getline(f,item)) {
  size_t start = item.find_first_not_of(" \t"); // skip initial whitespace
  size_t pos = item.find(' ', start); // use the above find...
  char* c = &*(item.begin() + pos + 1);
  num = strtod(c, &c);
  num = strtod(c, &c);
  num = strtod(c, NULL); // last one we don't care..
  item.erase(0, start);  // trim the white space at start
  item.erase(pos);       // trim only to the first word that you want...
  check+=num;
}

The above should be a little bit more efficient...

Nim
  • From a minimal example however, I find that it is true that strtod should be much faster (10x) than what follows from the profiler. I will get back to this. – jaap Oct 30 '13 at 01:19
  • You should also call reserve on the vectors to prevent allocations through the run... – Nim Oct 30 '13 at 10:09
  • Thanks. For the vectors, I actually do a resize somewhere after reading in the size. I guess I could gain something by using reserve and then push_back, but the cost of the resize is negligible. Just a question: to compile this I needed `g++ -std=c++11 -fpermissive` and still I get some warnings about `invalid conversion from ‘const char**’ to ‘char**’`; is there a more standard way around this? – jaap Oct 30 '13 at 14:07
  • Could you perhaps explain in a bit more detail how this works? I updated my post and included a (bastardized?) variant of your code. As far as I see, the pointer 'c' is updated during strtod which is why it jumps to the next item every time. Is that correct? That's pretty clever anyway. It also would be nice to get rid of the compiler warnings if possible. – jaap Oct 30 '13 at 15:30
  • You only need c++11 because I am using `std::strtod` and `auto`, there are variants available with c++03 too. See updated answer, and you are correct on `strtod`, the function sets the pointer pointed to by the second argument to point to the character past the last character parsed. – Nim Oct 30 '13 at 15:41

Off the top of my head, you can try using a larger buffer in your ifstream and other methods (synchronization, locale handling and increasing the optimization level as well as other gcc flags) as discussed here:

https://stackoverflow.com/a/5168384/866930

Or, alternatively and controversially, use fscanf.

jrd1
  • Looking at the profile, it seems that `fscanf()` is, indeed, the better option with the above setup: the time spent in `std::num_get<...>` is far too much compared to the time spent in `strtod()`, where the actual conversion happens. There are probably ways the implementation of `std::num_get<...>` could be improved in the library used. – Dietmar Kühl Oct 29 '13 at 23:34
  • @DietmarKühl thanks, I added the profile also because I have little experience interpreting this. As I understand from your comment, fscanf should at least allow me to reduce this 37% (spent in std::num_get<...>) to something closer to zero. – jaap Oct 29 '13 at 23:41
  • I tried the buffer but didn't find any significant improvement. – jaap Oct 30 '13 at 00:38