
I have a school project where I have a *.txt file with ~2M lines (~42MB); each line contains a row number, a column number and a value. I am parsing these into three vectors (int, int, float), but it takes around 45 sec to complete, and I am looking for a way to make it faster. I guess the bottleneck is the iteration through every element, and that it would be better to load one chunk of rows/columns/values and put them into a vector at once. Unfortunately, I do not know how to do that, or if it's even possible. I would also like to stick to the STL. Is there a way I could make it faster?

Thanks!

file example (first line has the count of rows, columns and non-zero values):

1092689 2331 2049148
1 654 0.272145
1 705 0.019104
2 245 0.812118
2 659 0.598012
2 1043 0.852509
2 1147 0.213949

For now I am working with:

void LoadFile(const char *NameOfFile, vector<int> &row,
    vector<int> &col, vector<float> &value) {
    unsigned int columns, rows, countOfValues;
    int rN, cN;
    float val;
    ifstream testData(NameOfFile);
    // First line: row count, column count, number of non-zero values.
    testData >> rows >> columns >> countOfValues;
    row.reserve(countOfValues);
    col.reserve(countOfValues);
    value.reserve(countOfValues);

    // One (row, column, value) triple per remaining line.
    while (testData >> rN >> cN >> val) {
        row.push_back(rN);
        col.push_back(cN);
        value.push_back(val);
    }
    testData.close();
}
Alex

1 Answer


Before you look for a solution to the problem, I would suggest taking some steps to figure out whether the bottleneck is reading the data from the file or filling up the vectors. To that end, I would time the following operations:

  1. Read the data from the file and discard the data.
  2. Use a random number generator to generate random numbers and fill up the vectors.

If the bottleneck is (1), find ways to speed up reading the data from the file.
If the bottleneck is (2), find ways to speed up filling up the vector.

Improving bottleneck of reading

Using std::istream::read to read the entire contents of the file in one call, and then using a std::istringstream to extract the data, should lead to some improvement.

Improving bottleneck of filling up vectors

Before adding data to the vectors, reserve a large capacity, which will reduce the number of times they are resized.

If you know there are 1M lines of text, reserve 1M elements in the vectors. If the real number of items ends up a bit less or a bit more, it shouldn't matter much from a performance standpoint.

PS The OP is already doing that.

R Sahu
  • Regarding your suggestion to reserving a capacity: Op already does that. – hanslovsky Apr 24 '17 at 20:59
  • Pushing words into a vector is measured in nanoseconds. Reading from a file is measured in milliseconds. 6 orders of magnitude difference. – stark Apr 24 '17 at 21:00
  • Earlier I tried: `auto s = static_cast<ostringstream&&>(ostringstream{} << testData.rdbuf()).str();` and then used `istringstream` and `>>` to get the data into individual vectors. The file was loaded in about 7 sec, but parsing the data into vectors again took about 40 sec. That's why I _guessed_ the bottleneck. – Alex Apr 25 '17 at 18:15
  • @Alex, I am not aware of any techniques that will convert strings to numbers in memory any faster than what you get from the `std::istream::operator>>` family of functions. I wish you luck. – R Sahu Apr 25 '17 at 18:21