
I have a school project where I have a *.txt file with ~2M lines (~42MB); each line contains a row number, a column number and a value. I am parsing these into three vectors (int, int, float), but it takes around 45 sec to complete, and I am looking for a way to make it faster. I guess the bottleneck is the iteration through every element, and that it would be better to load one chunk of rows/columns/values and put them into a vector at once. Unfortunately, I do not know how to do that, or if it's even possible. I would also like to stick to the STL. Is there a way I could make it faster?

Thanks!

file example (first line has the count of rows, columns and non-zero values):

1092689 2331 2049148
1 654 0.272145
1 705 0.019104
2 245 0.812118
2 659 0.598012
2 1043 0.852509
2 1147 0.213949

For now I am working with:

void LoadFile(const char *NameOfFile, vector<int> &row,
    vector<int> &col, vector<float> &value) {
    unsigned int columns, rows, countOfValues;
    int rN, cN;
    float val;
    ifstream testData(NameOfFile);
    // First line: row count, column count, number of non-zero values.
    testData >> rows >> columns >> countOfValues;
    row.reserve(countOfValues);
    col.reserve(countOfValues);
    value.reserve(countOfValues);

    // One (row, column, value) triple per remaining line.
    while (testData >> rN >> cN >> val) {
        row.push_back(rN);
        col.push_back(cN);
        value.push_back(val);
    }
    testData.close();
}
Alex

1 Answer


Before you look for a solution to the problem, I would suggest taking some steps to figure out whether the bottleneck is reading the data from the file or filling up the vectors. To that end, I would time the following operations:

  1. Read the data from the file and discard the data.
  2. Use a random number generator to generate random numbers and fill up the vectors.

If the bottleneck is (1), find ways to speed up reading the data from the file.
If the bottleneck is (2), find ways to speed up filling up the vector.

Improving bottleneck of reading

Using std::istream::read to read the entire contents of the file in one call, and then using a std::istringstream to extract the data, should lead to some improvement.

Improving bottleneck of filling up vectors

Before adding data to the vectors, reserve a large capacity, which will reduce the number of times they are resized.

If you know there are 1M lines of text, reserve 1M elements in the vectors. If the real number of items ends up a bit less or a bit more, it shouldn't matter much from a performance standpoint.

PS The OP is already doing that.

R Sahu
  • Regarding your suggestion to reserving a capacity: Op already does that. – hanslovsky Apr 24 '17 at 20:59
  • Pushing words into a vector is measured in nanoseconds. Reading from a file is measured in milliseconds. 6 orders of magnitude difference. – stark Apr 24 '17 at 21:00
  • Earlier I tried: `auto s = static_cast<ostringstream&&>(ostringstream{} << testData.rdbuf()).str();` and then used `istringstream` and `>>` to get the data into individual vectors. The file was loaded in about 7 sec, but parsing the data into vectors again took about 40 sec. That's why I _guessed_ the bottleneck. – Alex Apr 25 '17 at 18:15
  • @Alex, I am not aware of any techniques that will convert strings to numbers in memory any faster than what you get from the `std::istream::operator>>` family of functions. I wish you luck. – R Sahu Apr 25 '17 at 18:21