
Is it a legitimate optimisation to create a really huge source file that initialises a vector with hundreds of thousands of values manually, rather than parsing a text file with the same values into a vector?

Sorry, that could probably be worded better. The function that parses the text file in is very slow, apparently because C++'s stream reading is slow (it takes about 6 minutes, as opposed to about 6 seconds in the C# version).

Would making a massive array initialisation file be a legitimate solution? It doesn't seem elegant, but if it's faster then I suppose it's better?

This is the file-reading code:

    //parses the text path vector into the engine
    void Level::PopulatePathVectors(string pathTable)
    {
        // Read the file line by line.
        ifstream myFile(pathTable);

        for (unsigned int i = 0; i < nodes.size(); i++)
        {
            pathLookupVectors.push_back(vector<vector<int>>());

            for (unsigned int j = 0; j < nodes.size(); j++)
            {
                string line;

                if (getline(myFile, line)) //enter if a line is read successfully
                {
                    stringstream ss(line);
                    istream_iterator<int> begin(ss), end;
                    pathLookupVectors[i].push_back(vector<int>(begin, end));
                }
            }
        }
        myFile.close();
    }

A sample line from the text file (of which there are about half a million lines of similar format but varying length):

0 5 3 12 65 87 n
Dollarslice
  • The problem is not C++, it's your function to read in the data and initialize the vector. – onit Oct 19 '11 at 14:40
  • @bzlm wow THANK YOU! learn to code, huh, never thought of that one. ass. http://stackoverflow.com/questions/7809473/why-is-this-so-much-slower-in-c – Dollarslice Oct 19 '11 at 14:41
  • It could be legitimate depending on your use case, but it's also brittle unless you only ever care about working with that one data set. But you can show us your file reading code, maybe there are some issues to resolve with it. – wkl Oct 19 '11 at 14:41
  • There must be something else wrong. I tried with 700 * 700 lines of 5 numbers and your code took 5 seconds to load it on my machine (which is old and slow). - I wonder, if you are using VC++, could its safety features add a 10x overhead? – UncleBens Oct 19 '11 at 15:37
  • @SirYakalot: Although I know where your feelings come from, onit (or I) didn't know that you had already posted another thread for optimization. Would've been nicer if you had posted this link as part of your original post... a few bad words would've been avoided. – Kashyap Oct 19 '11 at 18:26

7 Answers


First, make sure you're compiling with the highest optimization level available, then add the lines marked below and test again. I doubt this will fully fix the problem, but it may help. Hard to say until I see the results.

//parses the text path vector into the engine
void Level::PopulatePathVectors(string pathTable)
{
    // Read the file line by line.
    ifstream myFile(pathTable);

    pathLookupVectors.reserve(nodes.size()); // HERE
    for (unsigned int i = 0; i < nodes.size(); i++)
    {
        pathLookupVectors.push_back(vector<vector<int>>());
        pathLookupVectors[i].reserve(nodes.size());  // HERE

        for (unsigned int j = 0; j < nodes.size(); j++)
        {
            string line;

            if (getline(myFile, line)) //enter if a line is read successfully
            {
                stringstream ss(line);
                istream_iterator<int> begin(ss), end;
                pathLookupVectors[i].push_back(vector<int>(begin, end));
            }
        }
    }
    myFile.close();
}
Dark Falcon
  • While you are at it, you might also move `string line;` out of the loops. - But then again, it is hard to believe those things would make the program load the file 5 minutes and 54 seconds faster (reserving has practically no effect with C++11 implementation of the G++ library with my test file of 490000 lines of numbers). – UncleBens Oct 19 '11 at 16:06
  • I'm also seeing significant improvement (>25%) from moving `stringstream ss(line);` out of the loops and replacing it with `ss.clear(); ss.str(line);` – UncleBens Oct 19 '11 at 18:56
  • Sorry for the newbie question, but how do I change the optimization level? Googling has been unhelpful. I actually tried your edits and there is literally 0 improvement in the time it takes (although I'm sure they are good amendments). – Dollarslice Oct 31 '11 at 13:15
  • @UncleBens are you saying I should literally replace the line stringstream ss(line) with those other two statements? but move it out of the loops? where should I place it? – Dollarslice Oct 31 '11 at 13:16

6 minutes vs 6 seconds?! There must be something wrong with your C++ code. Optimize it using good old methods before you resort to such an extreme "optimization" as the one mentioned in your post.

Also note that reading from a file lets you change the vector contents without changing the source code. If you do it the way you mention, you'll have to re-code, compile, and link all over again.

Kashyap

It depends on whether the data changes. If the data can (or needs to) be changed after compile time, then the only option is to load it from a text file. If not, I don't see any harm in compiling it in.

RvdK
    ... if the data changes ^after compile time^. – Andy Thomas Oct 19 '11 at 14:41
  • so it wouldn't be considered bad practice? I guess it would be a lot faster.. or would it? – Dollarslice Oct 19 '11 at 14:44
  • Doing so would help you to verify the assumption about the slow IO in C++. But quite likely a big problem in your code is related to the vector reallocations. (Use the reserve() member function to address it.) – jszpilewski Oct 19 '11 at 15:01

I was able to get the following result with Boost.Spirit 2.5:

$ time ./test input

real    0m6.759s
user    0m6.670s
sys     0m0.090s

'input' is a file containing 500,000 lines containing 10 random integers between 0 and 65535 each.

Here's the code:

#include <vector>

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/classic_file_iterator.hpp>

using namespace std;
namespace spirit = boost::spirit;
namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

typedef vector<int> ragged_matrix_row_type;
typedef vector<ragged_matrix_row_type> ragged_matrix_type;


template <class Iterator>
struct ragged_matrix_grammar : qi::grammar<Iterator, ragged_matrix_type()> {

  ragged_matrix_grammar() : ragged_matrix_grammar::base_type(ragged_matrix_) {

    ragged_matrix_ %= ragged_matrix_row_ % qi::eol;
    ragged_matrix_row_ %= qi::int_ % ascii::space;

  }

  qi::rule<Iterator, ragged_matrix_type()> ragged_matrix_;
  qi::rule<Iterator, ragged_matrix_row_type()> ragged_matrix_row_;

};

int main(int argc, char** argv){

  typedef spirit::classic::file_iterator<> ragged_matrix_file_iterator;

  ragged_matrix_type result;
  ragged_matrix_grammar<ragged_matrix_file_iterator> my_grammar;
  ragged_matrix_file_iterator input_it(argv[1]);

  qi::parse(input_it, input_it.make_end(), my_grammar, result);

  return 0;

}

At this point, result contains the ragged matrix, which can be confirmed by printing its contents. In my case the 'ragged matrix' isn't so ragged (it's a 500000 x 10 rectangle), but it shouldn't matter because I'm pretty sure the grammar is correct. I got even better results by reading the entire file into memory before parsing (~4 sec), but the code for that is longer, and it's generally undesirable to copy large files into memory in their entirety.

Note: my test machine has an SSD, so I don't know if you'll get the same numbers I did (unless your test machine has an SSD as well).

HTH!

gred

I wouldn't consider compiling static data into your application to be bad practice. If there is little conceivable need to change your data without a recompilation, parsing the file at compile time not only improves runtime performance (since your data have been pre-parsed by the compiler and are in a usable format at runtime), but also reduces risks (like the data file not being found at runtime or any other parse errors).

Make sure that users won't have need to change the data (or have the means to recompile the program), document your motivation and you should be absolutely fine.

That said, you could make the iostream version a lot faster if necessary.

thiton

Using a huge array in a C++ file is a perfectly acceptable option, depending on the case.

You must consider if the data will change and how often.

If you put it in a C++ file, you will have to recompile your program each time the data changes (and redistribute it to your customers each time!), so that wouldn't be a good solution if you have to distribute the program to other people.

Now, if a recompilation is acceptable for every data change, then you can have the best of both worlds: just use a small script (for example in Python or Perl) that takes your .txt and generates a C++ file, so the file parsing only has to be done once per data change. You can even integrate this step into your build process with automatic dependency management.

Good luck !

Offirmo

Don't use the standard input streams; they're extremely slow. There are better alternatives.

Since people decided to downvote my answer because they are too lazy to use google, here:

http://accu.org/index.php/journals/1539

Jonas B
  • I don't personally know about the performance of the C++ input streams but even if it is bad, you need to specify what alternatives are available. – Dark Falcon Oct 19 '11 at 15:01
    Are people on this site incapable of using Google? Or Stack Overflow's own search? There are already tons of questions like this on Stack Overflow and they all recommend that you use another IO solution. Just run your own test, like C++'s stream vs C's stream, and I bet that something that takes 10 minutes in C++ takes 20 seconds in C. My answer is the perfect solution to his question. – Jonas B Oct 19 '11 at 15:47
  • @JonasB: Admittedly, C++ streams do have some (minor) overhead, but that is only natural for what they are doing. To be fair though, one should say "something that takes 10 mins in C++ takes 20 seconds _if the person writing the code is a C programmer who does not know C++_". Properly using the language, such as e.g. by calling `reserve` before inserting a few million items into a container helps to avoid such misconceptions about bad C++ performance. Proper C++ is at least as fast and often faster than C _doing the same thing_. But of course you must compare the same thing. – Damon Oct 19 '11 at 22:41