I wrote this function to read a file and put its lines into a vector of strings:

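    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    // Timer is a stopwatch helper class defined elsewhere (not shown).
    std::vector<std::string> read_file_lines1(const char* filepath){
        std::vector<std::string> file;
        std::ifstream input(filepath);
        Timer timer;
        float time = 0;      // accumulated time spent in push_back
        std::string line;
        int i = 0;
        while (std::getline(input, line)){
            timer.reset();
            file.push_back(line);
            time += timer.elapsed();
            if (i == 10000)  // periodic progress message
                std::cout << "10000 done" << std::endl;
            i = ((i + 1) % 10001);
        }
        std::cout << time << std::endl;
        return file;
    }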

But the performance was really bad in my opinion (200k lines in ~22 seconds).

With a small change, making it a vector<string*> (using file.push_back(new std::string(line))), the push_back calls went from ~16 seconds to ~1.2 seconds, which was a huge improvement (though still short of my goals). It has a small disadvantage, memory management: if I want to free the memory used here, I will have to remember to loop over the vector and delete each string*.

Now the whole method takes ~6 seconds, about 5 of which are spent on string operations inside getline, and I would really like to know how to optimize it or find an alternative.

PS: I am doing this to load a 3D model. Using the same model in Java, it takes ~0.8 seconds to read everything AND FILTER it (putting each line into the vertex/texture/... array and then arranging them in index order), so I'm really disappointed that it takes this long just to read each line from a file in C++. (I'm using debug mode in both Java and C++, which probably makes it quite a bad benchmark, but I'm still really disappointed.)

Pedro David
  • *"using debug mode"* -- What's the point of benchmarking non-optimized code? – Benjamin Lindley Aug 11 '15 at 20:54
  • Could you please point me to where my code isn't optimized? That was what I wanted, since I'm unsatisfied with the performance. Edit: sorry, I didn't see the "using debug mode". I know it's not the best (not even a good) way to do it, but I was just comparing it with my Java experience – Pedro David Aug 11 '15 at 20:56
  • *"using debug mode"* results in non-optimized code. – Benjamin Lindley Aug 11 '15 at 20:57
  • Which C++ version? Move semantics can help a lot here. – sbabbi Aug 11 '15 at 20:58
  • He means you should benchmark the built program in release mode, instead. – jaggedSpire Aug 11 '15 at 20:58
  • *"using debug mode"* means *"Please don't make this code run fast!"*. – Bo Persson Aug 11 '15 at 21:00
  • Your C++ compiler and your Java compiler are written by different teams, with different ideas about what should be added to the code in order to make debugging easier. Not to mention the differences between the languages themselves. Your comparison between these two things is not very useful. – Benjamin Lindley Aug 11 '15 at 21:04
  • That printing to cout will also take some time. – Daniel Jour Aug 11 '15 at 21:13
  • I know it's a "bad benchmark" comparing debug c++ compiler vs Java compiler, I was just really disappointed with the performance and wanted to know if someone had a better/faster way of doing this or if I was doing something really wrong – Pedro David Aug 11 '15 at 21:42
  • Debug code could be a hundred times slower than release. You could also add sleep calls to your code and be disappointed with the performance even more. – ISanych Aug 11 '15 at 21:49
  • I don't really know how to check the version, but __cplusplus returns 199711L, so C++98? (Should I change it? How?) – Pedro David Aug 11 '15 at 21:52
  • Possible duplicate of [Reading line from text file and putting the strings into a vector?](http://stackoverflow.com/questions/8365013/reading-line-from-text-file-and-putting-the-strings-into-a-vector) – TheArchitect May 15 '17 at 04:47
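
The move-semantics suggestion in the comments above could be sketched roughly as follows. This is only an illustration, assuming a C++11 (or later) compiler; the function name `read_file_lines_move` is hypothetical and not from the question. With C++11, `push_back(std::move(line))` hands the line's buffer to the container instead of copying it, and a vector that reallocates will move its strings rather than copy them.

```cpp
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical name; requires C++11 or later for std::move.
std::vector<std::string> read_file_lines_move(const char* filepath) {
    std::vector<std::string> lines;
    std::ifstream input(filepath);
    std::string line;
    while (std::getline(input, line)) {
        // Hand the line's buffer to the vector instead of copying it.
        // `line` is left in a valid but unspecified state; the next
        // std::getline call clears it before writing into it again.
        lines.push_back(std::move(line));
    }
    return lines;
}
```

Whether this helps noticeably depends on the line lengths and the standard library; it is worth measuring in a release build.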

1 Answer

The main reason it is slow is that every time the vector's capacity is reached, it has to reallocate memory and move all of its strings to the new location. Use std::deque instead of vector: a deque doesn't relocate its existing elements, it just adds new chunks. Alternatively, preallocate the vector with its reserve method to avoid the reallocations.
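
As a rough sketch of the two options above (not a definitive implementation): the function names `read_file_lines_deque` and `read_file_lines_reserved`, and the `expected_lines` parameter, are made up for this illustration, and the caller is assumed to have some estimate of the line count for the `reserve` variant.

```cpp
#include <cstddef>
#include <deque>
#include <fstream>
#include <string>
#include <vector>

// Option 1 (hypothetical name): std::deque grows by adding new blocks,
// so strings already stored are not copied when more lines are pushed.
std::deque<std::string> read_file_lines_deque(const char* filepath) {
    std::deque<std::string> lines;
    std::ifstream input(filepath);
    std::string line;
    while (std::getline(input, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Option 2 (hypothetical name): keep std::vector, but reserve capacity
// up front. `expected_lines` is an assumed estimate from the caller;
// if it is too small, the vector simply reallocates as usual.
std::vector<std::string> read_file_lines_reserved(const char* filepath,
                                                  std::size_t expected_lines) {
    std::vector<std::string> lines;
    lines.reserve(expected_lines);
    std::ifstream input(filepath);
    std::string line;
    while (std::getline(input, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

The deque version needs no size estimate, at the cost of non-contiguous storage; the vector version keeps contiguous storage but only avoids reallocation when the estimate is large enough.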

Also, debug C++ code can be much slower than release, especially with a lot of template and/or inline code. You really need to measure release performance, and you should use the timer just once for the whole loop, as I suspect that in release mode you would otherwise spend a lot of time in the timer code.
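
Timing the whole loop once, as suggested above, might look like the sketch below. The question's `Timer` class is not shown, so this swaps in `std::chrono::steady_clock` from C++11 instead; the function name `read_file_lines_timed` and the `elapsed_seconds` output parameter are assumptions made for the example.

```cpp
#include <chrono>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical name. Uses std::chrono (C++11) in place of the question's
// Timer class, and takes two clock readings in total instead of two per line.
std::vector<std::string> read_file_lines_timed(const char* filepath,
                                               double& elapsed_seconds) {
    typedef std::chrono::steady_clock Clock;
    std::vector<std::string> lines;
    std::ifstream input(filepath);
    std::string line;

    const Clock::time_point start = Clock::now();  // one reading before the loop
    while (std::getline(input, line)) {
        lines.push_back(line);
    }
    const Clock::time_point stop = Clock::now();   // one reading after the loop

    elapsed_seconds = std::chrono::duration<double>(stop - start).count();
    return lines;
}
```

The measured time covers both `getline` and `push_back`; to separate the two, a profiler on a release build is likely to be less intrusive than per-iteration timers.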

Another small optimization: instead of

    if (i == 10000)
        std::cout << "10000 done" << std::endl;
    i = ((i + 1) % 10001);

use:

    if (i == 10000)
    {
        std::cout << "10000 done" << std::endl;
        i = 0;
    }
    ++i;
ISanych
  • Thanks a lot. I already tested without the cout and the difference was negligible. Will try the deque. I tried reserve(), but it usually made things even slower (I don't know why). Also, although I'm still not "happy" with the performance of push_back, it's much better now that I'm pushing pointers rather than strings. My main concern now is getline(), which takes almost 5 seconds, mostly because of the way it appends to the line (most of the time is spent in "string+=" calls), so I wanted to know if there is a better alternative – Pedro David Aug 11 '15 at 21:35
  • testing performance of debug version makes no sense, try release version first. there is no issue with cout in your code, as you don't do it often, but putting timer work outside of loop will help with performance (again, of release version). pointers help with speed because it is much faster to move pointers, but you could get rid of moving at all. – ISanych Aug 11 '15 at 21:46
  • I could get rid of moving at all? Could you expand on that a bit? – Pedro David Aug 11 '15 at 21:58
  • std::deque doesn't reallocate memory on push_back - when there is no reserved space left, new chunk of memory is allocated, memory is not contiguous, but in most cases it is ok. just replace vector with deque and everything else should work in the same way. – ISanych Aug 11 '15 at 22:04
  • deque helped even more, so thanks a lot. getline still takes a lot of time in debug (~5 seconds), so I would like to know if there is a better option, but in release the whole process takes about half a second now, so it seems fine. Also, with deque I'm using std::string instead of std::string* because the performance is roughly the same but I don't have to remember to free the memory afterwards, which is really cool – Pedro David Aug 11 '15 at 23:22