I have defined a class like this:

  class myClass {
       private:
            int count;
            string name;
       public:
            myClass (int, string);
            ...
            ...
  };

  myClass::myClass(int c, string n)
  {
        count = c;
        name = n;
  }
  ...
  ...

I also have a *.txt file in which each line contains a name:

David
Jack
Peter
...
...

Now I read the file line by line, create a new object through a pointer for each line, and store all the objects in a vector. The function looks like this:

vector<myClass*> myFunction (string fileName)
{
     vector<myClass*> r;
     myClass* obj;
     ifstream infile(fileName);
     string line;
     int count = 0;
     while (getline(infile, line))
     {
          obj = new myClass (count, line);
          r.push_back(obj);
          count++;
     }
     return r;
}

For small *.txt files I have no problem. However, sometimes my *.txt files contain more than 1 million lines. In these cases, the program is dramatically slow. Do you have any suggestions for making it faster?

AlirezaMah
  • How often do you read this text file? Only once on startup? How long will the program run? Just a few seconds or minutes, or hours or even days? A "long time" is relative, and depends on how many times you perform the "long time" operation, and how long it is compared to the rest of the runtime. – Some programmer dude Jan 08 '18 at 10:40
  • You could load less of the file into memory, and only load the parts of the file that you need as you need them – UKMonkey Jan 08 '18 at 10:40
  • Also, what are you using the data for? What is the use-case? Do you need to read it all at once? Do you have any idea beforehand about the number of entries in the file? If you do, then you could pre-allocate memory for the vector? – Some programmer dude Jan 08 '18 at 10:44
  • I read the file once on startup. For txt files with 1 million lines, it takes 15 minutes to store objects in the vector. The problem is that this is just a part of a bigger program and 15 minutes is so much for me. – AlirezaMah Jan 08 '18 at 10:46
  • Even for a slow drive on a slow system, 15 minutes just to do what you show seems excessive. Are you doing something else you don't show us? – Some programmer dude Jan 08 '18 at 10:48
  • How long does it take you to do some menial operation on the file like `wc` (Linux/UNIX)? Your C++ program shouldn't be dramatically slower than that.... – Tony Delroy Jan 08 '18 at 10:52
  • Similar thread: https://stackoverflow.com/questions/5166263/how-to-get-iostream-to-perform-better – Tony Delroy Jan 08 '18 at 11:02

2 Answers

First, find faster I/O than the standard streams.
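
For example, one common approach is to read the whole file into memory in one go and split it yourself afterwards; a minimal sketch (the helper name `slurp` is made up for illustration):

#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: pull the entire file into one string with a single
// bulk transfer instead of paying stream overhead per line.
std::string slurp(const std::string& fileName)
{
    std::ifstream infile(fileName, std::ios::binary);
    std::ostringstream ss;
    ss << infile.rdbuf();   // one streamed copy of the whole file
    return ss.str();
}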

Second, can you use string views instead of strings? std::string_view is C++17, but third-party implementations for C++11 and earlier are available everywhere.
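
A minimal C++17 sketch of that idea; note the views are non-owning, so whatever buffer they point into (for instance the slurped file contents above) has to outlive the objects:

#include <string_view>

// Illustrative variant of the asker's class: no per-object string allocation.
class myClassView {
    int count;
    std::string_view name;   // non-owning view into a buffer kept alive elsewhere
public:
    myClassView(int c, std::string_view n) : count(c), name(n) {}
};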

Third,

myClass::myClass(int c, string n) {
  count = c;
  name = n;
}

should read

myClass::myClass(int c, std::string n):
  count(c),
  name(std::move(n))
{}

which would make a difference for long names, but none for short ones due to the "small string optimization".

Fourth, stop making vectors of pointers. Create vectors of values.
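
Applied to the function in the question, that might look like the sketch below (not a drop-in replacement if other code depends on the pointers):

#include <fstream>
#include <string>
#include <utility>
#include <vector>

std::vector<myClass> myFunction(const std::string& fileName)
{
    std::vector<myClass> r;
    std::ifstream infile(fileName);
    std::string line;
    int count = 0;
    while (std::getline(infile, line))
        r.emplace_back(count++, std::move(line));   // construct in place, no new/delete
    return r;
}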

Fifth, failing that, find a more efficient way to allocate/deallocate the objects.
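
If you really do need stable pointers, one simple scheme is to let a std::deque own the objects (a deque never relocates existing elements when you append) and keep your vector of raw pointers into it; a sketch:

#include <deque>
#include <vector>

// Sketch: the deque owns every myClass; the vector just points into it.
// Both containers must be kept alive together, and nothing is deleted by hand.
std::deque<myClass> storage;
std::vector<myClass*> r;

// For each line read:
//   storage.emplace_back(count++, std::move(line));
//   r.push_back(&storage.back());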

Yakk - Adam Nevraumont

One thing you can do is directly move the string you've read from the file into the objects you're creating:

myClass::myClass(int c, string n)
  : count{c}, name{std::move(n)}
{ }

You could also benchmark:

myClass::myClass(int c, string&& n)
  : count{c}, name{std::move(n)}
{ }

The first version above will make a copy of line as the function is called, then let the myClass object take over the dynamically allocated buffer used for that copy. The second version (with the string&& n argument) will let the myClass object rip out line's buffer directly: that means less copying of textual data, but it also leaves line stripped of its buffer after each move, so line has to allocate again as each subsequent line of the file is read in. Hopefully your getline implementation will be able to tell from the stream's input buffer how large a capacity line needs for the next line, and avoid extra allocations/copying. As always, measure when you've reason to care.
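
Note that for either constructor to actually take over line's buffer, the call site in the question's loop has to pass line as an rvalue (with the string&& overload this is mandatory or the code won't compile):

while (getline(infile, line))
{
    obj = new myClass(count, std::move(line));  // hand the buffer over; line is
    r.push_back(obj);                           // left empty but valid afterwards
    count++;
}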

You'd likely get a small win by reserving space for your vector up front, though the fact that you're storing pointers in the vector instead of storing myClass objects by value makes any vector resizing relatively cheap. Countering that, storing pointers does mean you're doing an extra dynamic allocation.
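
If you don't know the line count up front, you can still guess it from the file size; a C++17 sketch, where the assumed average line length (16 bytes here) is a number you'd tune for your own data:

#include <filesystem>
#include <system_error>

std::error_code ec;
const auto bytes = std::filesystem::file_size(fileName, ec);
if (!ec)
    r.reserve(bytes / 16 + 1);   // assumes ~16 bytes per line, including the newline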

Another thing you can do is increase the stream buffer size: see pubsetbuf and the example therein.
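
A sketch of that; on common implementations pubsetbuf only takes effect if it's called before the file is opened:

#include <fstream>

char buf[1 << 20];                            // 1 MiB buffer; the size is a guess
std::ifstream infile;
infile.rdbuf()->pubsetbuf(buf, sizeof buf);   // must happen before open()
infile.open(fileName);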

If speed is extremely important, you should memory map the file and store pointers into the memory mapped region, instead of copying from the file stream buffer into distinct dynamically-allocated memory regions inside distinct strings. This could easily make a dramatic difference - perhaps as much as an order of magnitude - but a lot depends on the speed of your disk etc., so benchmark both if you've reason to care.
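
A POSIX-only sketch of that approach (Linux/macOS; error handling omitted, and `mapLines` is a made-up helper). The mapping must stay alive for as long as the pointers are in use:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Each element is a (pointer, length) pair referring into the mapped file image.
std::vector<std::pair<const char*, std::size_t>> mapLines(const std::string& fileName)
{
    const int fd = open(fileName.c_str(), O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    const char* data = static_cast<const char*>(
        mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));

    std::vector<std::pair<const char*, std::size_t>> lines;
    for (const char *p = data, *end = data + st.st_size; p < end; )
    {
        const char* nl = static_cast<const char*>(std::memchr(p, '\n', end - p));
        if (!nl) nl = end;
        lines.emplace_back(p, static_cast<std::size_t>(nl - p));
        p = nl + 1;
    }
    return lines;   // caller eventually munmap()s data and close()s fd
}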

Tony Delroy
  • For storing pointers, the new C++17 [`std::string_view`](http://en.cppreference.com/w/cpp/string/basic_string_view) class would be useful. – Some programmer dude Jan 08 '18 at 10:46
  • @Someprogrammerdude: true, though they're also obliged to track the length and in some cases it may be faster to just have a pointer (or even a file offset if that can be smaller than your pointer size), and rely on newlines in the memory mapped image for delimiting the strings. For longer strings, `string_view`'s increasingly compelling. – Tony Delroy Jan 08 '18 at 10:50