
My program uses ifstream() and getline() to parse a text file into objects that are two vectors deep, i.e. a vector inside a vector. The inner vector contains over 250,000 string objects once the text file has finished loading.

This is painfully slow. Is there a standard-library alternative that is more efficient than ifstream() and getline()?

Thanks

UPDATE:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>

using namespace std;

class Word
{
private:
    string moniker = "";
    vector <string> definition;
    string type = "";

public:
    void setMoniker(string m) { this->moniker = m; }
    void setDefinition(string d) { this->definition.push_back(d); }
    void setType(string t) { this->type = t; }
    int getDefinitionSize() { return this->definition.size(); }

    string getMoniker() { return this->moniker; }
    void printDefinition()
    {
        for (size_t i = 0; i < definition.size(); i++)
        {
            cout << definition[i] << endl;
        }

    }


    string getType() { return this->type; }
};

class Dictionary
{
private:
    vector<Word> Words;

public:
    void addWord(Word w) { this->Words.push_back(w); }
    Word getWord(int i) { return this->Words[i]; }
    int getTotalNumberOfWords() { return this->Words.size(); }
    void loadDictionary(string f)
    {
        const regex _IS_DEF(R"([.]|[ ])"),  // raw string: "\." and "\ " are invalid escape sequences
            _IS_TYPE("^misc$|^n$|^adj$|^v$|^adv$|^prep$|^pn$|^n_and_v$"),
            _IS_NEWLINE("\n");

        string line;

        ifstream dict(f);

        string m, t, d = "";

        while (dict.is_open())
        {
            while (getline(dict, line))
            {
                if (regex_search(line, _IS_DEF))
                {
                    d = line;
                }
                else if (regex_search(line, _IS_TYPE))
                {
                    t = line;
                }
                else if (!(line == ""))
                {
                    m = line;
                }
                else
                {
                    Word w;
                    w.setMoniker(m);
                    w.setType(t);
                    w.setDefinition(d);
                    this->addWord(w);
                }
            }
            dict.close();
        }
    }
};



int main()
{
    Dictionary dictionary;
    dictionary.loadDictionary("dictionary.txt");
    return 0;
}
  • Show your code. 250000 is not large enough to be slow, so there must be some other problem. – n. m. could be an AI Apr 23 '17 at 10:46
  • Looks like this: http://stackoverflow.com/questions/3002122/fastest-file-reading-in-c – granmirupa Apr 23 '17 at 10:48
  • Try `const string&` instead of `string`. Besides, "this->" is useless (it doesn't make things slower, just useless). Besides, do you compile with all the optimization flags? – user31264 Apr 23 '17 at 11:10
  • I don't really understand what you are doing... why does each word only have either a moniker, a type, or a definition, but not all 3? Doesn't that seem odd? Also, more directly relevant to perf, there is no reason for your definition to be a vector of strings, instead of just a string. A string is roughly a vector of chars so you are not nested 2 deep, but 3 deep. – Nir Friedman Apr 23 '17 at 11:10
  • **Clarify what your code does** – Shakiba Moshiri Apr 23 '17 at 12:08
  • Print out this quote from JWZ and glue it to your mirror: *Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.* – n. m. could be an AI Apr 23 '17 at 23:44

1 Answer


You should reduce your memory allocations. Having a vector of vectors is usually not a good idea, because every inner vector does its own new and delete.

You should reserve() the approximate number of elements you need in the vector at the start.

You should use fgets() if you don't actually need to extract std::string to get your work done. For example if the objects can be parsed from char arrays, do that. Make sure to read into the same string buffer every time, rather than creating new buffers.

And most important of all, use a profiler.

John Zwinck