So I have several text files. I need to figure out the 10 most common characters and words in the file. I've decided to use a vector, and load it with each character from the file. However, it needs to include white space and new lines.

This is my current function:

void readText(ifstream& in1, vector<char> & list, int & spaces, int & words)
{
//Fills the list vector with each individual character from the text file
in1.open("test1");

in1.seekg(0, ios::beg);
std::streampos fileSize = in1.tellg();
list.resize(fileSize);

    string temp;
    char ch;
    while (in1.get(ch))
    {
        //calculates words
        switch(ch)
        {
        case ' ':
            spaces++;
            words++;
            break;
        default:
            break;  
        }
        list.push_back(ch);
    }
    in1.close();
}

But for some reason, it doesn't seem to hold all of the characters properly. Elsewhere in the program I have another vector of 256 ints, all set to 0. The program goes through the vector holding the text and tallies each character in the other vector, using the character's int value (0-255) as the index. Most characters are tallied correctly, but spaces and newlines are causing problems. Is there a more efficient way of doing this?

1 Answer

The problem with your code right now is that you're calling

list.resize(fileSize);

and also calling

list.push_back(ch);

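in your read loop at the same time. resize() already creates fileSize elements (all zero), and push_back() then appends each character after them instead of filling them in. You only need one or the other, so omit one of them.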
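To make the effect concrete, here is a minimal sketch of what mixing the two calls does to the vector's size. The helper names fillWithResizeAndPushBack and fillWithReserveAndPushBack are hypothetical and only serve the illustration; the text comes from a std::string rather than a file to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: mimics the pattern from the question (resize, then push_back).
std::vector<char> fillWithResizeAndPushBack(const std::string& text) {
    std::vector<char> list;
    list.resize(text.size());        // creates text.size() elements, all '\0'
    for (char ch : text) {
        list.push_back(ch);          // appends AFTER those zero elements
    }
    return list;                     // size is now 2 * text.size()
}

// Hypothetical helper: reserve() only allocates capacity and adds no elements,
// so push_back() starts filling at index 0.
std::vector<char> fillWithReserveAndPushBack(const std::string& text) {
    std::vector<char> list;
    list.reserve(text.size());
    for (char ch : text) {
        list.push_back(ch);
    }
    return list;                     // size is text.size()
}
```

For the four-character input "a b\n", the first helper returns a vector of size 8 whose first four elements are '\0', while the second returns exactly the four characters. If you want to keep push_back(), reserve() is probably the call you were after.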


Is there a more efficient way of doing this?

The easiest way is to resize the std::vector<char> to the file size, which you can determine up front, and use std::ifstream::read() to read the whole file in one go. Calculate everything else from the vector contents afterwards.
Something along these lines:

// fileSize must be the real size: seek to ios::end before calling tellg()
list.resize(fileSize);
in1.read(list.data(), fileSize);

for (auto ch : list) {
    switch (ch) {
        // Process the characters ...
    }
}
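A fuller sketch of the same idea, under a few assumptions, might look like the following. The names readWholeFile and tallyBytes are hypothetical, and the file is opened in binary mode so that the size reported by tellg() matches the number of bytes read. Note that the question's code seeks to ios::beg before calling tellg(), which yields 0; to get the file size you have to seek to ios::end first. The cast to unsigned char is there because plain char may be signed, in which case bytes above 127 would produce a negative index.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads an entire file, spaces and newlines included.
// Returns an empty vector if the file cannot be opened or is empty.
std::vector<char> readWholeFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    in.seekg(0, std::ios::end);                  // jump to the end ...
    const std::streamoff fileSize = in.tellg();  // ... so tellg() is the size
    in.seekg(0, std::ios::beg);                  // rewind before reading
    if (fileSize <= 0) {
        return {};
    }
    std::vector<char> list(static_cast<std::size_t>(fileSize));
    in.read(list.data(), static_cast<std::streamsize>(fileSize));
    // Keep only what was actually read, in case the read came up short.
    list.resize(static_cast<std::size_t>(in.gcount()));
    return list;
}

// Hypothetical helper: counts how often each byte value (0-255) occurs.
std::array<std::size_t, 256> tallyBytes(const std::vector<char>& list) {
    std::array<std::size_t, 256> counts{};       // all counters start at 0
    for (char ch : list) {
        ++counts[static_cast<unsigned char>(ch)];
    }
    return counts;
}
```

With the counts in hand, the ten most common characters can be found by sorting the byte values by their counts; word counting can be done in a second pass over the same vector.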
πάντα ῥεῖ