How to print each word and its number of occurrences from file.txt in C++?

Question

For this assignment I need a program that reads a text word by word. Every word occurring in the text is to be stored as a "WordItem", a type of object that stores the word itself and its frequency in the given text. All WordItems from a text are stored in a List so that, at any time, the items are listed in alphabetical order of their word component. This is the code I need to complete:

#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include "WordItem.h"
#include "List.h"

using namespace std;

template <typename T> void print_list (list<T>);
template <typename T> void print_lines (list<T>);
void alpha_insert (string, list<WordItem> &);
string strip_punct (string);

int main ()
{
    list<WordItem> wdList;
    string next;

    ifstream inp;
    inp.open ("file.txt");
    inp >> next;
    next = strip_punct (next);
    while (!inp.fail ()) {
        alpha_insert (next, wdList);    // DEFINE BELOW
        inp >> next;
        next = strip_punct (next);      // DEFINE BELOW
    }
    inp.close ();

// Print out each word and its number of occurrence
// in "file.txt"; word:count pair per alpha_insert


// Iterate over the wdList and determine the word
// among all WordItems that has maximal count;


// Export this most frequent word and its count;


    return 0;
}

// Print all items of the list on one line, separated by spaces.
template <typename T> void print_list (list<T> lst)
{
    cout << endl;
    typename list<T>::iterator itr;
    for (itr = lst.begin (); itr != lst.end (); ++itr)
        cout << *itr << " ";
    cout << endl;
}

// Print each item of the list on its own line.
template <typename T> void print_lines (list<T> lst)
{
    cout << endl;
    typename list<T>::iterator itr;
    for (itr = lst.begin (); itr != lst.end (); ++itr)
        cout << *itr << " " << endl;
    cout << endl;
}

// Remove trailing punctuation only; stop at the first
// non-punctuation character found from the end.
string strip_punct (string x)
{
    for (int i = x.size () - 1; i >= 0; i--) {
        if (ispunct (static_cast<unsigned char> (x[i])))
            x.erase (i, 1);
        else
            return x;
    }

    return x;
}

void alpha_insert (string x, list<WordItem> & wdlst)
{
    WordItem wordit;

    if (wdlst.empty ()) {
        wordit = WordItem (x);
        wdlst.insert (wdlst.begin (), wordit);
        return;
    }
    // Find proper place and insert ... return;
}
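
The "find proper place and insert" step left open in alpha_insert could be filled in roughly as follows. This is only a sketch under assumptions: the real interface of WordItem.h and List.h is not shown, so the `WordItem` struct below is a hypothetical stand-in with public `word` and `count` members, and `std::list` is assumed in place of the course's List type.

```cpp
#include <list>
#include <string>

// Hypothetical stand-in for the WordItem class from "WordItem.h",
// whose real interface is not shown in the question.
struct WordItem {
    std::string word;
    int count;
};

// Insert x at its alphabetical position, or bump the count if x is already stored.
void alpha_insert(const std::string& x, std::list<WordItem>& wdlst)
{
    auto itr = wdlst.begin();
    // Skip every item whose word sorts strictly before x.
    while (itr != wdlst.end() && itr->word < x)
        ++itr;

    if (itr != wdlst.end() && itr->word == x)
        ++itr->count;                       // word already stored: count it again
    else
        wdlst.insert(itr, WordItem{x, 1});  // new word: insert before the first larger item
}
```

Because the list is kept sorted on every insert, the empty-list case needs no special handling here: the loop ends immediately and the item is inserted at `end()`.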

Getting a count of each word can be done in literally 4 or 5 lines of C++ code using `std::map`. In addition, getting rid of the punctuation for each line can be done in a single line of C++ code. If you did those two things, your code would be reduced by at least half its size. — PaulMcKenzie, Mar 01 '21 at 01:50
`next.erase(std::remove_if(next.begin(), next.end(), ispunct), next.end());` That removes all punctuation from `next`. — PaulMcKenzie, Mar 01 '21 at 01:55

BiOS · Answer 1 · 2021-03-01T15:56:31.150

You could try the solution below, which uses a std::map (an ordered associative container, not a hash table) to count the occurrences of each word:

Code

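#include <algorithm>   // std::copy, std::sort
#include <fstream>
#include <iostream>
#include <iterator>    // std::istream_iterator, std::back_inserter
#include <map>
#include <sstream>
#include <string>
#include <vector>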

//Pre-defining class to define vector below
class wordItem;

//Your "list" (vector) of wordItems
std::vector<wordItem> wordVector;

class wordItem {
    public:
    std::string word;
    int times;

    wordItem(std::string _word, int _times) {
        word = _word;
        times = _times;
        wordVector.push_back(*this);
    }
};

void generateItems()
{
    std::map<std::string, int> words;
    std::ifstream input("file.txt");

    //Read every line of your input file
    for (std::string line; getline(input, line);)
    {
        //Iterate through each word
        std::istringstream iss(line);
        std::vector<std::string> tempwords;
        std::copy(std::istream_iterator<std::string>(iss),
                  std::istream_iterator<std::string>(),
                  std::back_inserter(tempwords));
        
        //Iterate through the temp vector of words to count them through a map
        for (auto i : tempwords)
            ++words[i];
    }

    //Iterate through map items to create a wordItem
    for (auto const &x : words)
    {
        wordItem(x.first, x.second);
    }
}

struct lowerThanWord
{
    bool operator() (const wordItem& l, const wordItem& r) const
    {
        return (l.word < r.word);
    }
};

int main()
{
    generateItems();


    //If necessary, re-sort your vector
    //std::sort(wordVector.begin(), wordVector.end(), lowerThanWord());

    //Print the attributes of objects in our wordItems list
    for (auto i : wordVector) {
        std::cout << "WORD : "
                  << i.word
                  <<" - TIMES: "
                  << i.times
                  << std::endl;
    }


}

I/O

Assuming we have a file.txt containing:

Hello I am your c plus plus code

The final output of the above main() will be:

WORD : Hello - TIMES: 1
WORD : I - TIMES: 1
WORD : am - TIMES: 1
WORD : c - TIMES: 1
WORD : code - TIMES: 1
WORD : plus - TIMES: 2
WORD : your - TIMES: 1

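Note how capital letters come before lowercase letters: std::string comparison orders characters by their code values, and in ASCII 'A' through 'Z' precede 'a' through 'z'. If you need a different ordering, you could enable the commented-out std::sort call with lowerThanWord() and modify the comparator according to your needs.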
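
As one possible customisation, a case-insensitive comparison could be sketched like this. The helper name `lessIgnoringCase` is hypothetical and not part of the answer's code; it only handles single-byte characters through `std::tolower`.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: orders two words alphabetically, ignoring case.
bool lessIgnoringCase(const std::string& l, const std::string& r)
{
    return std::lexicographical_compare(
        l.begin(), l.end(), r.begin(), r.end(),
        [](unsigned char a, unsigned char b) {
            // Compare the lowercase forms so 'H' and 'h' rank the same.
            return std::tolower(a) < std::tolower(b);
        });
}
```

Returning `lessIgnoringCase(l.word, r.word)` from lowerThanWord's operator() and enabling the std::sort call should then place "am" before "Hello" in the example above.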

Explanation

After the necessary includes, we forward-declare the class wordItem. This is needed because the vector of wordItem objects is declared just below, before the class itself is defined.

We then define the wordItem class with the public attributes word and times. In the constructor, we set those attributes from the arguments and automatically add the newly created object to wordVector with:

wordVector.push_back(*this);

Next, we build the generateItems() function, in which we define a map and open file.txt, iterating line by line and then word by word.

We insert each word of the line into a temporary vector called tempwords and iterate through it. For each word, we increment the map entry that has this word as its key. If the word was not in the map yet, it is added with value 1; otherwise its existing value is incremented by 1.

We now have a map containing each word as a key and its number of occurrences as the value.

Finally, we iterate through each entry of the map and create a wordItem instance from its key and value.

In the main function, we could sort the resulting vector by word (for further details refer to this question). You don't necessarily have to, because iterating through a std::map visits the keys in sorted order, so the vector is already sorted when it is built. However, if you modify the vector afterwards, you have to sort it again with:

std::sort(wordVector.begin(), wordVector.end(), lowerThanWord());

We then iterate through wordVector, which contains the wordItem objects, and print their attributes.
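
The question also asks for the word with the maximal count, which the code above does not cover. A minimal sketch of that step, assuming a plain struct `Entry` as a hypothetical stand-in with the same public fields as wordItem and a made-up function name `mostFrequent`:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the public fields of wordItem above.
struct Entry {
    std::string word;
    int times;
};

// Return a pointer to the entry with the largest count,
// or nullptr if the vector is empty.
// On ties, std::max_element returns the first of the largest elements.
const Entry* mostFrequent(const std::vector<Entry>& items)
{
    auto it = std::max_element(items.begin(), items.end(),
                               [](const Entry& a, const Entry& b) {
                                   return a.times < b.times;
                               });
    return it == items.end() ? nullptr : &*it;
}
```

For the sample file.txt shown in the I/O section, this should pick the entry for "plus" with a count of 2.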

`for (auto i : tempwords)` -- That loop could simply be `++words[i];`. There is no need to search for the word, as a map does not store duplicates. That's why I mentioned in the comment section that to store a table of word and count is literally just 4 or 5 lines. You added 5 or 6 unnecessary lines of code. — PaulMcKenzie, Mar 01 '21 at 12:29
@PaulMcKenzie that's a good point and I agree. I have amended my answer to take account of it. Thanks for pointing out. — BiOS, Mar 01 '21 at 12:37
OK, no problem. I know that in languages like Java, you have to explicitly check if the item exists just to add 1 or to initialize it, but in C++, that is not necessary because the map's `operator[]` value-initializes a missing `int` to 0. — PaulMcKenzie, Mar 01 '21 at 13:16
