0

Here is code that counts how many times a string entered by the user occurs in the file temp.txt. If, for example, we want love to be counted, then matches like love, lovely, and beloved should all be counted. We also want to count the total number of words in temp.txt. I am reading the file line by line here, not word by word.

Why does the debugging stop at totalwords += counting(line)?

/*this code is not working to count the words*/


#include<iostream>
#include<fstream>
#include<string>
using namespace std;

int totalwords{0};

int counting(string line){
   
   int wordcount{0};
   
    if(line.empty()){
            return 1; 
        }

        if(line.find(" ")==string::npos){wordcount++;}
        else{
            while(line.find(" ")!=string::npos){
                int index=0;
                index = line.find(" ");
                line.erase(0,index);
                wordcount++;
            }
        }
        return wordcount;
}



int main() {
    ifstream in_file;
    in_file.open("temp.txt");

    if(!in_file){
        cerr<<"PROBLEM OPENING THE FILE"<<endl;
    }
    string line{};
    int counter{0};
    string word {};
    cout<<"ENTER THE WORD YOU WANT TO COUNT IN THE FILE: ";
    cin>>word;
    int n {0};
    n  = ( word.length() - 1 );
    
    while(getline(in_file>>ws,line)){
        
        totalwords += counting(line);

        while(line.find(word)!=string::npos){
            counter++;
            int index{0};
            index = line.find(word);
            line.erase(0,(index+n));
        }
    }
    cout<<endl;
    cout<<counter<<endl;
    cout<<totalwords;

    return 0;
}
Wolf
  • 1
    Just for clarification, if we are looking for `dada` and find the text `dadada` – would you want to find it once or twice? Both are reasonable... – Aconcagua Feb 02 '22 at 10:20
  • 1
    Would an empty line represent a word? Wouldn't you rather want to return 0 in that case? – Aconcagua Feb 02 '22 at 10:24
  • For getting the actual question answered, wouldn't it be better to reduce the code to counting words, i.e. remove the word-of-interest part? – Wolf Feb 02 '22 at 10:26
  • You are looking for a single character, so it's more efficient to use the appropriate overload: `line.find(' ')`. There's no need to call `find` twice if you store the result of the first call in a variable. However, you'd increase the word count even for several *consecutive* whitespace characters (`"hello  world"`; note the two spaces between the words). You'll need a stateful loop! – Aconcagua Feb 02 '22 at 10:28
  • Modifying the string again and again is pretty inefficient: all subsequent characters are moved towards the front each time you do so, and you possibly do so multiple times. Better: use the second function parameter of `std::string::find` (which defaults to 0) to restart the search after the last index found, leaving the input string unchanged. You can then even accept the string by `const` reference, avoiding a *then* unnecessary copy... – Aconcagua Feb 02 '22 at 10:32
  • No need for initialisation + assignment, by the way: just initialise directly with `int n = word.length() - 1;`. You don't need uniform initialisation (I recommend not using it at all anyway); `std::string line;` does exactly the same and reads nicer... – Aconcagua Feb 02 '22 at 10:34
  • @Wolf Referring to me? `if(line.empty()) return 1;` doesn't appear correct to me... – Aconcagua Feb 02 '22 at 10:38
  • Side note: About [`using namespace std`](https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice)... – Aconcagua Feb 02 '22 at 10:39
  • You can make things simpler by just having one while loop for read, counting the input word and total words - given your file only has whitespace (no other delimiters like comma, etc.) separating the words. – kiner_shah Feb 02 '22 at 10:41

3 Answers

2

line.erase(0, index); doesn't erase the space, you need

line.erase(0, index + 1);
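
To see the effect of the `+ 1` in isolation, here is a minimal sketch. The helper name `countSpacesByErasing` is made up for illustration and is not part of the question's code. It keeps the erase-based loop from the question but removes the space along with the text before it, so the loop makes progress and terminates. Note that it counts spaces, not words.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the question): counts space characters by
// repeatedly erasing everything up to AND including the next space.
std::size_t countSpacesByErasing(std::string line) {
    std::size_t count = 0;
    std::size_t index = line.find(' ');
    while (index != std::string::npos) {
        // index + 1 removes the space itself; with plain `index` the space
        // would stay at position 0 and the loop would never end.
        line.erase(0, index + 1);
        ++count;
        index = line.find(' ');
    }
    return count;
}
```

For "bla bla" this returns 1, which is the number of spaces rather than the number of words.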
Jarod42
1

Your code reveals a few problems...

First of all, counting a single word for an empty line doesn't appear correct to me. Second, erasing from the string again and again is pretty inefficient: with every such operation, all of the subsequent characters are copied towards the front. If you really wanted to do so, you might rather search from the end of the string, which avoids that. But you can actually do it without ever modifying the string if you use the second parameter of std::string::find (which defaults to 0, so it has been transparent to you...):

size_t index = line.find(' ' /*, 0*/); // first call; 0 is default, thus implicit
index = line.find(' ', index + 1);     // subsequent call

Note that using the character overload is more efficient if you search for a single character anyway. However, this variant doesn't consider other whitespace such as tabs.
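
The same second parameter also works for the word-of-interest part of the question. The following is only a sketch under assumptions: the function name `countOccurrences` and the `allowOverlap` flag are invented here, the latter to cover the `dada` in `dadada` case raised in the comments. The string is taken by `const` reference and never modified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts how often `word` occurs in `line` without
// modifying `line`. With allowOverlap == false the search resumes after the
// end of the previous match; with true it resumes one character later.
std::size_t countOccurrences(const std::string& line,
                             const std::string& word,
                             bool allowOverlap = false) {
    assert(!word.empty()); // an empty search string would match everywhere
    std::size_t count = 0;
    std::size_t pos = line.find(word);
    while (pos != std::string::npos) {
        ++count;
        // Restart the search behind the last hit instead of erasing.
        pos = line.find(word, pos + (allowOverlap ? 1 : word.length()));
    }
    return count;
}
```

With this, "love lovely beloved" yields 3 matches for "love", and "dadada" yields 1 match for "dada" by default or 2 when overlapping matches are allowed.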

Additionally, the variant as posted in the question doesn't handle several consecutive whitespace characters! In your erasing variant (which erases one character too few, by the way) you would need to skip incrementing the word count if you find the space character at index 0.

However, I'd go with a totally new approach, looking at each character separately. You need a stateful loop in that case, though, i.e. you need to remember whether you are already within a word or not. It might look like this, for example:

size_t wordCount = 0; // note: prefer an unsigned type, negative values
                      // are meaningless anyway
                      // size_t is especially fine as it is guaranteed to be
                      // large enough to hold any count of characters the
                      // string might ever contain
bool inWord = false;
for(char c : line)
{
    if(isspace(static_cast<unsigned char>(c)))
    // you can check for *any* white space that way...
    // note the cast to unsigned, which is necessary as isspace accepts
    // an int and a bare char *might* be signed, thus result in negative
    // values
    {
        // no word any more...
        inWord = false;
    }
    else if(inWord)
    {
        // well, nothing to do, we already discovered a word earlier!
        // 
        // as we actually don't do anything here you might just skip
        // this block and check for the opposite: if(!inWord)
    }
    else
    {
        // OK, this is the start of a word!
        // so now we need to count a new one!
        ++wordCount;
        inWord = true;
    }
}

Now you might want to break words at punctuation characters as well, so you might actually want to check for:

if(isspace(static_cast<unsigned char>(c)) || ispunct(static_cast<unsigned char>(c)))

A bit shorter is the following variant:

if(/* space or punctuation */)
{
    inWord = false;
}
else
{
    wordCount += !inWord; // adds 1 only if a new word starts here, else 0
    inWord = true;
}
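
Put together, the stateful loop could be wrapped into a function like the one below. This is a sketch, not a definitive implementation: the name `countWords` is an assumption, and treating punctuation as a separator is just one possible definition of a word.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: counts words in `line`, where a word is a maximal run
// of characters that are neither whitespace nor punctuation.
std::size_t countWords(const std::string& line) {
    std::size_t wordCount = 0;
    bool inWord = false; // state: are we currently inside a word?
    for (char c : line) {
        // Cast first: passing a negative char value to isspace/ispunct is
        // undefined behaviour.
        const unsigned char uc = static_cast<unsigned char>(c);
        if (std::isspace(uc) || std::ispunct(uc)) {
            inWord = false;       // separator: any word has ended
        } else {
            wordCount += !inWord; // count only the first character of a word
            inWord = true;
        }
    }
    return wordCount;
}
```

An empty line gives 0, "bla bla" gives 2, and two spaces in a row do not create an extra word. Be aware that with this definition an apostrophe splits words, so "don't" counts as 2.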

Finally: all code was written freehand and is thus unchecked; if you find a bug, please fix it yourself...

Aconcagua
  • Great solution and explanation, especially about the real power of [`std::string::find`](https://en.cppreference.com/w/cpp/string/basic_string/find). Hopefully this will help OP to solve their issue. – Wolf Feb 02 '22 at 11:29
0

debugging getting stopped abruptly

Does debugging indeed stop at the indicated line? I observed instead that the program hangs within the while loop in counting. You can make this visible by inserting an indicator output (marked by HERE in the following code):

int counting(string line){
   
   int wordcount{0};
   
    if(line.empty()){
            return 1; 
        }

        if(line.find(" ")==string::npos){wordcount++;}
        else{
            while(line.find(" ")!=string::npos){
                int index=0;
                index = line.find(" ");
                line.erase(0,index);
                cout << '.';               // <--- HERE: indicator output
                wordcount++;
            }
        }
        return wordcount;
}

As Jarod42 pointed out, the erase call you are using misses the space itself. That's why you are finding spaces and “counting words” forever.

There is also an obvious misconception about words and separators of words visible in your code:

  • empty lines don't contain words
  • consecutive spaces don't indicate words
  • words may be separated by non-spaces (parentheses for example)

Finally, as already mentioned: if the problem is about counting total words, it's not necessary to discuss the other parts. And after the test above (see HERE), it also appears to be independent of file input. So your code could be reduced to something like this:

#include <iostream>
#include <string>

int counting(std::string line) {

    int wordcount = 0;

    if (line.empty()) {
        return 1;
    }
    if (line.find(" ") == std::string::npos) {
        wordcount++;
    } else {
        while (line.find(" ") != std::string::npos) {
            int index = 0;
            index = line.find(" ");
            line.erase(0, index);
            wordcount++;
        }
    }
    return wordcount;
}

int main() {
    int totalwords = counting("bla bla");
    std::cout << totalwords;
    return 0;
}

And in this form, it's much easier to see if it works. We expect to see a 2 as output. To get there, it's possible to try correcting your erase call, but the result would then still be wrong (1) since you are actually counting spaces. So it's better to take the time and carefully read Aconcagua's insightful answer.
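
If words are separated by whitespace only, a different technique from the ones discussed above may be enough: let stream extraction do the splitting. The sketch below uses an invented name, `countWhitespaceSeparatedWords`, and relies on `operator>>` skipping any run of whitespace, so empty lines and consecutive spaces need no special handling. It does not split at punctuation.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts whitespace-separated tokens in `line`.
std::size_t countWhitespaceSeparatedWords(const std::string& line) {
    std::istringstream stream(line);
    std::string token;
    std::size_t count = 0;
    // operator>> skips leading whitespace and fails once no token is left.
    while (stream >> token) {
        ++count;
    }
    return count;
}
```

Called with "bla bla", this returns the expected 2, and an empty string returns 0.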

Wolf