
I wrote a C++ program that reads a text file. However, I also want the program to count the number of times each word appears. For example, the output should look as follows:

Word Frequency Analysis

Word          Frequency
I                1
don't            1
know             1
the              2
key              1
to               3
success          1
but              1
key              1
failure          1
is               1
trying           1
please           1
everybody        1

Notice how each word appears only once. What do I need to do in order to achieve this effect?

Here is the text file (named BillCosby.txt):

I don't know the key to success, but the key to failure is trying to please everybody.

Here is my code so far. I am having an extreme mental block and cannot figure out a way to get the program to read the number of times a word occurs.

#include <iostream>
#include <fstream>
#include <iomanip>

const int BUFFER_LENGTH = 256;
const int NUMBER_OF_STRINGS = 100;

int numberOfElements = 0;
char buffer[NUMBER_OF_STRINGS][BUFFER_LENGTH];
char * words = buffer[0];
int frequency[NUMBER_OF_STRINGS];

int StringLength(char * buffer);
int StringCompare(char * firstString, char * secondString);

int main(){

int isFound = 1;
int count = 1;

std::ifstream input("BillCosby.txt");

if(input.is_open())
{
    //Priming read
    input >> buffer[numberOfElements];
    frequency[numberOfElements] = 1;

while(!input.eof())
    {
    numberOfElements++;
    input >> buffer[numberOfElements];

    for(int i = 0; i < numberOfElements; i++){
        isFound = StringCompare(buffer[numberOfElements], buffer[i]);
            if(isFound == 0)
                ++count;
    }

    frequency[numberOfElements] = count;


    //frequency[numberOfElements] = 1;

    count = 1;
    isFound = 1;
    }
    numberOfElements++;
}
else
    std::cout << "File is not open. " << std::endl;

std::cout << "\n\nWord Frequency Analysis " << std::endl;
std::cout << "\n" << std::endl;

std::cout << "Word " << std::setw(25) << "Frequency\n" << std::endl;

for(int i = 0; i < numberOfElements; i++){
    int length = StringLength(buffer[i]);
    std::cout << buffer[i] << std::setw(25 - length) << frequency[i] << std::endl;
}



return 0;
}

int StringLength(char * buffer){
char * characterPointer = buffer;

while(*characterPointer != '\0'){
    characterPointer++;
}

return characterPointer - buffer;
}

int StringCompare(char * firstString, char * secondString)
   {
    while ((*firstString == *secondString || (*firstString == *secondString - 32) ||
            (*firstString - 32 == *secondString)) && (*firstString != '\0'))
{
    firstString++;
    secondString++;
}

if (*firstString > *secondString)
    return 1;

else if (*firstString < *secondString)
    return -1;

return 0;
}
MrPickle5
  • Have you done any research about this? It is a very common question. SO doesn't really like doing people's homework for them. – Fantastic Mr Fox Jan 21 '13 at 04:19
  • You may as well have tagged this question as `C`. You are not using any real C++ features. When you get it working, ask for a review here: http://codereview.stackexchange.com/questions (don't ask while it is broken; you will just be sent back here to get it fixed first). – Martin York Jan 21 '13 at 04:34

5 Answers

Your program is quite confusing to read. But this part stuck out to me:

frequency[numberOfElements] = 1;

(in the while loop). You realize that you are always setting the frequency to 1 no matter how many times the word appears, right? Maybe you meant to increment the value rather than set it to 1?

Andrew

One approach is to tokenize the input (split the lines into words) and then use the C++ map container. The map would have the word as its key and the word count as its value.

For each token, add it into the map, and increment the wordcount. A map key is unique, hence you wouldn't have duplicates.

You can use stringstream for your tokenizer, and you can find the map container reference (incl examples) here.

And don't worry, a good programmer deals with mental blocks on a daily basis -- so get used to it :)

gerrytan

The flow of the solution should be something like this:
- initialize storage (you apparently know you have a pretty small file?)
- set the initial count to zero (not one)
- read words into the array. When you get a new word, see if you already have it; if so, add one to the count at that location; if not, add it to the list of words ("hey - a new word!") and set its count to 1
- loop over all words in the file

Be careful with whitespace: make sure you are matching only non-whitespace characters. Right now you have "key" twice in your expected output. I suspect that is a mistake?

Good luck.

Floris
  • I disagree. With the right data structure, setting the counts to 0 will happen automatically. "read words into array" is just a bad idea -- an array really is *not* a good choice of data structure for this task. – Jerry Coffin Jan 21 '13 at 04:55

Here's a code example that I tested with codepad.org:

#include <iostream>
#include <map>
#include <string>
#include <sstream>

using namespace std;

int main()
{
string s = "I don't know the key to success, but the key to failure is trying to please everybody.";
string word;
map<string,int> freq;

for ( std::string::iterator it=s.begin(); it!=s.end(); ++it)
{
    if(*it == ' ')
    {
         if(freq.find(word) == freq.end()) //First time the word is seen
         {
             freq[word] = 1;
         }
         else //The word has been seen before
         {
             freq[word]++;
         }
         word = "";
    }
    else
    {
         word.push_back(*it);
    }
}

for (std::map<string,int>::iterator it=freq.begin(); it!=freq.end(); ++it)
    std::cout << it->first << " => " << it->second << '\n';

}

It splits only on spaces, so punctuation stays attached to the words, and the final word ("everybody.") is never counted because no space follows it, but you get the point.

Output:

I => 1
but => 1
don't => 1
failure => 1
is => 1
key => 2
know => 1
please => 1
success, => 1 //Note this isn't perfect because of the comma. A quick change can fix this though; I'll let you figure that out on your own.
the => 2
to => 3
trying => 1

Adam27X
  • Note: For a map, the first time an element is accessed via `operator[]`, the element is created and its `value` part is zero-initialized. This means that an int will be initialized to zero, so you do not need to special-case the first time. Just increment the `value` each time you isolate a key. – Martin York Jan 21 '13 at 04:41
  • You should try posting this here codereview.stackexchange.com/questions – Martin York Jan 21 '13 at 04:42
  • @LokiAstari Thanks for the heads up. I also realized that I could probably use `isalpha()` to detect words without having to worry about grammatical symbols. I wrote this up quickly mostly to show the algorithm with some working code. – Adam27X Jan 21 '13 at 04:57
  • See here http://stackoverflow.com/a/6154217/14065 to make the stream treat punctuation like a space. This allows you to read tokens with `operator>>`. – Martin York Jan 21 '13 at 05:07

I'm a bit hesitant to post a direct answer to something that looks a lot like homework, but I'm pretty sure if somebody turns this in as homework, any halfway decent teacher/professor is going to demand some pretty serious explanation, so if you do so, you'd better study it carefully and be ready for some serious questions about what all the parts are and how they work.

#include <map>
#include <iostream>
#include <iterator>
#include <algorithm>
#include <string> 
#include <fstream>
#include <iomanip>
#include <locale>
#include <cctype>
#include <vector>

struct alpha_only: std::ctype<char> {
    alpha_only() : std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::space);
        for (int i=0; i<std::ctype<char>::table_size; i++)
            if (isalpha(i)) rc[i] = std::ctype_base::alpha;
        return &rc[0];
    }
};

typedef std::pair<std::string, unsigned> count;

namespace std { 
    std::ostream &operator<<(std::ostream &os, ::count const &c) { 
        return os << std::left << std::setw(25) << c.first 
                  << std::setw(10) << c.second;
    }
}

int main() { 
    std::ifstream input("BillCosby.txt"); // same capitalization as in the question; matters on case-sensitive file systems
    input.imbue(std::locale(std::locale(), new alpha_only()));

    std::map<std::string, unsigned> words;

    std::for_each(std::istream_iterator<std::string>(input),
                    std::istream_iterator<std::string>(),
                    [&words](std::string const &w) { ++words[w]; });
    std::copy(words.begin(), words.end(),
              std::ostream_iterator<count>(std::cout, "\n"));
    return 0;
}
Jerry Coffin