Once again I ask for help. I haven't coded anything for some time!

Now I have a text file filled with random gibberish. I already have a basic idea on how I will count the number of occurrences per word.

What really stumps me is how to determine which line a word is on. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards, it will do no good.

I already am getting the words via the following code:

vector<string> words;
string currentWord;

while(!inputFile.eof())
{
inputFile >> currentWord;
words.push_back(currentWord); 
}

This is for a text file with no set structure. Using the above code gives me a nice little (big) vector of words, but it doesn't give me the lines they occur on.

Would I have to get the entire line, then process it into words to make this possible?

Trygle
    `!inputFile.eof()` is the wrong way to check for errors. As I told you on the previous question you asked about the exact same topic. (For Other Readers: Related: http://stackoverflow.com/questions/3693454/how-to-read-a-file-and-get-words-in-c ) – Billy ONeal Sep 13 '10 at 22:14

3 Answers

Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.

If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:

std::vector<std::string> lines;
std::ifstream file(...); // Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
    lines.push_back(currentLine);

You can split a line apart by putting it into a std::istringstream first and then using operator>>. (Alternatively, you could cobble up some sort of splitter using std::find and other algorithmic primitives.)
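
For the algorithm-based alternative, a rough sketch might look like the following. It uses std::find_if rather than plain std::find so that any whitespace character acts as a separator; the names splitWords, isSpace, and isNotSpace are invented for this example.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical predicates. The cast avoids undefined behavior when
// std::isspace is handed a negative char value.
static bool isSpace(char c)    { return std::isspace(static_cast<unsigned char>(c)) != 0; }
static bool isNotSpace(char c) { return !isSpace(c); }

// Hypothetical splitter: returns the whitespace-separated words of one line.
std::vector<std::string> splitWords(const std::string& line)
{
    std::vector<std::string> words;
    std::string::const_iterator it = line.begin();
    while (true) {
        // Skip whitespace to find the start of the next word.
        it = std::find_if(it, line.end(), isNotSpace);
        if (it == line.end())
            break;
        // Find the first whitespace character after the word.
        std::string::const_iterator wordEnd = std::find_if(it, line.end(), isSpace);
        words.push_back(std::string(it, wordEnd));
        it = wordEnd;
    }
    return words;
}
```

The std::istringstream route is shorter; a hand-rolled splitter mainly pays off if you later want custom separators such as punctuation.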

EDIT: This is the same thing as in @dash-tom-bang's answer, but modified to be correct with respect to error handling:

vector<pair<string, int> > words; // each word paired with its line number
int currentLine = 1; // or 0, however you wish to count...

string line;
while (getline(inputFile, line))
{
   istringstream inputString(line);
   string word;
   while (inputString >> word)
      words.push_back(make_pair(word, currentLine));
   ++currentLine; // advance once per line read
}
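
Putting the pieces together, one possible way to record every line a word occurs on is to map each word to a list of line numbers. This is only a sketch: the function name indexWords and the choice of 1-based line numbers are assumptions for illustration.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: maps each word to the (1-based) line numbers it
// occurs on. A word appearing twice on one line is recorded twice, so the
// vector's size is also that word's total occurrence count.
std::map<std::string, std::vector<int> > indexWords(std::istream& in)
{
    std::map<std::string, std::vector<int> > index;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;                          // count every line, even empty ones
        std::istringstream lineStream(line);
        std::string word;
        while (lineStream >> word)
            index[word].push_back(lineNumber);
    }
    return index;
}
```

With this layout, index["foo"].size() gives the occurrence count and the vector's contents give the lines, so a single pass over the file covers both needs.
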
Billy ONeal
  • Thanks for the answer, mine can go away. ;) It's amazing how quickly this stuff evaporates from one's brain without use. – dash-tom-bang Sep 13 '10 at 22:50
  • Thanks again. I am very sorry to have changed the coding on the old example. I'll give this a try and post back later with results. – Trygle Sep 14 '10 at 03:15
  • @Trygle: This is not intended to be a drop in chunk of code. We aren't going to write your program for you. However, we will give you pointers in the right direction. – Billy ONeal Sep 14 '10 at 03:19
  • Oh I know. What fun is coding if everyone does it for you? Thanks for the help though. I should close this by now. – Trygle Sep 14 '10 at 03:35
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.

This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.

Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:

  • First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
  • Then read non-whitespace characters, putting them into the string object you'll be returning.
  • If it runs out of stuff to read, read the next block and continue.
  • If you hit the end of the file, the string you have is the whole word (which may be empty) and should be returned.
  • If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)

Then you can call this method at the same place in your code as your inputFile >> currentWord, and the rest of the code doesn't need to know the details of your block buffering.

An alternative approach is to read things a line at a time, but all the read functions that would work for you require you to create a fixed-size buffer to read into beforehand, and if the line is longer than that buffer, you have to deal with it somehow. It could get more complicated than the class I described.

BenMorel
Mike DeSimone
  • No, `std::getline(std::istream&, std::string&)` does not require specifying a buffer. And I really don't see how OOP has anything to do with what the OP wants to do. There's no inheritance or polymorphic behavior going on here. – Billy ONeal Sep 13 '10 at 22:22
  • Modularity and data hiding. Putting the ugliness of the buffering in a class so the rest of the code doesn't have to deal with it. There's plenty of good stuff in C++ that doesn't involve `virtual`. – Mike DeSimone Sep 14 '10 at 01:08
Short and sweet.

vector< map< string, size_t > > line_word_counts;

string line, word;
while ( getline( cin, line ) ) {
    line_word_counts.push_back( map< string, size_t >() ); // start an empty map for this line
    map< string, size_t > &word_counts = line_word_counts.back();

    istringstream line_is( line );
    while ( line_is >> word ) ++ word_counts[ word ];
}

cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
     << " times\n";
Potatoswatter