0

I have four sets of text files each containing different words.

noun.txt has 7 words Article.txt has 5 words verb.txt has 6 words and Preposition.txt has 5 words

In the code below, inside my second for loop, an array of counts keeps track of how many words i've read in and from what file. so for example. count[0] should be 5 worlds which it is, but count[1] has 8 words but should be 7. I went back to check the text file and i didn't make a mistake, it has 7 words. Is this a problem with how ifstream is behaving ?

I've also been told eof() is not good practice. What's best practice in industry in terms of reading in data accurately ? In other words is there something better i can use besides !infile.eof() ?

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <cctype>
#include <array> // std::array

using namespace std;

const int MAX_WORDS = 100;

class Cwords{
    public:
        std::array<string,4> partsOfSpeech;
};

int main()
{
    Cwords elements[MAX_WORDS];

   int count[4] = {0,0,0,0};

   ifstream infile;

    string file[4] = {"Article.txt",
                      "Noun.txt",
                      "Preposition.txt",
                      "verb.txt"};

    for(int i = 0; i < 4; i++){
        infile.open(file[i]);
        if(!infile.is_open()){
            cout << "ERROR: Unable to open file!\n";
            system("PAUSE");
            exit(1);
        }

        for(int j = 0;!infile.eof();j++){
            infile >> elements[j].partsOfSpeech[i];
            count[i]++;
        }

        infile.close();
    }

    ofstream outfile;
    outfile.open("paper.txt");

    if(!outfile.is_open()){
        cout << "ERROR: Unable to open or create file.\n";
        system("PAUSE");
        exit(1);
    }



    outfile.close();
    system("PAUSE");
    return 0;
}
Amber Roxanna
  • 1,665
  • 4
  • 24
  • 30
  • possible duplicate of [Passing istream intro a function](http://stackoverflow.com/questions/14548962/passing-istream-intro-a-function) and [how do i read data from textfile and push back to a vector?](http://stackoverflow.com/q/14545120/179910) – Jerry Coffin Aug 12 '13 at 20:34
  • 2
    Don't use `.eof()`. Most of the questions arising here about reading files are misusing `.eof()`. Who is going around telling people to use `.eof()`? Any C++ text book and tutorial will tell you to stop reading when the `>>` operator fails; that is, `while (file >> variable) { ... do something ... }`. – DanielKO Aug 12 '13 at 20:44
  • @DanielKO Ok, I used your suggestion and it works. I converted my for loop to int j = 0; while(infile >> ...) {}. – Amber Roxanna Aug 12 '13 at 21:10
  • @DanielKO oh and to answer your question, a lot of universities seem to promote .eof() – Amber Roxanna Aug 12 '13 at 21:14
  • @AmberRoxanna Makes you want to ask for a refund, doesn't it? – WhozCraig Aug 12 '13 at 21:16

3 Answers3

3

The simple answer to reading data properly is this: always test after reading that the read operation was successful. This test does not involve the use of eof() (any book teaching the use of eof() prior to reading is worthy to be burnt immediately).

The main loop for reading the file should look something like this:

for (int j = 0; infile >> elements[j].partsOfSpeach[i]; ++j){
    ++count[i];
}

BTW, although the language is called "C++" and not "++C", don't use post increment unless you actually do use the result of the expression: in most cases it doesn't matter but sometimes it does matter and then post-increment can be significant slower than pre-increment.

Dietmar Kühl
  • 150,225
  • 13
  • 225
  • 380
  • +1 burnt?? how about "burned" (or perhaps the slang, "torched" =) – WhozCraig Aug 12 '13 at 21:18
  • @WhozCraig: Maybe it's a difference between English and American but it seems [burn](http://www.usingenglish.com/reference/irregular-verbs/burn.html) is an irregular verb, at least, sometimes. At school (in Germany) I learned the irregular form... – Dietmar Kühl Aug 12 '13 at 21:59
  • Yeah, it might be so. American uses "burnt" as well, though rare is the case when "burned" it not preferred. It peaked my curiosity to the point where [a little research](http://grammarist.com/usage/burned-burnt/) was in order. I never really thought about until you mentioned it. The differences between UK and American were interesting. Always learning new stuff. =) – WhozCraig Aug 12 '13 at 23:13
0

Have you checked to make sure there aren't any extra spaces or newline's at the end of your text file? It may be possible that your last extra 'word' is due to trailing characters before the eof is reached.

Lochemage
  • 3,974
  • 11
  • 11
  • Yes, it is picking up a newline characters. I thought white spaces were skipped ? How can i adjust for this ? – Amber Roxanna Aug 12 '13 at 20:27
  • 1
    The problem happens because the use of `.eof()` is wrong. It will tell you reached the end of file **after** you try to read something and there was nothing there. It will **not** predict the future (that is, tell that the next input operation will succeed or not) like PASCAL did. – DanielKO Aug 12 '13 at 20:48
0

Likely you have an empty line at the end of the file, that looks "empty". My recommendation is to use code like the following:

#include <boost/algorithm/string.hpp>
#include <string>

...

    std::string line;
    int cnt = 0;
    while(! infile.eof()) {
        infile >> line;
        boost::algorithm::trim(line);
        if(line.size > 0)
            words[filenr][cnt++] = line;
    }

Note, that I strongly recommend to have an "outer" object, that is indexed by the type of list (like 0 for Article.txt and 1 for Noun.txt), and the "inner" object be a vector, that takes the words. Your implementation is the other way round, which is sub-optimal, because you have to carry around empty slots in your partsOfSpeech vector in your implementation. Also note, that in your example, setting a hard upper limit of "100" for the number of words for each file is very dangerous - it can cause buffer overrun! Better use std::vector for the actual word lists, as vectors auto-expand easily.

Kai Petzke
  • 2,150
  • 21
  • 29
  • 1
    I'm tempted to downvote: You **always** (as in: always) need to test whether the read was successful **after** attempting to read. – Dietmar Kühl Aug 12 '13 at 21:10
  • @DietmarKühl not being familiar with the internals of std extraction operators (though I use the hell out of them daily), I defer to your knowledge. If `infile >> line;` *fails*, is the content of `line` even *defined* ? If not, this can't work with-definition, and the likely side effect would be duplicity in the container for the last successful line that *was* read. I concur with you. – WhozCraig Aug 12 '13 at 23:31
  • @WhozCraig: the normal behavior of the input operators is to leave the value of the argument unchanged if the extraction fails. Although the definition in the standard isn't explicit about it, that is the case for the string input operator at least when the flag `std::ios_base::skipws` is set. – Dietmar Kühl Aug 12 '13 at 23:47