I wrote a program for counting the number of alphanumeric characters in a text file. However, the number it returns is always larger than the number that online character counters return.

For example, the program will calculate the number of alphanumeric characters in this text:

if these people had strange fads and expected obedience on the most extraordinary matters they were at least ready to pay for their eccentricity

to be 162. Running the program again, it'll say there are 164 characters in the text. Running it again, it'll say there are 156 characters. Using this online character counter, it seems that the character count ought to be lower than 144 (the online character counter includes spaces as well).

Here is the code:

#include <iostream>
#include <fstream>
#include <cctype>
using namespace std;

int main() {
    char line[100];
    int charcount = 0;
    ifstream file("pg1661sample.txt");
    while (!file.eof()) {
        file.getline(line, 99);
        for (int i = 0; i < 100; i++) {
            if (isalnum(line[i])) {
                charcount++;
            }
        }
    }

    cout << endl << "Alphanumeric character count: " << charcount;
    cin.get();
    return 0;
}

What am I doing wrong?

cosmicomic
  • Read this: http://stackoverflow.com/questions/21647/reading-from-text-file-until-eof-repeats-last-line – jrok Aug 20 '12 at 22:30
  • @jrok, it might also be that he simply counts alnums past the end of the string with the final read (there's an off-by-one in any case). – eq- Aug 20 '12 at 22:31

3 Answers


Try:

#include <iterator>
#include <algorithm>
#include <iostream>
#include <cctype>
bool isAlphaNum(unsigned char x){return std::isalnum(x);}
int main()
{
    std::cout << "Alphanumeric character count: " <<
    std::count_if(std::istream_iterator<char>(std::cin),
                  std::istream_iterator<char>(),
                  isAlphaNum
                 ) ;
}

Problems with your code:

EOF is not true until you read past the end of file:

 // This condition is true even if there is nothing left to read.
 // It only becomes false after a read has been attempted past the end of the file.
 while (!file.eof()) {

 // thus this line may fail
     file.getline(line, 99);

It is better to always do this:

 while(file.getline(line, 99))

The loop is only entered if the getline actually worked.

You are also using the fixed-size-buffer version of getline, which fails on lines that do not fit in the 100-character buffer.
Try to use the version that works with std::string, which expands automatically.

std::string  line;
while(std::getline(file, line))
{
     // stuff
}

Next, you assume every line is exactly 100 characters long.
What happens if the line is only 2 characters long?

for (int i = 0; i < 100; i++)

Basically, you will scan past the end of the line just read and count letters left over from a previous line (if a previous line was longer than the current one) or completely random garbage. If you are still using file.getline(), you can retrieve the number of characters read using file.gcount() (note that gcount() also counts the extracted newline, which is not stored in the buffer). If you use std::getline(), the variable line will be the exact size of the line read (line.size()).

Martin York
  • I see! Thanks for the thorough answer! – cosmicomic Aug 20 '12 at 22:54
  • I generally like this answer, however, it isn't portable! The functions from `<cctype>` can be called only with non-negative values (or EOF), but your code would pass negative values on systems where `char` is signed. To avoid this, you should declare your test as `bool isAlphaNum(unsigned char)`. This declaration guarantees that all `char` values are transformed into appropriate `int` arguments for `std::isalnum()`. From a performance point of view I would also use `std::istreambuf_iterator` rather than `std::istream_iterator` (note the `buf` in the former). – Dietmar Kühl Aug 20 '12 at 23:18
while (!file.eof()) {

Don't do this. eof() doesn't return true until after an attempted input has failed, so loops like this run an extra time. Instead, do this:

while (file.getline(line, 99)) {

The loop will terminate when the input ends.

The other problem is in the loop that counts characters. Ask yourself: how many characters got read into the buffer on each pass through the input loop? And why, then, is the counting loop looking at 100 characters?

Pete Becker

You're assuming that getline() fills line with exactly 100 characters. Check the length of the string read in by getline(), e.g. using strlen():

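// strlen() is declared in <cstring>; compute the length once, not on every pass.
size_t len = strlen(line);
for (size_t i = 0; i < len; i++) {
    // The cast keeps the argument to isalnum() non-negative.
    if (isalnum(static_cast<unsigned char>(line[i]))) {
        charcount++;
    }
}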

EDIT: Also, make sure you heed the suggestion from other answers to use getline()'s return value for the loop condition rather than calling eof().

Adam Zalcman