0

I am trying to create a program that reads from a file and counts the occurrence of each alphabetic character in the file. Below is what I have so far, however the counts returned (stored in counters array) are higher than expected.

  void count_letters(const char *filename, int counters[26]) {
    FILE* in_file = fopen(filename, "r");
    const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz";
    if(in_file == NULL){
        printf("Error(count_letters): Could not open file %s\n",filename);
        return;
    }
    char line[200];
    while(fgets(line, sizeof(line),in_file) != NULL){ //keep reading lines until there's nothing left to read
        for(int pos = 0; pos < sizeof(line); pos++){//iterate through each character in line...
            if(isalpha(line[pos])){//skip checking and increment position if current char is not alphabetical
                for(int i = 0; i < 26; i++){//...for each character in the alphabet
                    if(tolower(line[pos]) == tolower(ALPHABET[i]))//upper case and lower case are counted as same
                        counters[i]++;    // increment the current element in counters for each match in the line
                }
            }
        }
    }
    fclose(in_file);
    return;
}
  • 1
    `pos < sizeof(line)` is a wrong test. You are testing against all chars in `line` (200) and not what was actually read. Change to `pos < strlen(line)`. – Jean-Baptiste Yunès Oct 26 '21 at 16:06
  • In the future, provide a [mre] when asking questions about debugging a program. That means a complete program that others can compile and run without any change—include all the `#include` statements it needs and a `main` routine. It also includes sample input, the observed output, and the output desired instead. – Eric Postpischil Oct 26 '21 at 16:08
  • will do ^ thank you for the advice, I am new to stack :) –  Oct 26 '21 at 16:17
  • no need for `return` at the end of a `void` function, it does that automatically. – yano Oct 26 '21 at 16:23
  • @Dan, it is not customary or appropriate here to modify your question's title to mark it "SOLVED". Since you already rolled back the rollback of that edit once, I will be flagging this question instead of rolling back again. – John Bollinger Oct 26 '21 at 17:04
  • Or not, since you seem now to have reverted that yourself. – John Bollinger Oct 26 '21 at 17:05
  • @John Like clearly said above, I'm new to the site, didn't know it was customary. Now I know. –  Oct 26 '21 at 17:16
  • And that's why I told you. However, you might have been clued in in the first place by the fact that someone (else) actually rolled back the edit in which you first added "SOLVED" to your title. – John Bollinger Oct 26 '21 at 17:18

4 Answers4

0

In for(int pos = 0; pos < sizeof(line); pos++), sizeof(line) evaluates to the size of the entire array line, not the part that was filled in by the most recent fgets call. So, after long lines, the loop repeatedly counts characters left in the array from reading short lines.

Modify the loop to iterate only through the part of line most recently filled in by fgets. You can do that by exiting the loop when a null character is seen.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
0

My two cents for a somewhat simpler solution (you have got an awful lot of loops ;) ). Reading input line-by-line is preferred in most situations, but since you're simply counting characters here, I don't think this is one of them, and ultimately adds complication. This answer also assumes ASCII character encoding, which as noted in the comments of another answer is not guaranteed by the C standard. You can modify as needed with your char ALPHABET for ultimate portability

#include <stdio.h>
#include <ctype.h>

#define NUM_LETTERS 26

int main(void)
{
    FILE* in_file = fopen("/path/to/my/file.txt", "r");
    if (in_file == NULL) exit(-1);

    unsigned charCounts[NUM_LETTERS] = {0};
    int curChar;
    // rather than reading line-by-line, read one character at a time
    while ((curChar = fgetc(in_file)) != EOF)
    {
        // only proceed if it is a letter
        if (isalpha(curChar))
        {
            // this is bad if not using ASCII, instead you'd need another
            // loop to check your ALPHABET, but increment the count here
            (charCounts[tolower(curChar) - 'a'])++;
        }
    }

    // print out the results
    for (int i=0; i<NUM_LETTERS; i++)
    {
        // 'A'+i also assumes ASCII encoding
        printf("%c: %u\n", 'A'+i, charCounts[i]);
    }
}

Demo using stdin instead of a file.

yano
  • 4,827
  • 2
  • 23
  • 35
-1

You have an error in for(int pos = 0; pos < sizeof(line); .... You assume that all 200 positions in the array are valid characters, but this is true only for a text in which each line has 200 characters. You should count only the characters in the initialized part of the string. The length of it varies from line to line:

for(int pos = 0; pos < strlen(line); ...

Also, you do not need the most inner loop because all alphabetic characters very likely have sequential ASCII codes:

if(isalpha(line[pos]))
    counters[tolower(line[pos]) - 'a']++;

I assume that counters have been previously initialized with 0s. If not, you must initialize this array before counting.

DYZ
  • 55,249
  • 10
  • 64
  • 93
  • The C standard does not guarantee that all C implementations use ASCII. – Eric Postpischil Oct 26 '21 at 16:06
  • @EricPostpischil No, it does not. But ASCII is used in close to 100% of cases, isn't it? – DYZ Oct 26 '21 at 16:10
  • Teaching students to rely on common cases and not on documented specifications leads to bugs. It is bad for them and bad for society. – Eric Postpischil Oct 26 '21 at 16:11
  • 1
    @EricPostpischil Some people [disagree](https://stackoverflow.com/a/29381136/4492932). I think **this** is the case when one may be pragmatic rather than pedantic. – DYZ Oct 26 '21 at 16:15
-1

you do not need to use fgets as character functions will work same fast as file system uses its own buffering.

#define NLETTERS    ('z' - 'a' + 1)

int countLetters(FILE *fi, size_t *counter)
{
    int ch;
    if(fi && counter)
    {
        memset(counter, 0, sizeof(*counter * NLETTERS));
        while((ch = fgetc(fi)) != EOF)
        {
            if(isalpha(ch))
            {
                counter[tolower(ch) - 'a']++;
            }
        }
        return 0;
    }
    return 1;
}
0___________
  • 60,014
  • 4
  • 34
  • 74