Count Different Character Types In String

Question

I wrote a program that counts and prints the number of occurrences of elements in a string but it throws a garbage value when i use fgets() but for gets() it's not so.

Here is my code:

#include<stdio.h>
#include<string.h>
#include<ctype.h>
#include<stdlib.h>
int main() {
    char c[1005];
    fgets(c, 1005, stdin);
    int cnt[26] = {0};
    for (int i = 0; i < strlen(c); i++) {
        cnt[c[i] - 'a']++;
    }
    for (int i = 0; i < strlen(c); i++) {
        if(cnt[c[i]-'a'] != 0) {
            printf("%c %d\n", c[i], cnt[c[i] - 'a']);
            cnt[c[i] - 'a'] = 0;
        }
    }
    return 0;
}

This is what I get when I use fgets():

baaaabca
b 2
a 5
c 1

 32767

--------------------------------
Process exited after 8.61 seconds with return value 0
Press any key to continue . . . _

I fixed it by using gets and got the correct result but i still don't understand why fgets() gives wrong result

Never ***ever*** use `gets`! It's so [dangerous](https://stackoverflow.com/questions/1694036/why-is-the-gets-function-so-dangerous-that-it-should-not-be-used) it has even been removed from the C language (was obsoleted in C99, removed in C11). Using `gets` instead of `fgets` is never a solution to anything. — Some programmer dude, Jan 02 '23 at 06:56
Newlines. Your code does not protect against values outside the range of `'a'`..`'z'`, like the `'\n'` that you get when you press the Enter key. — Dúthomhas, Jan 02 '23 at 06:56
As for your problem, please [read more about `fgets`](https://en.cppreference.com/w/c/io/fgets). — Some programmer dude, Jan 02 '23 at 06:56
lol, I miss stuff all the time. You had me rechecking just to be sure, but today I am on point. (We’ll see how long that lasts, I guess...) `;-}` — Dúthomhas, Jan 02 '23 at 07:14
This is what I missed. I expected more output for a longer string of abcabc.... `cnt[c[i] - 'a'] = 0;`. — Yunnosch, Jan 02 '23 at 07:16
I can fix this problem because last element in array if I input by fgets() is "NULL" so strlen()-1; — Nghi, Jan 02 '23 at 07:16
@Nghi Incorrect, the null-terminator is not counted by `strlen`. The problem is that `gets` reads but throws away the newline, while `fgets` puts it in the buffer. `'\n' - 'a'` will not give you the value you expect. That will lead to *undefined behavior*. — Some programmer dude, Jan 02 '23 at 07:21
For a solution see [Removing trailing newline character from fgets() input](https://stackoverflow.com/questions/2693776/removing-trailing-newline-character-from-fgets-input) — Some programmer dude, Jan 02 '23 at 07:22
While removing the newline (or using `strlen(c)-1`) would work, neither are the actual solution to this problem. Fix the potential invalid array indexing. (If this question is reopened, I can give an answer that addresses that...) — Dúthomhas, Jan 02 '23 at 07:23
You also need to think about what happens if the user input an upper-case letter. Or anything else that isn't a letter. Please learn about [`isalpha`](https://en.cppreference.com/w/c/string/byte/isalpha) and [`tolower`](https://en.cppreference.com/w/c/string/byte/tolower). — Some programmer dude, Jan 02 '23 at 07:41
Also please note that your code is specifically coded to handle ASCII encoding. While it's unlikely that you ever come across any other encoding, there are still old encodings in use today where letters aren't encoded in a contiguous range like ASCII. See for example [EBCDIC](https://en.wikipedia.org/wiki/EBCDIC). Even if you never will use anything other than ASCII, you need to be aware of this. — Some programmer dude, Jan 02 '23 at 07:45

score 1 · Answer 1 · answered Jan 06 '23 at 04:34

Hurray! So, the most important reason your code is failing is that your code does not observe the following inviolable advice:

Always sanitize your inputs

What this means is that if you let the user input anything then he/she/it can break your code. This is a major, common source of problems in all areas of computer science. It is so well known that a NASA engineer has given us the tale of Little Bobby Tables:

Okay, so, all that stuff is about SQL injection, but the point is, validate your input before blithely using it. There are many, many examples of C programs that fail because they do not carefully manage input. One of the most recent and widely known is the Heartbleed Bug.

^{For more fun side reading, here is a superlatively-titled list of “The 10 Worst Programming Mistakes In History” @makeuseof.com — a good number of which were caused by failure to process bad input!}

Academia, methinks, often fails students by not having an entire course on just input processing. Instead we tend to pretend that the issue will be later understood and handled — code in academia, science, online competition forums, etc, often assumes valid input!

Where your code went wrong

Using gets() is dangerous because it does not stop reading and storing input as long as the user is supplying it. It has created so many software vulnerabilities that the C Standard has (at long last) officially removed it from C. SO actually has an excellent post on it: Why is the gets function so dangerous that it should not be used?

But it does remove the Enter key from the end of the user’s input!

fgets(), in contrast, stops reading input at some point! However, it also lets you know whether you actually got an entire line of of text by not removing that Enter key.

Hence, assuming the user types: b a n a n a Enter

gets() returns the string "banana"
fgets() returns the string "banana\n"

That newline character '\n' (what you get when the user presses the Enter key) messes up your code because your code only accepts (or works correctly given) minuscule alphabet letters!

The Fix

The fix is to reject anything that your algorithm does not like. The easiest way to recognize “good” input is to have a list of it:

// Here is a complete list of VALID INPUTS that we can histogram
//
const char letters[] = "abcdefghijklmnopqrstuvwxyz";

Now we want to create a mapping from each letter in letters[] to an array of integers (its name doesn’t matter, but we’re calling it count[]). Let’s wrap that up in a little function:

// Here is our mapping of letters[] ←→ integers[]
//   • supply a valid input    → get an integer unique to that specific input
//   • supply an invalid input → get an integer shared with ALL invalid input
//
int * histogram(char c) {
    static int fooey;                         // number of invalid inputs
    static int count[sizeof(letters)] = {0};  // numbers of each valid input 'a'..'z'

    const char * p = strchr(letters, c);      // find the valid input, else NULL
    if (p) {
      int index = p - letters;                // 'a'=0, 'b'=1, ... (same order as in letters[])
      return &count[index];                   // VALID INPUT → the corresponding integer in count[]
    }
    else return &fooey;                       // INVALID INPUT → returns a dummy integer
}

^{For the more astute among you, this is rather verbose: we can totally get rid of those fooey and index variables.}

“Okay, okay, that’s some pretty fancy stuff there, mister. I’m a bloomin’ beginner. What about me, huh?”

Easy. Just check that your character is in range:

int * histogram(char c) {
    static int fooey = 0;
    static int count[26] = {0};
    if (('a' <= c) && (c <= 'z')) return &count[c - 'a'];
    return &fooey;
}

“But EBCDIC...!”

Fine. The following will work with both EBCDIC and ASCII:

int * histogram(char c) {
    static int fooey = 0;
    static int count[26] = {0};
    if (('a' <= c) && (c <= 'i')) return &count[ 0 + c - 'a'];
    if (('j' <= c) && (c <= 'r')) return &count[ 9 + c - 'j'];
    if (('s' <= c) && (c <= 'z')) return &count[18 + c - 's'];
    return &fooey;
}

You will honestly never have to worry about any other character encoding for the Latin minuscules 'a'..'z'.
Prove me wrong.

Back to main()

Before we forget, stick the required magic at the top of your program:

#include <stdio.h>
#include <string.h>

Now we can put our fancy-pants histogram mapping to use, without the possibility of undefined behavior due to bad input.

int main() {
    // Ask for and get user input
    char s[1005];
    printf("s? ");
    fgets(s, 1005, stdin);
    
    // Histogram the input
    for (int i = 0;  i < strlen(s);  i++) {
        *histogram(s[i]) += 1;
    }
    
    // Print out the histogram, not printing zeros
    for (int i = 0;  i < strlen(letters);  i++) {
        if (*histogram(letters[i])) {
            printf("%c %d\n", letters[i], *histogram(letters[i]));
        }
    }
    return 0;
}

We make sure to read and store no more than 1004 characters (plus the terminating nul), and we prevent unwanted input from indexing outside of our histogram’s count[] array! Win-win!

s?    a  -  ba na na !
a 4
b 1
n 2

But wait, there’s more!

We can totally reuse our histogram. Check out this little function:

// Reset the histogram to all zeros
//
void clear_histogram(void) {
    for (const char * p = letters;  *p;  p++)
      *histogram(*p) = 0;
}

All this stuff is not obvious. User input is hard. But you will find that it doesn’t have to be impossibly difficult genius-level stuff. It should be entertaining!

Other ways you could handle input is to transform things into acceptable values. For example you can use tolower() to convert any majuscule letters to your histogram’s input set.

s? ba na NA!
a 3
b 1
n 2

But I digress again...

Hang in there!