Hurray! So, the most important reason your code is failing is that your code does not observe the following inviolable advice:
Always sanitize your inputs
What this means is that if you let the user input anything then he/she/it can break your code. This is a major, common source of problems in all areas of computer science. It is so well known that a NASA engineer has given us the tale of Little Bobby Tables:
Okay, so, all that stuff is about SQL injection, but the point is, validate your input before blithely using it. There are many, many examples of C programs that fail because they do not carefully manage input. One of the most recent and widely known is the Heartbleed Bug.
For more fun side reading, here is a superlatively-titled list of “The 10 Worst Programming Mistakes In History” @makeuseof.com — a good number of which were caused by failure to process bad input!
Academia, methinks, often fails students by not having an entire course on just input processing. Instead we tend to pretend that the issue will be later understood and handled — code in academia, science, online competition forums, etc, often assumes valid input!
Where your code went wrong
Using gets()
is dangerous because it does not stop reading and storing input as long as the user is supplying it. It has created so many software vulnerabilities that the C Standard has (at long last) officially removed it from C. SO actually has an excellent post on it: Why is the gets function so dangerous that it should not be used?
But it does remove the Enter key from the end of the user’s input!
fgets()
, in contrast, stops reading input at some point! However, it also lets you know whether you actually got an entire line of of text by not removing that Enter key.
Hence, assuming the user types: b
a
n
a
n
a
Enter
gets()
returns the string "banana"
fgets()
returns the string "banana\n"
That newline character '\n'
(what you get when the user presses the Enter key) messes up your code because your code only accepts (or works correctly given) minuscule alphabet letters!
The Fix
The fix is to reject anything that your algorithm does not like. The easiest way to recognize “good” input is to have a list of it:
// Here is a complete list of VALID INPUTS that we can histogram
//
const char letters[] = "abcdefghijklmnopqrstuvwxyz";
Now we want to create a mapping from each letter in letters[]
to an array of integers (its name doesn’t matter, but we’re calling it count[]
). Let’s wrap that up in a little function:
// Here is our mapping of letters[] ←→ integers[]
// • supply a valid input → get an integer unique to that specific input
// • supply an invalid input → get an integer shared with ALL invalid input
//
int * histogram(char c) {
static int fooey; // number of invalid inputs
static int count[sizeof(letters)] = {0}; // numbers of each valid input 'a'..'z'
const char * p = strchr(letters, c); // find the valid input, else NULL
if (p) {
int index = p - letters; // 'a'=0, 'b'=1, ... (same order as in letters[])
return &count[index]; // VALID INPUT → the corresponding integer in count[]
}
else return &fooey; // INVALID INPUT → returns a dummy integer
}
For the more astute among you, this is rather verbose: we can totally get rid of those fooey
and index
variables.
“Okay, okay, that’s some pretty fancy stuff there, mister. I’m a bloomin’ beginner. What about me, huh?”
Easy. Just check that your character is in range:
int * histogram(char c) {
static int fooey = 0;
static int count[26] = {0};
if (('a' <= c) && (c <= 'z')) return &count[c - 'a'];
return &fooey;
}
“But EBCDIC...!”
Fine. The following will work with both EBCDIC and ASCII:
int * histogram(char c) {
static int fooey = 0;
static int count[26] = {0};
if (('a' <= c) && (c <= 'i')) return &count[ 0 + c - 'a'];
if (('j' <= c) && (c <= 'r')) return &count[ 9 + c - 'j'];
if (('s' <= c) && (c <= 'z')) return &count[18 + c - 's'];
return &fooey;
}
You will honestly never have to worry about any other character encoding for the Latin minuscules 'a'..'z'.
Prove me wrong.
Back to main()
Before we forget, stick the required magic at the top of your program:
#include <stdio.h>
#include <string.h>
Now we can put our fancy-pants histogram mapping to use, without the possibility of undefined behavior due to bad input.
int main() {
// Ask for and get user input
char s[1005];
printf("s? ");
fgets(s, 1005, stdin);
// Histogram the input
for (int i = 0; i < strlen(s); i++) {
*histogram(s[i]) += 1;
}
// Print out the histogram, not printing zeros
for (int i = 0; i < strlen(letters); i++) {
if (*histogram(letters[i])) {
printf("%c %d\n", letters[i], *histogram(letters[i]));
}
}
return 0;
}
We make sure to read and store no more than 1004 characters (plus the terminating nul), and we prevent unwanted input from indexing outside of our histogram’s count[]
array! Win-win!
s? a - ba na na !
a 4
b 1
n 2
But wait, there’s more!
We can totally reuse our histogram. Check out this little function:
// Reset the histogram to all zeros
//
void clear_histogram(void) {
for (const char * p = letters; *p; p++)
*histogram(*p) = 0;
}
All this stuff is not obvious. User input is hard. But you will find that it doesn’t have to be impossibly difficult genius-level stuff. It should be entertaining!
Other ways you could handle input is to transform things into acceptable values. For example you can use tolower()
to convert any majuscule letters to your histogram’s input set.
s? ba na NA!
a 3
b 1
n 2
But I digress again...
Hang in there!