You have at least three misunderstandings:
- "char variable stores a white-space"
- rewind(stdin) clears the buffer
- _Placeholder tells you something interesting about how scanf handles whitespace
But, I'm sorry, none of these are true.
Let's review how scanf
actually handles whitespace. We start with two important pieces of background information:
- The newline character, \n, is in most respects an ordinary whitespace character. It occupies space in the input buffer just like any other character. It arrives in the input buffer when you press the Enter key.
- When it's done parsing a %-directive, scanf always leaves unparsed input on the input stream.
Suppose you write
int a, b;
scanf("%d%d", &a, &b);
Suppose you run that code and type, as input
12 34
and then hit the Enter key. What happens?
First, the input stream (stdin
) now contains six characters:
"12 34\n"
scanf
first processes the first of the two %d
directives you gave it. It scans the characters 1
and 2
, converting them to the integer 12 and storing it in the variable a
. It stops reading at the first non-digit character it sees, which is the space character between 2
and 3
. The input stream is now
" 34\n"
Notice that the space character is still on the input stream.
scanf
next processes the second %d
directive. It doesn't immediately find a digit character, because the space character is still there. But that's okay, because like most (but not quite all) scanf
format directives, %d
has a secret extra power: it automatically skips whitespace characters before reading and converting an integer. So the second %d
reads and discards the space character, then reads the characters 3
and 4
and converts them to the integer 34, which it stores in the variable b
.
Now scanf
is done. The input stream is left containing just the newline:
"\n"
Next, let's look at a slightly different — although, as we'll see, actually very similar — example. Suppose you write
int x, y;
scanf("%d", &x);
scanf("%d", &y);
Suppose you run that code and type, as input
56
78
(where that's on two lines, meaning that you hit Enter twice).
What happens now?
In this case, the input stream will end up containing these six characters:
"56\n78\n"
The first scanf
call has a %d
directive to process. It scans the characters 5
and 6
, converting them to the integer 56 and storing it in the variable x
. It stops reading at the first non-digit character it sees, which is the newline after the 6
. The input stream is now
"\n78\n"
Notice that both newline characters are still on the input stream.
Now the second scanf
call runs. It, too, has a %d
directive to process. The first character on the input stream is not a digit: it's a newline. But that's okay, because %d
knows how to skip whitespace. So it reads and discards the newline character, then reads the characters 7
and 8
and converts them to the integer 78, which it stores in the variable y
.
Now the second scanf
is done. The input stream is left containing just the newline:
"\n"
This may all have made sense, may have seemed unsurprising, may have left you feeling, "Okay, so what's the big deal?" The big deal is this: In both examples, the input was left containing that one, last newline character.
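If you'd like to watch that last newline being left behind, here is a minimal sketch. So that it doesn't depend on anyone typing, it feeds the text through a temporary file instead of the keyboard. The helper make_input and the two leftover_ functions are names I've made up for illustration; with real keyboard input you would call scanf and getchar on stdin instead.

```c
#include <stdio.h>

/* Hypothetical helper: a stream holding `typed`, standing in for the
   keyboard. Returns NULL if the temporary file can't be created. */
FILE *make_input(const char *typed)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;
    fputs(typed, fp);
    rewind(fp);   /* back to the start, ready for reading */
    return fp;
}

/* First example: one call, two %d directives.
   Returns the next character left on the stream, or EOF on any failure. */
int leftover_one_call(const char *typed, int *a, int *b)
{
    FILE *fp = make_input(typed);
    int next = EOF;

    if (fp == NULL)
        return EOF;
    if (fscanf(fp, "%d%d", a, b) == 2)
        next = getc(fp);   /* whatever the scan left behind */
    fclose(fp);
    return next;
}

/* Second example: two calls, one %d directive each.
   Returns the next character left on the stream, or EOF on any failure. */
int leftover_two_calls(const char *typed, int *x, int *y)
{
    FILE *fp = make_input(typed);
    int next = EOF;

    if (fp == NULL)
        return EOF;
    if (fscanf(fp, "%d", x) == 1 && fscanf(fp, "%d", y) == 1)
        next = getc(fp);   /* whatever the scans left behind */
    fclose(fp);
    return next;
}
```

Given "12 34\n", leftover_one_call should store 12 and 34 and return '\n'. Given "56\n78\n", leftover_two_calls should store 56 and 78 and likewise return '\n'.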
Suppose, later in your program, you have some other input to read. We now come to a hugely significant decision point:
If the next input call is another call to scanf, and if it involves one of the (many) format specifiers that have the secret extra power of also skipping whitespace, that format specifier will skip the newline, then do its job of scanning and converting whatever input comes after the newline, and the program will work as you expect.
But if the next input call is not a call to scanf, or if it's a call to scanf that involves one of the few input specifiers that does not have the secret extra power, the newline will not be "skipped"; instead, it will be read as actual input. If the next input call is getchar, it will read and return the newline character. If the next input call is fgets, it will read and return a blank line. If the next input call is scanf with the %c directive, it will read and return the newline. If the next input call is scanf with the %[^\n] directive, it will read an empty line. (Actually %[^\n] will read nothing at all in this case: the directive fails to match, and the \n is still left on the input.)
It's in the second case that the "extra" whitespace causes a problem. It's in the second case that you may find yourself wanting to explicitly "flush" or discard the extra whitespace.
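Here is a sketch of those four cases. As before, a temporary file stands in for the keyboard so that nobody has to type anything. Every function name here is hypothetical, invented for this illustration. Each function first reads an integer with %d and then tries one of the non-skipping readers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: a stream holding `typed`, from which one "%d" has
   already been read. Stands in for stdin just after scanf("%d", &n). */
FILE *stream_after_int(const char *typed)
{
    FILE *fp = tmpfile();
    int n;

    if (fp == NULL)
        return NULL;
    fputs(typed, fp);
    rewind(fp);
    if (fscanf(fp, "%d", &n) != 1) {
        fclose(fp);
        return NULL;
    }
    return fp;
}

/* getchar-style read: returns the very next character. */
int next_with_getc(const char *typed)
{
    FILE *fp = stream_after_int(typed);
    int c;

    if (fp == NULL)
        return EOF;
    c = getc(fp);
    fclose(fp);
    return c;
}

/* "%c" does not skip whitespace, so it stores the very next character. */
int next_with_percent_c(const char *typed)
{
    FILE *fp = stream_after_int(typed);
    char ch = '?';

    if (fp == NULL)
        return EOF;
    if (fscanf(fp, "%c", &ch) != 1)
        ch = '?';
    fclose(fp);
    return ch;
}

/* fgets reads up to and including the next newline. Returns how many
   characters came before that newline, or -1 if fgets failed. */
int fgets_text_length(const char *typed)
{
    FILE *fp = stream_after_int(typed);
    char line[64];
    int len = -1;

    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) != NULL)
        len = (int)strcspn(line, "\n");
    fclose(fp);
    return len;
}

/* "%[^\n]" needs at least one non-newline character to match.
   Returns fscanf's own return value. */
int scanset_return_value(const char *typed)
{
    FILE *fp = stream_after_int(typed);
    char buf[64];
    int rc;

    if (fp == NULL)
        return EOF;
    rc = fscanf(fp, "%63[^\n]", buf);
    fclose(fp);
    return rc;
}
```

With the input "56\nhello\n", the first two functions should return '\n', fgets_text_length should return 0 (a blank line), and scanset_return_value should return 0 (nothing converted). None of them ever gets as far as "hello".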
But it turns out that the problem of flushing or discarding the extra whitespace left behind by scanf
is a remarkably stubborn one. You can't portably do it by calling fflush
. You can't portably do it by calling rewind
. If you care about correct, portable code, you basically have three choices:
- Write your own code to explicitly read and discard "extra" characters (typically, up to and including the next newline).
- Don't try to intermix scanf and other calls. Don't call scanf and then, later, try to call getchar or fgets. If you call scanf and then, later, call scanf with one of the directives (such as "%c") that lacks the "secret extra power", insert an extra space before the format specifier to cause whitespace to be skipped. (That is, use " %c" instead of "%c".)
- Don't use scanf at all; do all your input in terms of fgets or getchar.
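The three choices might be sketched as follows. These are minimal illustrations, not finished library code. The function names are my own invention, and the use of strtol for the conversion in the third one is an assumption on my part (it is one common option, not the only one). Each function takes a FILE * so it can be tried on any stream; pass stdin for keyboard input.

```c
#include <stdio.h>
#include <stdlib.h>

/* Choice 1: read and discard the rest of the current line,
   up to and including the next newline (or end of input). */
void discard_line(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF && c != '\n')
        ;   /* discard */
}

/* Choice 2: stay with scanf, but put a space before %c so that
   leading whitespace (including a leftover newline) is skipped.
   Returns the character read, or EOF on failure. */
int read_nonspace_char(FILE *fp)
{
    char ch;

    if (fscanf(fp, " %c", &ch) != 1)
        return EOF;
    return (unsigned char)ch;
}

/* Choice 3: no scanf at all. Read a whole line with fgets, then
   convert it. Returns 1 on success, 0 on end of input or if the
   line does not start with a number. */
int read_int_line(FILE *fp, int *out)
{
    char line[128];
    char *end;
    long val;

    if (fgets(line, sizeof line, fp) == NULL)
        return 0;
    val = strtol(line, &end, 10);
    if (end == line)
        return 0;       /* no digits found */
    *out = (int)val;    /* sketch only: no range check */
    return 1;
}
```

A typical use of the first choice would be to call discard_line(stdin) right after a scanf("%d", ...) call and before a later fgets call, so that fgets sees the next line rather than the leftover newline.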
See also What can I use for input conversion instead of scanf?
Addendum: scanf
's handling of whitespace can often seem puzzling. If the above explanation isn't sufficient, it may help to look at some actual C code detailing how scanf
works inside. (The code I'm going to show obviously isn't the exact code that's behind your system's implementation, but it will be similar.)
When it comes time for scanf
to process a %d
directive, you might imagine it will do something like this. (Be forewarned: this first piece of code I'm going to show you is incomplete and wrong. It's going to take me three tries to get it right.)
c = getchar();
if(isdigit(c)) {
    int intval;
    intval = c - '0';
    while(isdigit(c = getchar())) {
        intval = 10 * intval + (c - '0');
    }
    *next_pointer_arg = intval;
    n_vals_converted++;
} else {
    /* saw no digit; processing has failed */
    return n_vals_converted;
}
Let's make sure we understand everything that's going on here. We've been told to process a %d
directive. We read one character from the input by calling getchar()
. If that character is a digit, it's the first of possibly several digits making up an integer. We read characters and, as long as they're digits, we add them to the integer value, intval
, we're collecting. The conversion involves subtracting the constant '0'
, to convert an ASCII character code to a digit value, and successive multiplication by 10. Once we see a character that's not a digit, we're done. We store the converted value into the pointer handed to us by our caller (here schematically but approximately represented by the pointer value next_pointer_arg
), and we add one to a variable n_vals_converted
keeping count of how many values we've successfully scanned and converted, which will eventually be scanf
's return value.
If, on the other hand, we don't even see one digit character, we've failed: we return immediately, and our return value is the number of values we've successfully scanned and converted so far (which may well be 0).
But there is actually a subtle bug here. Suppose the input stream contains
"123x"
This code will successfully scan and convert the digits 1
, 2
, and 3
to the integer 123, and store this value into *next_pointer_arg
. But, it will have read the character x
, and after the call to isdigit
in the loop while(isdigit(c = getchar()))
fails, the x
character will have effectively been discarded: it is no longer on the input stream.
The specification for scanf
says that it is not supposed to do this. The specification for scanf
says that unparsed characters are supposed to be left on the input stream. If the user had actually passed the format specifier "%dx"
, that would mean that, after reading and parsing an integer, a literal x
is expected in the input stream, and scanf
is going to have to explicitly read and match that character. So it can't accidentally read and discard the x
in the process of parsing a %d
directive.
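Here is a small sketch of why that matters. It uses sscanf, which parses a string rather than stdin but follows the same directive rules. Both function names are mine, made up for illustration.

```c
#include <stdio.h>

/* Parse text such as "123x" with "%d%c". The %d must stop at the 'x'
   without consuming it, so that %c can pick it up.
   Returns the number of values converted. */
int int_then_char(const char *text, int *value, char *next)
{
    return sscanf(text, "%d%c", value, next);
}

/* Parse text such as "640x480", with a literal x in the format.
   The x in the format has to match an x in the input exactly.
   Returns the number of values converted. */
int parse_size(const char *text, int *w, int *h)
{
    return sscanf(text, "%dx%d", w, h);
}
```

For "123x", int_then_char should return 2, storing 123 and 'x'. For "640x480", parse_size should return 2; for "640 480" it should return 1, because the literal x fails to match the space.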
So we need to modify our hypothetical %d
code slightly. Whenever we read a character that turns out not to be an integer, we have to literally put it back on the input stream, for somebody else to maybe read later. There's actually a function in <stdio.h>
to do this, sort of the opposite of getc
, called ungetc
. Here is a modified version of the code:
c = getchar();
if(isdigit(c)) {
    int intval;
    intval = c - '0';
    while(isdigit(c = getchar())) {
        intval = 10 * intval + (c - '0');
    }
    ungetc(c, stdin); /* push non-digit character back onto input stream */
    *next_pointer_arg = intval;
    n_vals_converted++;
} else {
    /* saw no digit; processing has failed */
    ungetc(c, stdin);
    return n_vals_converted;
}
You will notice that I have added two calls to ungetc
, in both places in the code where, after calling getchar
and then isdigit
, the code has just discovered that it has read a character that is not a digit.
It might seem strange to read a character and then change your mind, meaning that you have to "unread" it. It might make more sense to peek at an upcoming character (to determine whether or not it's a digit) without reading it. Or, having read a character and discovered that it's not a digit, if the next piece of code that's going to process that character is right here in scanf
, it might make sense to just keep it in the local variable c
, rather than calling ungetc
to push it back on the input stream, and then later calling getchar
to fetch it from the input stream a second time. But, having called out these other two possibilities, I'm just going to say that, for now, I'm going to plough ahead with the example that uses ungetc
.
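For what it's worth, standard C's <stdio.h> has no "peek" function of its own, so the peeking alternative usually ends up being built on top of ungetc anyway. A minimal sketch, with peekc being a name I've made up:

```c
#include <stdio.h>

/* Hypothetical helper: report the next character on the stream without
   consuming it. Returns EOF if there is nothing left to read. */
int peekc(FILE *fp)
{
    int c = getc(fp);

    if (c != EOF)
        ungetc(c, fp);   /* put it straight back */
    return c;
}
```

Calling peekc twice in a row returns the same character both times; only a real getc (or similar) moves the stream forward.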
So far I've shown the code that you might have imagined lay behind scanf
's processing of %d
. But the code I've shown so far is still significantly incomplete, because it does not show the "secret extra power". It starts looking for digit characters right away; it doesn't do any skipping of leading whitespace.
Here, then, is my third and final sample fragment of %d
-processing code:
/* skip leading whitespace */
while(isspace(c = getchar())) {
    /* discard */
}
if(isdigit(c)) {
    int intval;
    intval = c - '0';
    while(isdigit(c = getchar())) {
        intval = 10 * intval + (c - '0');
    }
    ungetc(c, stdin); /* push non-digit character back onto input stream */
    *next_pointer_arg = intval;
    n_vals_converted++;
} else {
    /* saw no digit; processing has failed */
    ungetc(c, stdin);
    return n_vals_converted;
}
That initial loop reads and discards characters as long as they're whitespace. Its form is very similar to the later loop that reads and processes characters as long as they're digits. The initial loop will read one more character than it seems like it should: when the isspace
call fails, that means that it has just read a non-whitespace character. But that's okay, because we were just about to read a character to see if it was the first digit.
[Footnotes: This code is still far from perfect. One pretty significant problem is that it doesn't have any checks for an EOF coming along in the middle of its parsing. Another problem is that it doesn't look for -
or +
before the digits, so it won't handle negative numbers. Yet another, more obscure problem is that, ironically, obvious-looking calls like isdigit(c)
are not always correct — strictly speaking they need to be somewhat cumbersomely rendered as isdigit((unsigned char)c)
.]
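For the curious, here is one way those three footnoted problems might be patched up. This is still only a sketch and certainly not how any particular library does it. The function name read_int, its FILE * parameter, and its return convention (1 for success, 0 for "no number here", EOF for end of input) are all my own assumptions, and it still makes no attempt to detect overflow.

```c
#include <ctype.h>
#include <stdio.h>

/* Sketch of a %d-like reader. Returns 1 and stores the value on success,
   0 if the next non-whitespace input is not a number, and EOF if the
   input ends before any non-whitespace character is seen. */
int read_int(FILE *fp, int *out)
{
    int c;
    int negative = 0;
    int intval;

    /* skip leading whitespace, stopping at end of input */
    do {
        c = getc(fp);
    } while (c != EOF && isspace((unsigned char)c));
    if (c == EOF)
        return EOF;

    /* optional sign */
    if (c == '-' || c == '+') {
        negative = (c == '-');
        c = getc(fp);
    }

    if (c == EOF || !isdigit((unsigned char)c)) {
        /* saw no digit; put back the character that stopped us.
           Only one character of pushback is guaranteed, so a sign
           that was already read is not restored. */
        if (c != EOF)
            ungetc(c, fp);
        return 0;
    }

    intval = c - '0';
    while ((c = getc(fp)) != EOF && isdigit((unsigned char)c))
        intval = 10 * intval + (c - '0');   /* sketch: no overflow check */
    if (c != EOF)
        ungetc(c, fp);   /* push the non-digit back for the next reader */

    *out = negative ? -intval : intval;
    return 1;
}
```

Note that EOF is tested before each cast to unsigned char: the cast is meant for genuine character values, and applying it to EOF would turn it into an ordinary-looking character code.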
If you're still with me, my point in all this is to illustrate these two points in a concrete way:
The reason %d
is able to automatically skip leading whitespace is because (a) the specification says it's supposed to and (b) it has explicit code to do so, as my third example illustrates.
The reason scanf
always leaves unprocessed input (that is, input that comes after the input it does read and process) on the input stream is because (a) again, the specification says it's supposed to and (b) its code is typically sprinkled with explicit calls to ungetc
, or the equivalent, to make sure that every unprocessed character remains on the input, as my second example illustrates.