Let's just strip the program down to the calls which do input:
scanf("%d", &size); // Statement 1
while((c = getchar()) != EOF){ // 2
scanf("%d", &p); // 3
scanf("%d", &q); // 4
}
That is definitely not the way to go; we'll get to the correct usage in a bit. For now, let's just analyze what happens. It's important to understand precisely how scanf
works. The %d
format code causes it to first skip over any whitespace characters, and then read characters as long as the characters can be made into a decimal integer. Eventually some character will be read which is not part of a decimal integer; most likely a newline character. Because the format string is now finished, the unused character which has just been read will be reinserted into the stream.
So when the call to getchar
is made, getchar
will read and return the newline character which terminated the integer. Inside the loop, there are then two calls to scanf("%d")
, each of which will behave as indicated above: skip whitespace if any, read a decimal integer, and reinsert the unused character back into the input stream.
Now, let's suppose that you run the program, and enter the number 42
followed by the enter key, and then Ctrl-D to close the input stream.
The 42
will be read by statement 1, and (as mentioned above) the newline will be read by statement 2. So when statement 3 is executed, there is no more data to be read. Because end-of-file is signaled before any digit is read, scanf
will return EOF
. However, the code does not test the return value of scanf
; it goes on to statement 4.
What should happen at this point is that the scanf
in statement 4 should immediately return EOF
without attempting to read more input. That's what the C standard says should happen, and it is what Posix says should happen. Once end-of-file has been signaled on a stream, any input request should immediately return EOF
until the end-of-file indicator is manually cleared. (See below for standards quotes.)
But glibc, for reasons we won't go into just yet, does not conform to the standard. It attempts another read. And so the user must enter another Ctrl-D, which will cause the scanf
at statement 4 to return EOF
. Again, the code does not check the return code, so it continues with the while loop and calls getchar
again at statement 2. Because of the same bug, getchar
does not immediately return EOF
, but instead attempts to read a character from the terminal. So the user must now type a third Ctrl-D to cause getchar
to return EOF
. Finally, the code checks a return code, and the while loop terminates.
So that is the explanation of what is happening. Now, it is easy to see at least one mistake in the code: the return value of scanf
is never checked. Not only does this mean that EOF
is missed, it also means that input errors are ignored. (scanf
would have returned 0 if the input could not be parsed as an integer.) That's serious, because if scanf
cannot succesfully match the format code, the value of the corresponding argument is undefined and must not be used.
In short: Always check return values from *scanf
. (And other I/O library functions.)
But there is a more subtle mistake as well, which makes little difference in this case but could, in general, be serious. The character read by getchar
in statement 2 is simply discarded, regardless of what it was. Normally it will be whitespace, so it doesn't matter that it is discarded, but you don't actually know that because the character is discarded. Maybe it was a comma. Maybe it was a letter. Maybe it matters what it was.
It is bad style to rely on the assumption that whatever character is read by the getchar
at statement 2 is unimportant. If you really need to peek at the next character, you should reinsert it into the input stream, just as scanf
does:
while ((c = getchar()) != EOF) {
ungetc(c, stdin); /* Put c back into the input stream */
...
}
But actually, that test is not what you want at all. As we have already seen, it is extremely unlikely that getchar
will return EOF
at this point. (It's possible, but it's very unlikely). Much more more probable is that getchar
will read a newline character, even though the next scanf
will encounter the end-of-file. So there was absolutely no point peeking at the next character; the correct solution is to check the return code of scanf
, as indicated above.
Putting that together, what you really want here is something more like:
/* No reason to use two scanf calls to read two consecutive numbers */
while ((count = scanf("%d%d", &p, &q)) == 2) {
/* Do something with p and q */
}
if (count != EOF) {
/* Invalid format. Issue an error message, at least */
}
/* Do whatever needs to be done at the end of input. */
Finally, let's examine glibc's behaviour. There is a very long-standing bug report linked to by an answer to the question cited in the OP. If you take the trouble to read through to the most recent post in the bugzilla thread, you'll find a link to a discussion on the glibc developer mailing list.
Let me give the TL;DR version, and save you the trouble of digital archaeology. Since C99, the standard has been clear that EOF is "sticky". §7.21.3/11 states that all input is performed as though successive bytes were read by fgetc
:
...The byte input functions read characters from the stream as if by successive calls to the fgetc
function.
And §7.21.7.1/3 states that fgetc
returns EOF
immediately if the stream's end-of-file indicator is set:
If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc
function returns EOF
. Otherwise, the fgetc
function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc
function
returns EOF
.
So once the end-of-file indicator is set, because either end of file was detected or some read error occurred, subsequent input operations must immediately return EOF
without attempting to read from the stream. Various things can clear the end-of-file indicator, including clearerr
, seek
, and ungetc
; once the end-of-file indicator has been cleared, the next input function call will again attempt to read from the stream.
However, it wasn't always like that. Before C99, the result of reading from a stream which had already returned EOF
was unspecified. And different standard libraries chose to handle it in different ways.
So a decision was made to not change glibc to conform to the (then) new standard, but rather to maintain compatibility with certain other C libraries, notably Solaris. (A comment in the glibc source is quoted in the bug report.)
Although there is a compelling argument (at least, compelling to me) that fixing the bug is not likely to break anything important, there is still a certain reluctance to do anything about it. And so, here we are, ten years later, with a still-open bug report, and a non-conforming implementation.