3

I tried this with VS2017 (32-bit version) on a 64-bit Windows PC, and it seems to me that fscanf() sets the eof flag immediately after successfully reading the last item in a file. This loop terminates immediately after fscanf() has read the last item in the file associated with stream:

while(!feof(stream))
{
    fscanf(stream,"%s",buffer);
    printf("%s",buffer);
}

I know this is insecure code... I just want to understand the behaviour. Please forgive me ;-)

Here, stream is related to an ordinary text file containing strings like "Hello World!". The last character in that file is not a newline character.

However, fgetc(), having processed the last character, tries to read yet another one in this loop, which leads to c=0xff (EOF):

while (!feof(stream))
{
    c = fgetc(stream);
    printf("%c", c);
}

Is this behaviour of fscanf() and fgetc() standardized, implementation dependent or something else? I am not asking why the loop terminates or why it does not terminate. I am interested in the question if this is standard behaviour.

maya
  • It **depends** on whether the `fscanf` *did* read until EOF or not. `fscanf("%c")` is exactly analogous to `fgetc`. – Antti Haapala -- Слава Україні Jun 17 '18 at 10:43
  • `EOF` is not `0xff`; `EOF` is guaranteed to be negative specifically so it can't be confused with a successful `fgetc` return value. – melpomene Jun 17 '18 at 10:44
  • Sorry, I read the duplicate but I can't find the answer there... Could you please clarify that? My question is not about "Why does the loop not terminate" or something like that, but about the standard... – maya Jun 17 '18 at 10:48
  • Your question is confused. The code you've shown behaves as expected (see also the linked question). It's not clear what behavior of `scanf` you're talking about. – melpomene Jun 17 '18 at 10:57
  • @maya I think the point is that when you have code that doesn't work, it's not always meaningful to ask *why* it doesn't work, or why two different variants of the same wrong thing behave slightly, strangely differently. As the linked answer explains, `while(!feof(fp))` is *always* wrong. So it's not too interesting why `fgetc` inside the loop does one thing, and `fscanf` does another. Fix the `while(!feof(fp))` problem, and the problem (the difference) goes away. – Steve Summit Jun 17 '18 at 11:00
  • Yes, it behaves as expected. But when I run an analogous loop with fscanf() the condition is false immediately after fscanf has read the last item in the file. So it seems fscanf sets the eof flag earlier. Okay. Is this standard/reliable? – maya Jun 17 '18 at 11:01
  • You'll have to show your "analogous loop with `fscanf`". I tried what I thought you meant, and saw the same behavior in either case. – Steve Summit Jun 17 '18 at 11:05
  • I must insist :-) I just want to know about the behaviour of these standard functions, fscanf and fgetc. This discussion about good coding may be very interesting, but it's not my point right now. – maya Jun 17 '18 at 11:06
  • @maya I'd like to help, but you're not letting me. I've tried three programs here, and they all behave the same way for me. The burden is now on you to show more exactly what you mean, preferably by presenting your code using `scanf`. – Steve Summit Jun 17 '18 at 11:08
  • Okay! Did that... – maya Jun 17 '18 at 11:11
  • @maya Getting there, but `fscanf` with `%s` was one of the three programs I tried, and for me it's printing the last line twice, just like the character-at-a-time versions print the last character twice. Can you describe how you're printing output in your (now) first example, and perhaps also what your input looks like? – Steve Summit Jun 17 '18 at 11:15
  • This is why posting a [mcve] is so important. – melpomene Jun 17 '18 at 11:23
  • I'm printing output via printf() (omitted that in the while loop in the example to make it short, but now added it) and my input are strings of ascii characters, so the input file may be a text file containing something like "Hello World!". – maya Jun 17 '18 at 11:25
  • @maya yes, it is standard behaviour. `fgetc` returns `EOF` when there is nothing to read. But `feof` returns `true` when you *previously* read past the end of the file, i.e. only *after* `fgetc` returned `EOF`, not before. – Weather Vane Jun 17 '18 at 11:25
  • @maya The crucial bit is what the last character in the file is. Is it terminated by a newline (`'\n'`) or not? – melpomene Jun 17 '18 at 11:26
  • melpomene: It is not a newline character. Why is that crucial? Weather Vane: Do you know if the behaviour of fscanf() is also standard? – maya Jun 17 '18 at 11:30
  • @maya are you suggesting the library is written incorrectly? – Weather Vane Jun 17 '18 at 11:38
  • @maya Think about how `fscanf` `%s` must be implemented. – melpomene Jun 17 '18 at 11:41
  • @Weather Vane Hmmm... In principle, this is a possibility, which I didn't take into account at first. So you are saying VS2017 is doing non-standard things here? – maya Jun 17 '18 at 11:42
  • @melpomene it is a mystery ;-) – maya Jun 17 '18 at 11:43
  • @maya no it's you implying that, by asking if the library complies with the standard. But with VC that is always a possibility, although the reasons why VC is said to be non-compliant aren't usually mistakes in the implementation of individual functions. The fifth comment (it was from Steve) is relevant. – Weather Vane Jun 17 '18 at 11:50
  • @maya No, it's not. `%s` is specified to read a sequence of consecutive non-whitespace characters, so it has to keep reading until it hits a whitespace char or EOF. In the latter case it'll probably set the eof indicator on the stream. – melpomene Jun 17 '18 at 11:52
  • @melpomene: I tried an additional newline character in my example, and with my VS implementation it is read in the last iteration and then the loop terminates. So in either case, fscanf() seems to set the eof flag immediately after it has read the last string, be it a "proper word" or a newline character. The only difference seems to be that in the case of a newline character, it returns failure. – maya Jun 17 '18 at 12:02
  • @Weather Vane: I was not sure if the standard did really address this issue. I thought that this could also be left open... – maya Jun 17 '18 at 12:03
  • @maya you should not be testing the return value from `scanf` function family for `EOF` anyway. Instead, test that the correct number of items was scanned. This is especially important for `%d` and `%f` types, because if data is found which cannot be converted, the input will stall and `EOF` will never be returned. But when the input is correct the test will catch `EOF` too (because it is not 1, 2, 3 etc). – Weather Vane Jun 17 '18 at 12:16
  • @WeatherVane I will keep that in mind. Just observed the EOF as a return value here. – maya Jun 17 '18 at 12:23
  • Reopened because the OP is not asking why `feof` didn't work as expected (which is what [the proposed duplicate](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) is about), but rather, why `fscanf` sometimes behaves a little differently. – Steve Summit Jun 17 '18 at 14:10
  • The way `%s` works in `fscanf`, it reads the characters until it finds the end of the string - a whitespace character or end of file. According to your description (you said the line does not end in a newline character), in your case `%s` stops reading when it bumps into the end of file (i.e. until it attempts to read *beyond* the end), which sets the EOF condition. This is exactly what happens in the second version of your code (with `getc`). So, both versions behave identically. I don't understand what made you decide that `fscanf` sets the EOF condition "earlier" than `getc`. It doesn't. – AnT stands with Russia Jun 17 '18 at 14:40
  • Why C11? VS 2017 does not feature support for C11, it is still C99 AFAIK. – ddbug Jun 17 '18 at 14:46
  • Oh, you're absolutely right. I'm going to change that in the question. – maya Jun 17 '18 at 16:03

3 Answers

5

In my experience, when working with <stdio.h> the precise semantics of the "eof" and "error" bits are very, very subtle, so much so that it's not usually worth it (it may not even be possible) to try to understand exactly how they work. (The first question I ever asked on SO was about this, although it involved C++, not C.)

I think you know this, but the first thing to understand is that the intent of feof() is very much not to predict whether the next attempt at input will reach the end of the file. The intent is not even to say that the input stream is "at" the end of the file. The right way to think about feof() (and the related ferror()) is that they're for error recovery, to tell you a bit more about why a previous input call failed.

And that's why writing a loop involving while(!feof(fp)) is always wrong.

But you're asking about precisely when fscanf hits end-of-file and sets the eof bit, versus getc/fgetc. With getc and fgetc, it's easy: they try to read one character, and they either get one or they don't (and if they don't, it's either because they hit end-of-file or encountered an i/o error).
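To make the getc/fgetc case concrete, here is a minimal sketch of a character loop that tests the return value instead of feof(). The names count_chars and count_chars_in are made up for illustration, and the second one uses tmpfile() only so that the example can be tried without an input file. Note that c is an int: that is what lets a byte such as 0xff be told apart from EOF.

```c
#include <assert.h>
#include <stdio.h>

/* Counts the characters that can still be read from fp.
 * c is an int, not a char, so that EOF (a negative value) can never be
 * confused with a valid character such as 0xff. */
long count_chars(FILE *fp)
{
    long n = 0;
    int c;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF)
        n++;

    /* Only now, after a read has failed, is it meaningful to ask why. */
    if (ferror(fp))
        return -1;              /* I/O error rather than end-of-file */
    return n;
}

/* Hypothetical helper for experimenting: writes text to a temporary file,
 * rewinds it, and counts what can be read back. */
long count_chars_in(const char *text)
{
    FILE *fp = tmpfile();
    long n;

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);
    n = count_chars(fp);
    fclose(fp);
    return n;
}
```

Called on the question's "Hello World!" (no trailing newline), count_chars_in should report 12 characters, with no extra trip through the loop.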

But with fscanf it's trickier, because depending on the input specifier being parsed, characters are accepted only as long as they're appropriate for the input specifier. The %s specifier, for example, stops not only if it hits end-of-file or gets an error, but also when it hits a whitespace character. (And that's why people were asking in the comments whether your input file ended with a newline or not.)
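As a rough illustration of where %s stops, the sketch below uses sscanf on an in-memory string rather than fscanf on a stream (an assumption made purely to keep it self-contained), and uses %n to report how many characters were consumed. The function name consumed_by_first_word is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Scans the first %s word in text and returns how many characters of text
 * were consumed in doing so, including any leading whitespace that %s
 * skipped. Returns -1 if no word could be read. */
int consumed_by_first_word(const char *text)
{
    char word[32];
    int n = 0;

    assert(text != NULL);
    /* %31s leaves room for the terminating '\0'; %n stores the number of
     * characters consumed so far and does not count as a conversion. */
    if (sscanf(text, "%31s%n", word, &n) != 1)
        return -1;
    return n;
}
```

For "  Hello World!" this should give 7: two skipped spaces plus the five letters of Hello, with the scan stopping at the space before World!. For input that is only whitespace there is no word at all, and the function returns -1.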

I've experimented with the program

#include <stdio.h>

int main()
{
    char buffer[100];
    FILE *stream = stdin;

    while(!feof(stream)) {
        fscanf(stream,"%s",buffer);
        printf("%s\n",buffer);
    }
}

which is pretty close to what you posted. (I added a \n in the printf so that the output was easier to see, and better matched the input.) I then ran the program on the input

This
is
a
test.

and, specifically, where all four of those lines ended in a newline. And the output was, not surprisingly,

This
is
a
test.
test.

The last line is repeated because that's what (usually) happens when you write while(!feof(stream)).

But then I tried it on the input

This\n
is\n
a\n
test.

where the last line did not have a newline. This time, the output was

This
is
a
test.

This time, the last line was not repeated. (The output was still not identical to the input, because the output contained four newlines while the input contained three.)

I think the difference between these two cases is that in the first case, when the input ends with a newline, fscanf reads the last word, reads the final \n, notices that it's whitespace, and returns; it has not hit end-of-file and so does not set the eof bit. In the second case, without the trailing newline, fscanf hits end-of-file while reading the last word, and so does set the eof bit. feof() in the while() condition then returns true, so the code does not make an extra trip through the loop, and the last line is not repeated.

We can see a bit more clearly what's going on if we look at fscanf's return value. I modified the loop like this:

while(!feof(stream)) {
    int r = fscanf(stream,"%s",buffer);
    printf("fscanf returned %2d: %5s (eof: %d)\n", r, buffer, feof(stream));
}

Now, when I run it on a file that ends with a newline, the output is:

fscanf returned  1:  This (eof: 0)
fscanf returned  1:    is (eof: 0)
fscanf returned  1:     a (eof: 0)
fscanf returned  1: test. (eof: 0)
fscanf returned -1: test. (eof: 1)

We can clearly see that after the fourth call, feof(stream) is not true yet, meaning that we'll make that last, extra, unnecessary, fifth trip through the loop. But we can see that during the fifth trip, fscanf returns -1 (the value of EOF on this implementation), indicating that (a) it did not read a string as expected and (b) it reached end-of-file.

If I run it on input not containing the trailing newline, on the other hand, the output is like this:

fscanf returned  1:  This (eof: 0)
fscanf returned  1:    is (eof: 0)
fscanf returned  1:     a (eof: 0)
fscanf returned  1: test. (eof: 1)

Now, feof is true immediately after the fourth call to fscanf, and the extra trip is not made.
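If you want to reproduce the two runs without preparing input files, something along these lines might do. It is only a sketch: eof_after_last_word is a made-up name, tmpfile() stands in for the real input file, and the result reflects what the implementations discussed here do rather than a firm guarantee for every corner case.

```c
#include <assert.h>
#include <stdio.h>

/* Writes contents to a temporary file, reads it back word by word with
 * fscanf("%s"), and returns 1 or 0 according to what feof() reported right
 * after the last successful read. Returns -1 if the temporary file could
 * not be created or no word was read. */
int eof_after_last_word(const char *contents)
{
    char buffer[100];
    int eof_seen = -1;
    FILE *fp;

    assert(contents != NULL);
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(contents, fp);
    rewind(fp);

    /* Each successful read overwrites the previous observation, so the
     * value that survives belongs to the last word in the file. */
    while (fscanf(fp, "%99s", buffer) == 1)
        eof_seen = (feof(fp) != 0);

    fclose(fp);
    return eof_seen;
}
```

With "This\nis\na\ntest." the function should return 1, and with "This\nis\na\ntest.\n" it should return 0, matching the two traces above.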

Bottom line: the moral is (the morals are):

  1. Don't write while(!feof(stream)).
  2. Do use feof() and ferror() only to test why a previous input call failed.
  3. Do check the return value of scanf and fscanf.

And we might also note: Do beware of files not ending in newline! They can behave surprisingly differently.


Addendum: Here's a better way to write the loop:

while((r = fscanf(stream,"%s",buffer)) == 1) {
    printf("%s\n", buffer);
}

When you run this, it always prints exactly the strings it sees in the input. It doesn't repeat anything; it doesn't do anything significantly differently depending on whether the last line does or doesn't end in a newline. And -- significantly -- it doesn't (need to) call feof() at all!
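If you do care why the loop stopped, that is the point where feof() and ferror() earn their keep. A possible shape, with print_words and count_words_in being names invented for this sketch and tmpfile() used only to make it easy to try:

```c
#include <assert.h>
#include <stdio.h>

/* Prints every whitespace-separated word from fp, one per line.
 * Returns the number of words printed, or -1 if the loop stopped because
 * of an I/O error rather than end-of-file. */
int print_words(FILE *fp)
{
    char buffer[100];
    int count = 0;

    assert(fp != NULL);
    while (fscanf(fp, "%99s", buffer) == 1) {
        printf("%s\n", buffer);
        count++;
    }

    /* fscanf has failed; this is the moment to ask why. */
    if (ferror(fp))
        return -1;
    return count;
}

/* Hypothetical helper for experimenting: feeds text to print_words
 * through a temporary file. */
int count_words_in(const char *text)
{
    FILE *fp = tmpfile();
    int n;

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    rewind(fp);
    n = print_words(fp);
    fclose(fp);
    return n;
}
```

The word count comes out as 4 for the test input whether or not the last line ends in a newline.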


Footnote: In all of this I've ignored the fact that %s with *scanf reads strings, not lines. Also that %s tends to behave very badly if it encounters a string that's larger than the buffer that's to receive it.
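One common way to guard against the oversized-string problem is to give %s a maximum field width that is one less than the buffer size. The sketch below shows the idea with sscanf and a deliberately tiny buffer; bounded_word_length is a made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stores at most 7 characters of the first word in text into an 8-byte
 * buffer and returns the stored length (0 if there was no word). */
size_t bounded_word_length(const char *text)
{
    char buffer[8];             /* room for 7 characters plus '\0' */

    assert(text != NULL);
    if (sscanf(text, "%7s", buffer) != 1)
        return 0;
    return strlen(buffer);
}
```

Given "abcdefghijkl", only "abcdefg" is stored, so the result is 7. The remaining characters are not discarded; a later conversion would pick them up as a separate word.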

Steve Summit
  • Good article! This question required an in depth explanation. Note however that your description of the `%s` conversion does not address the skipping of the whitespace characters before the word. If there are such characters but end-of-file occurs after reading them, the C Standard is not completely clear as to what should happen to the end-of-file indicator. Similarly, if converting `%d`, what happens to a stream that has just a trailing `-` or `+` before end-of-file? – chqrlie Jun 17 '18 at 18:41
  • @chqrlie Right -- now go back and read my first paragraph. :-) – Steve Summit Jun 17 '18 at 18:44
  • Yes, that's my understanding too, `fscanf()` and friend are full of quirks and special cases, far too subtle to comprehend: even after implementing it several times, I still discover new corner cases for which the C Standard is not completely clear. – chqrlie Jun 17 '18 at 18:47
1

Both of your loops are incorrect: the end-of-file indicator tested by feof(f) is only set once an input operation has tried to read past the end of the file. In your code, you test neither whether fgetc() returns EOF nor whether fscanf() returns 0 or EOF.

Indeed fscanf() can set the end of file condition of a stream if it reaches the end of file, which it does for %s if the file does not contain a trailing newline, whereas fgets() would not set this condition if the file ends with a newline. fgetc() sets the condition only when it returns EOF.

Here is a modified version of your code that illustrates this behavior:

#include <stdio.h>

int main() {
    FILE *fp = stdin;
    char buf[100];
    char *p;
    int c, n, eof;

    for (;;) {
       c = fgetc(fp);
       eof = feof(fp);
       if (c == EOF) {
           printf("c=EOF, feof()=%d\n", eof);
           break;
       } else {
           printf("c=%d, feof()=%d\n", c, eof);
       }
    }

    rewind(fp); /* clears end-of-file and error indicators */
    for (;;) {
        n = fscanf(fp, "%99s", buf);
        eof = feof(fp);
        if (n == 1) {
            printf("fscanf() returned 1, buf=\"%s\", feof()=%d\n", buf, eof);
        } else {
            printf("fscanf() returned %d, feof()=%d\n", n, eof);
            break;
        }
    }

    rewind(fp); /* clears end-of-file and error indicators */
    for (;;) {
        p = fgets(buf, sizeof buf, fp);
        eof = feof(fp);
        if (p == buf) {
            printf("fgets() returned buf, buf=\"%s\", feof()=%d\n", buf, eof);
        } else
        if (p == NULL) {
            printf("fgets() returned NULL, feof()=%d\n", eof);
            break;
        } else {
            printf("fgets() returned %p, buf=%p, feof()=%d\n", (void*)p, (void*)buf, eof);
            break;
        }
    }
    return 0;
}

When run with standard input redirected from a file containing Hello world without a trailing newline, here is the output:

c=72, feof()=0
c=101, feof()=0
c=108, feof()=0
c=108, feof()=0
c=111, feof()=0
c=32, feof()=0
c=119, feof()=0
c=111, feof()=0
c=114, feof()=0
c=108, feof()=0
c=100, feof()=0
c=EOF, feof()=1
fscanf() returned 1, buf="Hello", feof()=0
fscanf() returned 1, buf="world", feof()=1
fscanf() returned -1, feof()=1
fgets() returned buf, buf="Hello world", feof()=1
fgets() returned NULL, feof()=1

The C Standard specifies the behavior of the stream functions in terms of individual calls to fgetc; fgetc sets the end-of-file condition when it cannot read a byte from the stream because the stream is at end of file.

The behavior illustrated above conforms to the Standard and shows how testing feof() is not a good approach to validate input operations. feof() can return non-zero after successful operations and can return 0 before unsuccessful operations. feof() should only be used to distinguish end of file from input error after an unsuccessful input operation. Very few programs make this distinction, hence feof() is almost never used on purpose and almost always indicates a programming error. For extra explanations, read this: Why is “while ( !feof (file) )” always wrong?
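Both halves of that statement can be observed with fgets() alone. The following is a small sketch under the assumption that a temporary file from tmpfile() behaves like the redirected input used above; eof_after_first_fgets is a name chosen for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Writes contents to a temporary file, reads one line with fgets(), and
 * returns 1 or 0 according to feof() right after that successful call.
 * Returns -1 if the file could not be created or nothing was read. */
int eof_after_first_fgets(const char *contents)
{
    char buf[100];
    int result = -1;
    FILE *fp;

    assert(contents != NULL);
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(contents, fp);
    rewind(fp);

    if (fgets(buf, sizeof buf, fp) != NULL)
        result = (feof(fp) != 0);

    fclose(fp);
    return result;
}
```

For "Hello world" without a newline, fgets() succeeds and feof() is already non-zero. For "Hello world\n", fgets() succeeds and feof() is still 0, even though the next call is bound to fail.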

chqrlie
  • One other case where `feof()` is -- very occasionally and just barely -- useful is when you're reading a group of related characters -- perhaps a record in a file -- using individual calls to `getc` or the like, and rather than laboriously testing every return value for EOF, you call `feof()` once at the end, and if it returns true, discard or invalidate everything you just (thought you) read. – Steve Summit Jun 17 '18 at 17:33
  • @SteveSummit: that's interesting but risky and error prone: it only works if the last read operation was an `fgetc()` (or a `getc()` or a `getchar()`). Most other input operation may set the end of file condition and still be successful. If you want to use this approach, just compare the return value of the last `fgetc()` to `EOF`. – chqrlie Jun 17 '18 at 18:32
1

If I might offer a tl;dr to both the comprehensive answers here, formatted input reads characters until it has reason to stop. Since you say

The last character in that file is not a newline character

and the %s directive reads a string of non-whitespace characters, after it reads the ! in World! it has to read another character. There isn't one, which lights eof.

Put whitespace (space, newline, whatever) at the end of the phrase, and your printf will print the last word twice: once because it read it, and again because the scanf failed to find a string to read before hitting eof, so the %s conversion never happened, leaving the buffer untouched.
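The "buffer untouched" part is easy to check in isolation. This sketch uses sscanf on a string instead of fscanf on a file, which is an assumption made to keep it short; buffer_survives_failed_scan is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if scanning text with %s failed and left the buffer's previous
 * contents in place, 0 if a word was read and the buffer was overwritten. */
int buffer_survives_failed_scan(const char *text)
{
    char buffer[100] = "previous";
    int r;

    assert(text != NULL);
    r = sscanf(text, "%99s", buffer);

    /* With only whitespace (or nothing) left, sscanf returns EOF and never
     * stores anything, so the old contents are still there. */
    return r != 1 && strcmp(buffer, "previous") == 0;
}
```

That leftover content is exactly what the printf in the question's loop prints a second time.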

jthill