fgets implementation from the std library of K&R 2nd edition

Question

I'm aware that code snippets from textbooks are just for demonstration purposes and shouldn't be held to production standards but K&R 2nd edition on page 164-165 says:

fgets and fputs, here they are, copied from the standard library on our system:

char *fgets(char *s, int n, FILE *iop)
{
    register int c;
    register char *cs;
    
    cs = s;
    while (--n > 0 && (c = getc(iop)) != EOF)
        if ((*cs++ = c) == '\n')
            break;
    *cs = '\0';
    return (c == EOF && cs == s) ? NULL : s;
}

Why is the return statement not return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s; since:

ANSI C89 standard says:

If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

Even the standard library illustrated in Appendix B of the book says so. From Page 247:

fgets returns s, or NULL if end of file or error occurs.

K&R uses ferror(iop) in fputs implementation given just below this fgets implementation on the same page.

With the above implementation, fgets will return s even if there is a read error after reading some characters. Maybe this is an oversight or am I missing something?

A big red flag is the use of the deprecated `register` keyword. Additionally, `getc()` will `"return "EOF on end of file or error."` — David C. Rankin, Apr 06 '22 at 19:12
The behavior of `register` is not defined by the C standard. It is *"implementation defined"* (behavior is defined by the implementation not the standard) and taken to mean use a storage class that is "as fast as possible". Early C it was used to tell the compiler to hold the value in a CPU register only making access fast. However, the downside is a variable stored only in a register does not have an ADDRESS and any code that attempts to take the address of a `register` variable invokes undefined behavior. — David C. Rankin, Apr 07 '22 at 00:54
That's correct. Even K&R says so in the book and in the Appendix A Reference Manual page 210. However, I still don't understand, in the context of `fgets` implementation `register` is big red flag for what? They are not taking address of `c` or `cs` anywhere in the code. — rootkea, Apr 07 '22 at 01:01
It was intended to mean seeing `register` used at all immediately indicates the code is old. It indicates special scrutiny should be used in evaluating the code -- even if it comes from the early authors of C. — David C. Rankin, Apr 07 '22 at 01:17
@DavidC.Rankin Ah, I think you missed the point of the post. The purpose of the question is not to hold K&R `fgets` implementation to modern ("new") standards but to the K&R ("old") itself. (See point 2. from the question) — rootkea, Apr 07 '22 at 03:56
Sometimes my interpretation crystal-ball gets a bit dodgy... `:)` — David C. Rankin, Apr 07 '22 at 06:33
@DavidC.Rankin: The only aspect of `register` that is implementation-defined is the extent to which access to a `register` object is made as fast as possible. Use of `register` does not otherwise change the semantics of an identifier or its object except that its address should not be taken. — Eric Postpischil, Apr 07 '22 at 15:24

Andreas Wenzel · Accepted Answer · 2022-04-07T16:14:21.533

You are correct that the behavior of the posted implementation of the function fgets does not comply with the C89 standard. For the same reasons, it also does not comply with the modern C11/C18 standard.

The posted implementation of fgets handles end-of-file correctly, by only returning NULL if not a single character has been read. However, it does not handle a stream error correctly. According to your quote of the C89 standard (which is identical to the C11/C18 standard in this respect), the function should always return NULL if an error occurred, regardless of the number of characters read. Therefore, the posted implementation of fgets is wrong to handle an end-of-file in exactly the same way as a stream error.

It is worth noting that the second edition of K&R is from 1988, which is from before the ANSI C89 standard was published. Therefore, the exact wording of the standard may not have been finalized yet, when the second edition of the book was written.

The posted implementation of fgets also does not comply with the quoted specification of Appendix B. Assuming that the function fgets is supposed to behave as specified in Appendix B, then the posted implementation of fgets handles errors correctly, but it does not handle end-of-file correctly. According to the quote from Appendix B, the function should always return NULL when an end-of-file occurs (even if characters have been successfully read, which is not meaningful).

It is also worth noting that using the statement

return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s;

as suggested in the question will not make the implementation of the function fgets fully comply with the C89/C11/C18 standards. When a stream error occurs "during the operation", the function is supposed to return NULL. However, when ferror returns nonzero, it may be impossible to tell whether the error occurred "during the operation", i.e. whether the stream's error indicator was already set before the function fgets was called. It is possible that the stream's error indicator was already set due to an error that occurred before fgets was called, but that all subsequent stream operations succeeded or failed due to end-of-file (i.e. not due to stream error). The function fgets is also not allowed to simply call clearerr at the start of the function in order to distinguish these cases, because it would then have to restore the state of the stream's error indicator before returning. Setting the stream's error indicator is not possible in the C standard library; it would require an implementation-specific function. Looking at the return value of getc will not always be able to resolve this ambiguity, because a return value of EOF can mean both end-of-file or error.

I think the implementation passes muster as `getc()` returns `EOF` on end-of-file OR error. So it would generate `NULL` on stream error. (if I'm thinking right...) — David C. Rankin, Apr 06 '22 at 19:15
@DavidC.Rankin: You are correct that the name `EOF` is misleading. It does not necessarily mean end-of-file; it can also mean stream error. However, the posted implementation will incorrectly return `s` when a stream error is encountered **after reading at least one character**. In such a case, it should return `NULL` instead. This is stated quite clearly in both standards. — Andreas Wenzel, Apr 06 '22 at 19:26
Thank you Andreas, so I wasn't thinking right. You are entirely correct with the `&&` in the return -- the stream error is masked if at least one valid character has been read. (I see this has troubled some of our veterans as well `:)` — David C. Rankin, Apr 06 '22 at 19:40
Let's forget about the C89 standard. As mentioned in the original post, the appendix B at the end of K&R second edition (page 247) says "`fgets` returns `s`, or `NULL` if end of file or **error** occurs." So if not C89 then K&R is at least violating function specification from standard library given in the same book. — rootkea, Apr 07 '22 at 00:24
@rootkea: Yes, you are correct that the posted `fgets` implementation does not comply with your quoted description from Appendix B, either. Therefore, assuming that your quotes are correct and complete, the book is self-contradictory in this respect. Also, the quoted description in Appendix B does not comply with the ANSI C89 standard either, because, according to the standard, the function `fgets` should not always return `NULL` on end-of-file. On end-of-file, it should return `NULL` only if `0` characters have been read. Otherwise, it should return `s`. — Andreas Wenzel, Apr 07 '22 at 00:41
I think the important point here if you want to distinguish, you need to call `feof()` and if `EOF` isn't set, check `ferror()`. In the case of this implementation, you would have to check `ferror()` every time and confirm zero to ensure a stream error didn't occur. — David C. Rankin, Apr 07 '22 at 01:14
@DavidC.Rankin: I don't think that calling `feof` or `ferror` is a reliable solution, because it is possible that the end-of-file indicator or error indicator were set before the function `fgets` was called. Unfortunately, it is not possible for the function `fgets` to simply call `clearerr` at the start of the function in order to distinguish these cases, because it would then have to restore the state of both indicators, which is not possible without an implementation-specific function. I have added a paragraph to my answer to address this issue. — Andreas Wenzel, Apr 07 '22 at 16:06

chux - Reinstate Monica · Answer 2 · 2022-04-07T02:27:21.410

0

Why is the return statement not return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s;

Because this is not the best option either.

return (c == EOF && cs == s) ? NULL : s; well handles the case of EOF due to end-of-file, but not EOF due to input error.

ferror(iop) is true when an input error just occurred or when an input error had occurred before. It is possible that fgetc() can return non-EOF after returning EOF due to input error. This differs from feof(), which is sticky. Once EOF occurs due to end-of-file, feof() continues to return EOF, unless the end-of-file flag is cleared.

A better alternative would be to insure an EOF just occurred rather than use an unqualified ferror().

 if (c == EOF) {
   if (feof(iop)) {
     if (cs == s) return NULL;
   } else {
     return NULL; // Input error just occurred.
   }
 }
 return s;

Pedantic n

Pathological cases: The below suffers when n <= 0 as *cs = '\0' writes into cs[] outside its legal range. --n is a problem when n == INT_MIN.

while (--n > 0 && (c = getc(iop)) != EOF) // Is --n OK?
  if ((*cs++ = c) == '\n')
    break;
*cs = '\0';  // Is n big enough

// Alternative
while (n > 1 && (c = getc(iop)) != EOF) {
  n--;
  *cs++ = c;
  if (c == '\n') {
    break;
  }
}   
if (n > 0) *cs = '\0';

Pedantic CHAR_MAX >= INT_MAX

To note: On rare machines (some old graphics processors), returning an EOF may be a valid CHAR_MAX. Alternative code not presented.

Uninitialized c

Testing c == EOF is UB as it is not certain c was ever set with a small n.

Better to initialize C: register int c = 0;

More:
What are all the reasons fgetc() might return EOF?.
Is fgets() returning NULL with a short buffer compliant?.

edited Apr 07 '22 at 02:27

answered Apr 07 '22 at 01:09

chux - Reinstate Monica

143,097
13
135
256

I don't follow. If `fgetc()` returns `EOF` due to input error then the loop breaks so how can `fgetc` can return "non-EOF after returning EOF"? – rootkea Apr 07 '22 at 04:02
@rootkea A call to `fgetc()` before this `fgets()` may have returned `EOF` due to input error. IOWs, input error flag is set before this call function `fgets()`. Testing for `ferror(iop)` does not reflect what happened in `fgets()` and `return (ferror(iop) || (c == EOF && cs == s)) ? NULL : s;` retruns `NULL` when it should not. – chux - Reinstate Monica Apr 07 '22 at 04:11
@rootkea `fgetc()` does not return `EOF` because the error flag is set. `fgetc()` returns `EOF` when an input error just occurred. In the later case, the error flag will then be set. – chux - Reinstate Monica Apr 07 '22 at 04:14
I think this is a misplaced concern. What happens before calling `fgets()` is programmer's responsibility. So let's just assume everything is well and good when we call this `fgets()`. Indeed, that's why K&R uses `return ferror(iop) ? EOF : 0;` in `fputs` implementation just down the same page. (See point 3. from the question) – rootkea Apr 07 '22 at 04:18
Regarding `n` being not valid - I had already noticed it but didn't press it because the purpose of the post is to hold K&R to itself. Its Appendix B standard library talks about returning `NULL` from `fgets` on input error but says nothing about `n`. So let's just assume it's a valid `n`. i.e. `n > 1` – rootkea Apr 07 '22 at 04:31
@rootkea "assume everything is well and good when we call this fgets()" --> it is not necessary to assume the error flag is cleared to create a good compliant function. Rather than "assume it's a valid n. i.e. n > 1", `fgets()` should still function without UB with `n == 1`. [glibc](https://stackoverflow.com/a/23389531/2410359) fixed that for a reason. It comes down to how well one wants to handle edge and near edge cases. Your call. – chux - Reinstate Monica Apr 07 '22 at 04:47
I agree. But the point of this post is not to code production-ready `fgets` but to hold K&R `fgets` implementation to itself. That is to make sure that the given implementation at least respects what it says in Appendix B standard library. This is more of an academic question than production question. – rootkea Apr 07 '22 at 04:52

fgets implementation from the std library of K&R 2nd edition

2 Answers2