According to the standard (C17 draft, 7.21.7.2), fgets (¶1)
char *fgets(char * restrict s, int n, FILE * restrict stream);
reads from stream at most n-1 characters (until the first '\n' (which is in this case also written to the target) or EOF) into s[], appending a '\0' (¶2). It returns (¶3):
- NULL:
  - if EOF is encountered immediately (s[] remains unchanged)
  - if there was a read error (s[] has indeterminate contents)
- s: otherwise ("success")
I would therefore expect that for n <= 1, fgets reads "at most" n-1 <= 0 (that is: 0) characters, appending a '\0', and returning s. In any case, there is nothing being read, so the program can't read EOF or have any read errors.
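To make that expectation concrete, here is a minimal sketch of the literal reading, built on fgetc. The name sketch_fgets is hypothetical (it is not part of any library), and the code is only an illustration of how I read ¶2 and ¶3, not how any real implementation works. Note that for n <= 0 it still writes s[0], which assumes the caller's buffer holds at least one char.

```c
#include <stdio.h>

/* Hypothetical helper, not a library function: a literal reading of
   C17 7.21.7.2 expressed with fgetc. */
char *sketch_fgets(char *s, int n, FILE *stream)
{
    int count = 0;   /* number of characters stored in s[] so far */
    int c = 0;       /* stays 0 (not EOF) if nothing is ever read */

    /* "at most n-1 characters": same test as count < n - 1, written
       so that it cannot overflow; for n <= 1 the body never runs */
    while (count + 1 < n) {
        c = fgetc(stream);
        if (c == EOF)
            break;
        s[count++] = (char)c;
        if (c == '\n')
            break;               /* the newline is kept in s[] */
    }

    /* EOF before any character was stored, or a read error */
    if (c == EOF && (count == 0 || ferror(stream)))
        return NULL;

    /* Assumption: s has room for this even when n <= 0. */
    s[count] = '\0';
    return s;
}
```

Under this reading, a call with n <= 1 never touches the stream, so it can neither hit end-of-file nor fail; it stores "" and returns s.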
However, with the following code
#include <stdio.h>
#include <string.h>
int main(void) {
    char s[20];
    char *cp;
    int n;
    /* Call fgets with n = 2, 1, 0, -1 in turn. */
    for (n = 2; n >= -1; --n) {
        strcpy(s, "HHHHH");  /* sentinel: shows whether fgets modified s[] */
        cp = fgets(s, n, stdin);
        printf("n == %d:\n", n);
        printf(" \"%s\"\n", s);
        if (cp == NULL)
            printf(" fgets returned NULL\n");
    }
    printf("The end of main has been reached.\n");
    return 0;
}
and input abcde, I get the following output with GCC:
n == 2:
"a"
n == 1:
""
n == 0:
"HHHHH"
fgets returned NULL
n == -1:
"HHHHH"
fgets returned NULL
The end of main has been reached.
and the following output with MSVC
n == 2:
"a"
n == 1:
""
n == 0:
"HHHHH"
fgets returned NULL
I am guessing that the abrupt program termination with MSVC has to do with the invocation of an "invalid parameter handler" (see Microsoft's documentation for fgets).
For the n == 1 case, the output is as expected. But: Shouldn't fgets(s, n, stream) assign and return an empty string "" instead of returning NULL for all n <= 1, rather than just for n == 1? Irrespective of what to make of the n == -1 case, both GCC and MSVC return NULL for n == 0.
For what it's worth, the precise wording of ¶3 is:
"[...] If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. [...]"
But if nothing is being read in the first place, how can end-of-file be "encountered" in that case?
My conclusions, after having read the comments and the content in the duplicate post:
I am convinced now that reading "at most" n-1 characters isn't possible for n <= 0. One can only read a non-negative number of characters (which I interpret to mean: a number of characters in the range [0, SIZE_MAX]).
Therefore, in the case of n <= 0, the standard's text invokes what linguists in their subfield of semantics call a presupposition failure.
- A presupposition is an assumption whose negation renders the containing statement uninterpretable. Natural language examples: "the" in a sentence presupposes contextual uniqueness; "we" in a sentence presupposes 2 or more people on whose behalf the subject is speaking; "stopped doing X" in a sentence presupposes that one indeed "was doing X for a while".
- That is, in this case "all bets are off". However, even though the presupposition failure makes this case literally "undefined behavior", I would be more comfortable if we just called the standard out on this omission, because after all it doesn't outlaw an argument of n which is <= 0. (As user "chux" pointed out in a comment (paraphrased): UB comes in 2 flavors, that which is explicitly specified as UB and that about which the standard is silent; both types are common in C.)
The case of n == 1 looks well-formed to me (one can read "at most 0" characters). I find the wording "A null character is written immediately after the last character read into the array." unproblematic, because not having a "last character read" is expected for the boundary condition of 0 read operations. (That is, in this case the presupposition failure can be tolerated, because it is just "one away" from there being a last character read.)
- That said, the wording lacks clarity, and the standard should improve it.
- To make things dependent on the stream's EOF flag (here feof(stdin)) is intriguing, but I think this goes too far in trying to assign meaning to something which is poorly worded in the standard.
That the C2x draft (I'm looking at N3096; there might be newer versions at the time of writing) still contains the same underspecified language is a disappointment.
I believe that there are 4 potential ways we can consider handling the n <= 0 case, given that it's not outlawed:
- setting s[] to "" and returning s
- setting s[] to "" and returning NULL
- leaving s[] unchanged and returning s
- leaving s[] unchanged and returning NULL
Given the existing confusion around the case, I will stay away from discussing their relative merits and consistency with other parts of the standard.
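Whichever of the four one prefers, calling code can avoid depending on the implementation's choice by never passing n <= 0 to fgets at all. The following is a minimal sketch of such a guard; the names guarded_fgets and nonpositive_policy are hypothetical and introduced only for this illustration, and the two "set to empty" policies assume that s points to at least one writable char even though n <= 0.

```c
#include <stdio.h>

/* Hypothetical names, introduced only for this illustration:
   one enumerator per option in the list above. */
enum nonpositive_policy {
    EMPTY_RETURN_S,     /* set s[] to "", return s         */
    EMPTY_RETURN_NULL,  /* set s[] to "", return NULL      */
    KEEP_RETURN_S,      /* leave s[] unchanged, return s   */
    KEEP_RETURN_NULL    /* leave s[] unchanged, return NULL */
};

char *guarded_fgets(char *s, int n, FILE *stream,
                    enum nonpositive_policy policy)
{
    if (n > 0)
        return fgets(s, n, stream);  /* n >= 1: defer to the library */

    /* n <= 0: the library's fgets is never called */
    if (policy == EMPTY_RETURN_S || policy == EMPTY_RETURN_NULL)
        s[0] = '\0';  /* assumption: at least one writable char at s */

    return (policy == EMPTY_RETURN_S || policy == KEEP_RETURN_S) ? s : NULL;
}
```

For example, guarded_fgets(s, n, stdin, KEEP_RETURN_NULL) reproduces what both outputs above show for n == 0, without ever handing a non-positive n to the library.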