10

When using fgetc to read the next character of a stream, you usually check that end-of-file has not been reached with

if ((c = fgetc (stream)) != EOF)

where c is of type int. Then either end-of-file has been reached and the condition fails, or c is an unsigned char converted to int, which is expected to differ from EOF, since EOF is guaranteed to be negative. Fine... apparently.

But there is a small problem... Usually the char type has no more than 8 bits, while int must have at least 16 bits, so every unsigned char is representable as an int. Nevertheless, if char had 16 or 32 bits (I know, this is never the case in practice...), nothing would prevent sizeof(int) == 1, so that fgetc (stream) could (theoretically!) return EOF (or another negative value) even though end-of-file has not been reached...

Am I mistaken? Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached? (If so, I could not find it!) Or is the if ((c = fgetc (stream)) != EOF) idiom not fully portable?...

EDIT: Indeed, this was a duplicate of Question #3860943. I did not find that question on my first search. Thanks for your help! :-)

Rémi Peyre
  • I had already read it, but this does not answer my question... – Rémi Peyre Apr 30 '15 at 19:18
  • Multiple related questions: [What platforms have something other than 8-bit char?](http://stackoverflow.com/questions/2098149/), [Exotic architectures the standards committees care about](http://stackoverflow.com/questions/6971886), and [System where 1 byte != 8 bits](http://stackoverflow.com/questions/5516044) for a few. – Jonathan Leffler Apr 30 '15 at 20:02

4 Answers

2

If you are reading a stream that is standard ASCII only, there's no risk of receiving a char equivalent to EOF before the real end-of-file, because valid ASCII codes only go up to 127. But it could happen when reading a binary file if the result is stored in a char: a byte of value 255 (unsigned) corresponds to -1 as a signed char, and nothing prevents it from appearing in a binary file.

As for your specific question (whether there's something in the standard): not exactly... but notice that fgetc returns the character as an unsigned char converted to int, so on ordinary platforms it is never negative. The only risk would be if you explicitly or implicitly narrowed the return value to signed char (for instance, if your c variable were a signed char).
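To make the narrowing risk concrete, here is a minimal sketch. The helper names count_with_int and count_with_signed_char are made up for illustration, and the behavior described assumes a typical platform with 8-bit char, EOF equal to -1, and a conversion of 255 to signed char that yields -1 (that conversion is implementation-defined).

```c
#include <stdio.h>

/* Hypothetical helper: counts the characters read before the loop
   condition sees EOF, keeping each fgetc result in an int. */
long count_with_int(FILE *stream)
{
    long n = 0;
    int c;
    while ((c = fgetc(stream)) != EOF)
        n++;
    return n;
}

/* Same loop, but the result is narrowed to signed char before the
   comparison. Assumption: on the usual platforms a 0xFF byte becomes -1
   here and is then mistaken for EOF. */
long count_with_signed_char(FILE *stream)
{
    long n = 0;
    signed char c;
    while ((c = (signed char)fgetc(stream)) != EOF)
        n++;
    return n;
}
```

On a stream holding the three bytes 'A', 0xFF, 'B', the first function should see all three characters, while under the assumptions above the second one stops at the 0xFF byte.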

NOTE: as @Ulfalizer mentioned in the comments, there's one rare case in which you may need to worry: if sizeof(int) == 1 and you're reading a file that contains non-ASCII characters, you may get a -1 return value that is not the real EOF. Environments in which this happens are quite rare: since int must have at least 16 bits, sizeof(int) == 1 implies a char of 16 bits or more, which to my knowledge is found mainly on some DSP compilers rather than on ordinary 8-bit microcontrollers like the 8051. In such a case, the safe option would be to test feof() as @pmg suggested.

Fabio Ceconello
  • Note that e.g. the test `0xFFFFFFFF == -1` is true for 32-bit `int`s though. The usual arithmetic conversions convert the -1 to an `unsigned int`. – Ulfalizer Apr 30 '15 at 19:39
  • @Ulfalizer, I meant the other way around. If fgetc finds a 0xFF byte to read, it'll be promoted to 0x000000FF (thus a positive 255), not 0xFFFFFFFF, because it is promoted as unsigned char. See an example of fgetc implementation here: http://mirror.fsf.org/pmon2000/3.x/src/lib/libc/fgetc.c – Fabio Ceconello Apr 30 '15 at 19:44
  • But C doesn't limit itself to ASCII only. – P.P Apr 30 '15 at 19:50
  • @FabioCeconello: If `char` and `int` have the same size, then you might end up with e.g. a 0xFFFFFFFF `char` value though. I guess the standard might imply(ish) in a few places that the value should be representable as a signed `int` though. Converting from unsigned to signed is undefined behavior anyway. – Ulfalizer Apr 30 '15 at 19:51
  • When the value does not fit in the signed type that is. – Ulfalizer Apr 30 '15 at 20:04
  • My understanding of the standard is that the only case where the size of int and char is equal is when both are 1. Int may be bigger, but not char. See http://stackoverflow.com/questions/2215445/are-there-machines-where-sizeofchar-1 but in those exotic places where sizeof(int)==1, you're right, because a 255 unsigned char will be back to a -1 1-byte int. – Fabio Ceconello Apr 30 '15 at 23:30
2

I think you need to rely on stream error.

ch = fgetc(stream);
if ((ch == EOF) && ferror(stream)) /* read error, not a valid character */;

From the standard

If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.


Edit: a better version

ch = fgetc(stream);
if (ch == EOF) {
    if (ferror(stream)) /* error reading */;
    else if (feof(stream)) /* end of file */;
    else /* read valid character with value equal to EOF */;
}
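Wrapped into a complete loop, the same three-way test might look like the sketch below. The function name count_chars and its return convention are assumptions made for illustration, and the sketch assumes the stream's error indicator is clear on entry.

```c
#include <stdio.h>

/* Hypothetical helper: reads stream to the end and returns the number of
   characters read, or -1 if a read error occurred. */
long count_chars(FILE *stream)
{
    long n = 0;
    for (;;) {
        int ch = fgetc(stream);
        if (ch == EOF) {
            if (ferror(stream))
                return -1;      /* error reading */
            if (feof(stream))
                break;          /* genuine end of file */
            /* Neither indicator is set: ch is a valid character whose
               value equals EOF (only possible if UCHAR_MAX > INT_MAX). */
        }
        n++;
    }
    return n;
}
```

On ordinary platforms the final fall-through branch is never taken, so the extra checks only cost something when fgetc has already returned EOF.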
pmg
2

You asked:

Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached?

On the contrary, the standard explicitly allows EOF to be returned when an error occurs.

If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

In the footnotes, I see:

An end-of-file and a read error can be distinguished by use of the feof and ferror functions.

You also asked:

Or is the if ((c = fgetc (stream)) != EOF) idiom not fully portable?

On a theoretical platform where sizeof(int) == 1 (which requires CHAR_BIT to be at least 16), that won't be a valid way to check that end-of-file has been reached. There, you'll have to resort to feof and ferror.

c = fgetc (stream);
if ( !feof(stream) && !ferror(stream) )
{
  // Got valid input in c.
}
undur_gongor
R Sahu
1

I agree with your reading.

The C Standard says (C11, 7.21.7.1 The fgetc function, p3):

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

There is nothing in the Standard (assuming UCHAR_MAX > INT_MAX) that prevents fgetc in a hosted implementation from returning a value equal to EOF that indicates neither end-of-file nor an error condition.
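Code that wants to know which situation it is in can test that condition at compile time. The following is only a sketch, and the function name eof_test_is_unambiguous is invented for the example.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if every unsigned char value fits in an
   int, so that a successfully read character can never compare equal to
   the negative EOF; returns 0 on a platform where UCHAR_MAX > INT_MAX. */
int eof_test_is_unambiguous(void)
{
#if UCHAR_MAX <= INT_MAX
    return 1;
#else
    return 0;
#endif
}
```

Where this returns 1, the usual comparison of the fgetc result against EOF is sufficient; where it returns 0, the feof and ferror checks are needed to disambiguate.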

ouah