
According to the C standard, any characters returned by fgetc are returned in the form of unsigned char values, "converted to an int" (that quote comes from the C standard, stating that there is indeed a conversion).

When sizeof (int) == 1, many unsigned char values are outside the range of int. Converting such a value to int is an out-of-range conversion, for which "either the result is implementation-defined or an implementation-defined signal is raised". It is thus possible that some unsigned char value converts to EOF, which fgetc would then return even though the file is in neither an error nor an end-of-file state.
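The collision can be imitated on an ordinary desktop compiler. The following is only a sketch: sim_uchar, sim_int, SIM_EOF and both helper functions are hypothetical stand-ins invented for illustration, with 16-bit fixed-width types playing the roles of unsigned char and int on a platform where both are 16 bits wide. It assumes EOF is -1 (the standard only requires a negative value) and that the out-of-range conversion wraps, as it does on common compilers.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a platform with CHAR_BIT == 16 and
   sizeof (int) == 1: sim_uchar plays unsigned char, sim_int plays int. */
typedef uint16_t sim_uchar;
typedef int16_t  sim_int;

/* Assumed EOF value; the standard only requires EOF to be negative. */
#define SIM_EOF ((sim_int)-1)

/* Imitates the "unsigned char converted to an int" step of fgetc.
   For values above 32767 the result is implementation-defined
   (it wraps modulo 65536 on common compilers). */
static sim_int sim_convert(sim_uchar ch)
{
    return (sim_int)ch;
}

/* Nonzero when a valid character cannot be told apart from SIM_EOF. */
static int collides_with_eof(sim_uchar ch)
{
    return sim_convert(ch) == SIM_EOF;
}
```

Under those assumptions the character value 0xFFFF is the one that converts to the assumed EOF, while every value up to 0x7FFF converts unchanged.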

I was surprised to find that such an implementation actually exists. The TMS320C55x CCS manual documents UCHAR_MAX having a corresponding value of 65535, INT_MAX having 32767, fputs and fopen supporting binary mode... What's even more surprising is that it seems to describe the environment as a fully conforming, complete implementation (minus signals).

The C55x C/C++ compiler fully conforms to the ISO C standard as defined by the ISO specification ...

The compiler tools come with a complete runtime library. All library functions conform to the ISO C library standard. ...

Is an implementation that can return a value indicating an error where there is none really fully conforming? Could this justify using feof and ferror in the controlling expression of a loop (as hideous as that seems)? For example, while ((c = fgetc(stdin)) != EOF || !(feof(stdin) || ferror(stdin))) { ... }
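Wrapped in a function, that loop might look like the sketch below. The name count_chars and the choice to return -1 on a read error are assumptions made for illustration, not anything taken from the TI documentation.

```c
#include <stdio.h>

/* Counts the characters in fp up to a genuine end-of-file or read error.
   A return value equal to EOF is treated as ordinary data unless feof or
   ferror confirms that the stream has really stopped. */
static long count_chars(FILE *fp)
{
    long n = 0;
    int c;

    while ((c = fgetc(fp)) != EOF || !(feof(fp) || ferror(fp))) {
        n++;  /* c holds a character here, even if it compares equal to EOF */
    }

    /* Hypothetical convention: -1 signals a read error. */
    return ferror(fp) ? -1L : n;
}
```

On a platform where unsigned char fits in int, the extra feof and ferror calls run only once, after the final EOF, so the cost of the defensive condition is small.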

autistic
  • @BLUEPIXY A "byte" in standardese is whatever a `char` is. It's not necessarily 8 bits. – T.C. Jun 15 '15 at 01:55
  • I am confused, if `sizeof(int)` is `1`, how come `INT_MAX` is `32767`? that value requires two 8 bit bytes. And indeed, a byte might be more than 8 bits, hence the `CHAR_BIT` macro is used to determine that. – Iharob Al Asimi Jun 15 '15 at 01:57
  • @iharob This system doesn't use 8-bit bytes. –  Jun 15 '15 at 01:59
  • @duskwuff so it is able to return an error value without problems, right? – Iharob Al Asimi Jun 15 '15 at 02:00
  • @t.c I seem to have misunderstood the meaning of "byte" in the C standard. – BLUEPIXY Jun 15 '15 at 02:04
  • The real worry should be whether `fgetc` can read all values that are representable by `int`. If not, then any value that can never be read can be used as `EOF`. For example, does anyone guarantee that `fputc` and `fgetc` are roundtrippable on the platform in question? – Kerrek SB Jun 15 '15 at 02:05
  • @KerrekSB `fputc` and `fgetc` aren't the only ones... Consider `fwrite` and `fgetc`, `fputs` and `fgetc`, `fprintf` and `fgetc`... There are many functions that *do* support writing in binary mode, but `fgetc` can't in this case, without violating the return value clause. – autistic Jun 15 '15 at 02:25
  • In the older question, see [this answer](http://stackoverflow.com/a/3867416/153285): The TI C55x is not conforming because it uses "narrow" binary streams. The accepted answer essentially says that conforming behavior exists, but it's not popular. – Potatoswatter Jun 15 '15 at 02:26
  • @Potatoswatter That question (and answer) concerns only one device, which is used as an example in this question, but this question is not solely about that device. Additionally, it has no citations (and is thus questionable at best), and the width of the character set is irrelevant when the file is open using binary mode. That answer is wrong for this question. – autistic Jun 15 '15 at 02:35
  • @undefinedbehaviour That question has plenty of citations and it is not device-specific. Its selected answer is not device-specific. The significance of the width of the character set is addressed there too. I only mentioned one particular non-selected answer because you mentioned C55x. If you need additional info, ask a new question referencing that Q&A. As far as I can tell, it answers your question in the affirmative: It is conforming for `fgetc` to return `EOF` in the middle of a file. – Potatoswatter Jun 15 '15 at 08:56

1 Answer


The function fgetc() returns an int value in the range of unsigned char only when a proper character is read; otherwise it returns EOF, which is a negative value of type int.

My original answer (I changed it) assumed that there was an integer conversion to int, but this is not the case, since actually the function fgetc() is already returning a value of type int.

I think that, to be conforming, the implementation has to make fgetc() return only nonnegative values in the range of int, unless EOF is returned.

In this way, the values from 32768 to 65535 will never be associated with character codes in the TMS320C55x implementation.
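Whether a given platform is affected at all can be checked at compile time by comparing UCHAR_MAX with INT_MAX from <limits.h>. The sketch below is illustrative only: the macro FGETC_EOF_IS_UNAMBIGUOUS and the helper is_real_eof are hypothetical names, not part of any library.

```c
#include <limits.h>
#include <stdio.h>

/* 1 when every unsigned char value fits in a nonnegative int, so a
   negative EOF can never coincide with a character returned by fgetc. */
#if UCHAR_MAX <= INT_MAX
#define FGETC_EOF_IS_UNAMBIGUOUS 1
#else
#define FGETC_EOF_IS_UNAMBIGUOUS 0
#endif

/* Hypothetical helper: decides whether a value returned by fgetc(fp)
   marks the real end of input rather than a character equal to EOF. */
static int is_real_eof(int c, FILE *fp)
{
    if (c != EOF)
        return 0;
#if FGETC_EOF_IS_UNAMBIGUOUS
    (void)fp;                        /* EOF alone is conclusive here */
    return 1;
#else
    return feof(fp) || ferror(fp);   /* ask the stream to confirm */
#endif
}
```

On the TMS320C55x figures quoted in the question (UCHAR_MAX of 65535, INT_MAX of 32767) the macro would evaluate to 0, and the helper would fall back to asking the stream.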

pablo1977
  • Props for seeing the issue here. That particular implementation may be violating contract of `fgetc` either by returning a negative value when it shouldn't, or not supporting binary files correctly... – autistic Jun 15 '15 at 02:22
  • From TI's documents: [*On targets where `sizeof(char) == sizeof(int)` (C2700, C2800, C5400, C5500), you still can't reliably use the return value of `getc()` to check for end of file, because 0xffff will be mistaken for the end of file. Use `feof()` instead*](http://processors.wiki.ti.com/index.php/C89_Support_in_TI_Compilers#Misunderstandings_about_TI_C) – phuclv Jun 26 '18 at 02:17