The C Standard defines EOF and WEOF with the following language:

7.21.1 Input/output <stdio.h> - Introduction

The header <stdio.h> defines several macros, and declares three types and many functions for performing input and output.

...

EOF

which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;

...

7.29.1 Extended multibyte and wide character utilities <wchar.h> - Introduction

The header <wchar.h> defines four macros, and declares four data types, one tag, and many functions.

...

wint_t

which is an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set

WEOF

which expands to a constant expression of type wint_t whose value does not correspond to any member of the extended character set.(328) It is accepted (and returned) by several functions in this subclause to indicate end-of-file, that is, no more input from a stream. It is also used as a wide character value that does not correspond to any member of the extended character set.


  (328) The value of the macro WEOF may differ from that of EOF and need not be negative.

EOF is a negative value and it is the only negative value that getc() can return. I have seen it commonly defined as (-1), and similarly WEOF defined as ((wint_t)-1).
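To see what a given environment actually uses, a small check along these lines may help. It is only a sketch: the helper names `eof_is_minus_one`, `weof_is_cast_minus_one` and `print_eof_macros` are made up for illustration, and the `long long` conversion assumes `wint_t` is no wider than `long long`.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: 1 if this implementation defines EOF as -1. */
int eof_is_minus_one(void)
{
    return EOF == -1;
}

/* Hypothetical helper: 1 if WEOF has the same value as (wint_t)-1. */
int weof_is_cast_minus_one(void)
{
    return WEOF == (wint_t)-1;
}

/* Hypothetical helper: print both macros as numbers.
   WEOF is converted to long long, so an unsigned wint_t shows up as a
   large positive number rather than as -1. */
void print_eof_macros(FILE *out)
{
    fprintf(out, "EOF  = %d\n", EOF);
    fprintf(out, "WEOF = %lld\n", (long long)WEOF);
}
```

Calling `print_eof_macros(stdout)` on the target toolchain shows the values directly; the only result the Standard guarantees is that `EOF` is negative.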

Are there any common C environments where either of these macros is defined as something different?

What is the rationale for the Standard Committee to leave open the possibility of different values and especially a non-negative value for WEOF?

phuclv
chqrlie
  • It also very much depends on what `wint_t` is defined as. If it is, for example, an `unsigned short` and `sizeof(int) > sizeof(unsigned short)`, then `EOF != WEOF` (it has to do with sign extension, or rather the lack thereof). – Some programmer dude Nov 21 '16 at 10:13
  • Why bother? Using the macros instead of magic numbers is good practice and makes the code clearer anyway. Re. `wint_t`: IIRC, Windows has a 16-bit `wchar_t` which requires all 16 bits to represent characters. Allowing non-negative values for `WEOF` makes it possible to reserve a single code for the marker instead of half of the available codespace. – too honest for this site Nov 21 '16 at 10:15
  • Hypothetically, an implementation might have some legitimate use for `-1` as a character code, so might want to pick a different value for `EOF`. – Ian Abbott Nov 21 '16 at 17:09
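The sign-extension point from the first comment can be sketched without relying on any particular platform's `wint_t`. Here `wint16_sim_t` and `WEOF16_SIM` are hypothetical stand-ins for a 16-bit unsigned `wint_t` and its `WEOF`, and the sketch assumes `int` is wider than 16 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-bit unsigned wint_t. */
typedef uint16_t wint16_sim_t;

/* Hypothetical WEOF for that type: all 16 bits set. */
#define WEOF16_SIM ((wint16_sim_t)-1)

/* Value of the simulated WEOF after conversion to a wider signed type.
   The conversion zero-extends, so the result is 65535, not -1. */
long sim_weof_as_long(void)
{
    return WEOF16_SIM;
}

/* 1 if the simulated WEOF compares equal to -1 once promoted to int. */
int sim_weof_equals_minus_one(void)
{
    int promoted = WEOF16_SIM;  /* 65535 when int is wider than 16 bits */
    return promoted == -1;
}
```

With a wider `int`, the comparison fails because an unsigned 16-bit value is never sign-extended, which is why such a `WEOF` cannot equal an `EOF` of -1.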

3 Answers

What is the rationale for the Standard Committee to leave open the possibility of different values and especially a non-negative value for WEOF?

The type int is always signed, so its range always includes negative values. The standard can therefore require EOF to be negative, and implementations typically define it as -1.

However, the type wint_t may be signed or unsigned (see note 1 below), so the standard cannot require WEOF to be negative or give it a specific value. Since the implementation defines the type wint_t and its signedness, it must also choose a value for WEOF.


1 (Quoted from: ISO/IEC 9899:201x 7.20.3 Limits of other integer types 5)
If wint_t (see 7.29) is defined as a signed integer type, the value of WINT_MIN shall be no greater than −32767 and the value of WINT_MAX shall be no less than 32767; otherwise, wint_t is defined as an unsigned integer type, and the value of WINT_MIN shall be 0 and the value of WINT_MAX shall be no less than 65535.
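As a rough illustration of the quoted paragraph, the signedness of `wint_t` can be read off `WINT_MIN`, which is 0 exactly when the type is unsigned. The helper names below are hypothetical, and the casts assume `wint_t` is no wider than `long long`.

```c
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: 1 if wint_t is a signed type here.
   Per the quoted text, WINT_MIN is 0 when wint_t is unsigned. */
int wint_is_signed(void)
{
    return (long long)WINT_MIN < 0;
}

/* Hypothetical helper: 1 if WINT_MIN and WINT_MAX satisfy the
   limits quoted from 7.20.3 paragraph 5. */
int wint_limits_conform(void)
{
    if (wint_is_signed())
        return (long long)WINT_MIN <= -32767 &&
               (long long)WINT_MAX >= 32767;
    return (unsigned long long)WINT_MIN == 0 &&
           (unsigned long long)WINT_MAX >= 65535;
}
```

When `wint_is_signed()` returns 0, `WEOF` cannot be negative, whatever value the implementation picks.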

phuclv
2501
  • I would love to read constructive comments along with downvotes on a correct answer. – 2501 Nov 22 '16 at 07:52
  • I did not downvote, but it is IMHO quite misleading to define `wint_t` as an **unsigned** type: the name says otherwise. I thought the Rationale for `wint_t` was to have a possibly larger type to handle all values of `wchar_t` and some special value indicating the end of file. In the end, the whole wide char support is a mess: `wchar_t` can be signed, but `L'a'` is unsigned, `char16_t` and `char32_t` are unsigned, but `char` may be signed and it is possible that `'\xFF' == EOF`... To top this bazaar of confusion, `WEOF` can be unsigned, different from `EOF`. – chqrlie Nov 23 '16 at 07:57
  • @chqrlie How is it my fault what the standard defines? – 2501 Nov 25 '16 at 09:26
  • I am not saying it is your fault; as a matter of fact, I upvoted your answer. I am just underlining the potential inconsistencies the Committee saw fit to engrave into the Standard to reflect then-current usage (I suppose). Consistency would demand that both `char` and `wchar_t` be unsigned and `wint_t` be signed with a larger range, and both `EOF` and `WEOF` could be `(-1)` and `(wint_t)(-1)`. – chqrlie Nov 25 '16 at 10:36

The value of -1 for EOF allows for simple efficient implementation of ctype macros (for the common case of small char, say 8 bits or so). A typical implementation may look like this:

    unsigned __ctypes[257] = { 0 /* for EOF */, ... };

    #define isalpha(c) (__ctypes[(c)+1] & _ALPHA_BITS)

There is no particular benefit in defining EOF as any other integer, so -1 is likely to be used in any sensible implementation with small char type.

For large wchar_t, the table would be too large, so wctype functions are likely to be implemented differently. Hence less incentive to give WEOF any particular value, including -1.
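For readers who want to try the table trick, here is a self-contained sketch. It is not how any particular C library does it: `my_ctypes`, `MY_ALPHA_BIT`, `MY_ISALPHA` and `my_ctypes_init` are invented names, the table only knows about ASCII letters, and the static assertion makes the dependency on `EOF == -1` explicit.

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* The +1 offset below only works if EOF is exactly -1. */
_Static_assert(EOF == -1, "this table layout assumes EOF == -1");

#define MY_ALPHA_BIT 0x01u  /* hypothetical class bit */

/* Slot 0 is for EOF; slots 1..UCHAR_MAX+1 are for unsigned char values. */
static unsigned char my_ctypes[UCHAR_MAX + 2];

/* Valid for c == EOF or any value representable as unsigned char. */
#define MY_ISALPHA(c) (my_ctypes[(c) + 1] & MY_ALPHA_BIT)

/* Mark the ASCII letters; assumes an ASCII-compatible character set. */
void my_ctypes_init(void)
{
    for (int c = 'a'; c <= 'z'; c++)
        my_ctypes[c + 1] |= MY_ALPHA_BIT;
    for (int c = 'A'; c <= 'Z'; c++)
        my_ctypes[c + 1] |= MY_ALPHA_BIT;
}
```

After `my_ctypes_init()`, `MY_ISALPHA(EOF)` reads slot 0, which stays zero, so the macro needs no branch for the end-of-file value.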

Nisse Engström
n. m. could be an AI
  • The problem with this implementation is the lack of support for signed chars. Although it is incorrect to invoke `isalpha` with a `char` argument if this `char` can be negative, it is a **very** common mistake. glibc's implementation uses an array of 384 entries to handle both signed and unsigned char values. On such an implementation, it would make sense to define `EOF` as `(-129)`. Similarly, defining `WEOF` as `((wint_t)(-1))` avoids unnecessary harshness on programmers that mistakenly use `((c = getc(f)) != EOF)` on wide streams. – chqrlie Nov 23 '16 at 07:37
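For contrast with the mistaken loop mentioned in the comment, a wide-stream loop that does not depend on the value of `WEOF` might look like the sketch below. The function name `count_wide_chars` is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count wide characters until end of file.
   Returns -1 if the stream reports an error. */
long count_wide_chars(FILE *f)
{
    long n = 0;
    wint_t wc;                       /* wint_t, not wchar_t or int */

    while ((wc = getwc(f)) != WEOF)  /* compare with WEOF, never EOF */
        n++;
    return ferror(f) ? -1 : n;
}
```

Storing the result in a `wint_t` and comparing with `WEOF` is correct whether `wint_t` is signed or unsigned.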

In Xinu OS, EOF is defined as -2. See "Actual implementation of EOF different from -1".

OTOH wint_t can be an unsigned type, so there are many actual implementations where WEOF != -1. For example, in MSVC wint_t is unsigned short and WEOF is (wint_t)(0xFFFF). U+FFFF is a Unicode noncharacter, so it can serve as WEOF, just as -1 is used for EOF in implementations where sizeof(char) == sizeof(int).
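On an implementation where sizeof(char) == sizeof(int), a valid character can in principle compare equal to EOF, so a careful reader confirms the condition with feof() or ferror(). The following is a minimal sketch of that idea; `count_bytes` is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: count characters read from f.
   A return value of EOF from getc() is only treated as the end of
   input when the stream itself reports end-of-file or an error. */
long count_bytes(FILE *f)
{
    long n = 0;

    for (;;) {
        int c = getc(f);
        if (c == EOF && (feof(f) || ferror(f)))
            break;
        n++;
    }
    return ferror(f) ? -1 : n;
}
```

On mainstream platforms the extra check is redundant, but it costs little and removes the dependence on EOF lying outside the character range.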

phuclv