wint_t is to wchar_t what int is to char, so an implementation where sizeof(wchar_t) == sizeof(wint_t) is completely legal, just as implementations where sizeof(int) == sizeof(char) are allowed. In fact the char case is even worse, because you can't change the return type of getc, fgetc... whereas for wint_t the implementation can simply typedef it as a wider type if necessary. The standard even permits it explicitly:
Footnote 327) wchar_t and wint_t can be the same integer type.
http://www.iso-9899.info/n1570.html#7.29.1
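To see which choice a particular implementation made, a small sketch like the one below prints the sizes and limits involved. The function name describe_wide_types is only illustrative; WCHAR_MIN/WCHAR_MAX come from <wchar.h> and WINT_MIN/WINT_MAX from <stdint.h>. With glibc both types are typically 4 bytes and WEOF is 0xffffffff, while Microsoft's CRT typically uses 2-byte types with WEOF as 0xffff, but check your own toolchain rather than relying on that.

```c
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: prints how this implementation defines wchar_t,
   wint_t and WEOF. The standard allows both types to be the same
   integer type (C11 footnote 327). */
void describe_wide_types(FILE *out)
{
    fprintf(out, "sizeof(wchar_t) = %zu, sizeof(wint_t) = %zu\n",
            sizeof(wchar_t), sizeof(wint_t));
    fprintf(out, "WCHAR_MIN = %jd, WCHAR_MAX = %ju\n",
            (intmax_t)WCHAR_MIN, (uintmax_t)WCHAR_MAX);
    fprintf(out, "WINT_MIN = %jd, WINT_MAX = %ju\n",
            (intmax_t)WINT_MIN, (uintmax_t)WINT_MAX);
    /* WEOF converted to uintmax_t and shown in hex */
    fprintf(out, "WEOF = %#jx\n", (uintmax_t)WEOF);
}
```

Calling describe_wide_types(stdout) on two different compilers is a quick way to see both layouts side by side.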
The standard also says that "The values WCHAR_MIN and WCHAR_MAX do not necessarily correspond to members of the extended character set", and there's nothing wrong with that. The extended character set can span a smaller range than wchar_t, exactly as happens with char: if the basic character set is ASCII, for example, it uses only half of the available range (or much less if CHAR_BIT > 8). wint_t is

... an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set (see WEOF below);
http://www.iso-9899.info/n1570.html#6.3.1.3
so presumably it may even be narrower than wchar_t if the extended character set is much smaller than the range of wchar_t (though not narrower than int, since it must be unchanged by the default argument promotions). Since 0xFFFF is guaranteed not to be a Unicode character at all, using it for WEOF is completely valid, although it's a little weird IMHO and I don't know why MS chose it (Microsoft's CRT defines WEOF as 0xFFFF).
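The reason 0xFFFF is safe is that Unicode permanently reserves it as a noncharacter: it will never be assigned to any character. As a sketch (is_unicode_noncharacter is a hypothetical name), the full set of 66 noncharacters can be tested like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if cp is one of the 66 Unicode noncharacters:
   U+FDD0..U+FDEF (32 values), plus the last two code points
   (U+xxFFFE and U+xxFFFF) of each of the 17 planes (34 values). */
bool is_unicode_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return false;                 /* not a Unicode code point at all */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return true;
    return (cp & 0xFFFE) == 0xFFFE;   /* matches U+xxFFFE and U+xxFFFF */
}
```

Any value for which this returns true can never appear as a real character in well-formed text, so it can serve as an out-of-band marker like WEOF.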
If sizeof(wchar_t) == sizeof(wint_t) or sizeof(int) == sizeof(char), and char/wchar_t is unsigned while int/wint_t is signed, then there are values that char and wchar_t can represent but int and wint_t can't. In that case the conversion between them is implementation-defined. That won't cause any issues when you're working with text files, but it will cause problems when you're reading binary files. Either way, for portability you then need to check for end-of-file and errors explicitly yourself:
```c
int c;  /* with fgetwc(), use wint_t, WEOF and fputwc() instead */
while ((c = /* fgetwc(in) */ fgetc(in)) != EOF || (!feof(in) && !ferror(in)))
    fputc(c, out);
```
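A self-contained version of that loop, written as a function so it can be reused and tested, might look like the following sketch (copy_stream is a hypothetical name). It also checks ferror(out), since on targets where sizeof(int) == sizeof(char) fputc() can legitimately return a value equal to EOF:

```c
#include <stdio.h>

/* Copies every byte from in to out; returns 0 on success, -1 on error.
   EOF from fgetc() is only treated as end-of-file once feof() confirms
   it, so the loop stays correct where a real byte can compare equal
   to EOF (sizeof(int) == sizeof(char)). */
int copy_stream(FILE *in, FILE *out)
{
    for (;;) {
        int c = fgetc(in);
        if (c == EOF) {
            if (feof(in))
                return 0;          /* genuine end of file */
            if (ferror(in))
                return -1;         /* read error */
            /* otherwise a real byte whose value equals EOF: copy it */
        }
        if (fputc(c, out) == EOF && ferror(out))
            return -1;             /* write error */
    }
}
```

On common desktop platforms the extra checks never trigger for real data, so the function behaves like the usual while ((c = fgetc(in)) != EOF) loop there.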
This is the same approach TI suggests:

On targets where sizeof(char)==sizeof(int) (C2700, C2800, C5400, C5500), you still can't reliably use the return value of getc() to check for end of file, because 0xffff will be mistaken for the end of file. Use feof() instead.
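The collision TI describes can be modeled on an ordinary desktop compiler by standing in uint16_t for a 16-bit unsigned char and int16_t for a 16-bit int. This is a hypothetical simulation, not TI's toolchain: the names model_fgetc_result and MODEL_EOF are made up, and the two's-complement wraparound that real compilers apply in an implementation-defined way is written out explicitly.

```c
#include <stdint.h>

/* Hypothetical model of a target where char and int are both 16 bits:
   uint16_t plays "unsigned char" and int16_t plays "int". */
#define MODEL_EOF ((int16_t)-1)

/* What fgetc() would return for a byte read from the stream: the
   unsigned char value converted to int. On such targets this conversion
   is implementation-defined; the usual two's-complement wraparound is
   spelled out here so the model itself has well-defined behavior. */
int16_t model_fgetc_result(uint16_t byte)
{
    return byte > INT16_MAX ? (int16_t)(byte - 65536L) : (int16_t)byte;
}
```

With this model, model_fgetc_result(0xFFFF) equals MODEL_EOF, so a loop that only compares against EOF would stop early on a perfectly valid 0xFFFF byte.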
The CMU SEI CERT C rule FIO34-C, "Distinguish between characters read from a file and EOF or WEOF", also says:
Because EOF is negative, it should not match any unsigned character value. However, this is only true for implementations where the int type is wider than char. On an implementation where int and char have the same width, a character-reading function can read and return a valid character that has the same bit-pattern as EOF. This could occur, for example, if an attacker inserted a value that looked like EOF into the file or data stream to alter the behavior of the program.

The C Standard requires only that the int type be able to represent a maximum value of +32767 and that a char type be no larger than an int. Although uncommon, this situation can result in the integer constant expression EOF being indistinguishable from a valid character; that is, (int)(unsigned char)65535 == -1. Consequently, failing to use feof() and ferror() to detect end-of-file and file errors can result in incorrectly identifying the EOF character on rare implementations where sizeof(int) == sizeof(char).
This problem is much more common when reading wide characters. The fgetwc(), getwc(), and getwchar() functions return a value of type wint_t. This value can represent the next wide character read, or it can represent WEOF, which indicates end-of-file for wide character streams. On most implementations, the wchar_t type has the same width as wint_t, and these functions can return a character indistinguishable from WEOF.
In the UTF-16 character set, 0xFFFF is guaranteed not to be a character, which allows WEOF to be represented as the value -1. Similarly, all UTF-32 characters are positive when viewed as a signed 32-bit integer. All widely used character sets are designed with at least one value that does not represent a character. Consequently, it would require a custom character set designed without consideration of the C programming language for this problem to occur with wide characters or with ordinary characters that are as wide as int.
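For wide streams the same idea applies with wint_t, fgetwc(), fputwc() and WEOF. The sketch below (copy_wide_stream is a hypothetical name) also treats an encoding error as a failure, because fgetwc() and fputwc() report one by returning WEOF with errno set to EILSEQ, which need not set the stream's error indicator:

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Copies wide characters from in to out; returns 0 on success, -1 on a
   read, write or encoding error. Both streams become wide-oriented.
   WEOF is only treated as end-of-file once feof() confirms it, because
   where wint_t and wchar_t have the same width a real character can
   compare equal to WEOF. */
int copy_wide_stream(FILE *in, FILE *out)
{
    for (;;) {
        errno = 0;
        wint_t wc = fgetwc(in);
        if (wc == WEOF) {
            if (feof(in))
                return 0;          /* genuine end of file */
            if (ferror(in) || errno == EILSEQ)
                return -1;         /* read error or invalid multibyte data */
            /* otherwise a real character whose value equals WEOF */
        }
        errno = 0;
        if (fputwc((wchar_t)wc, out) == WEOF &&
            (ferror(out) || errno == EILSEQ))
            return -1;             /* write or encoding error */
    }
}
```

Resetting errno before each call matters here: errno is never cleared by successful library calls, so a stale EILSEQ from earlier code would otherwise be misread as a new error.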
See also