> I know the following code is broken -- `getchar()` returns an `int`, not a `char` --

Good!

```c
char single_byte = getchar();
```
This is problematic in more than one way.
I'll assume `CHAR_BIT == 8` and `EOF == -1`. (We know `EOF` is negative and of type `int`; `-1` is a typical value -- and in fact I've never heard of it having any other value.)
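If you want to check what your own implementation does, a quick sketch like this (nothing beyond `<limits.h>` and `<stdio.h>`) prints all three relevant facts:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Report the implementation's actual values for the assumptions above. */
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("EOF      = %d\n", EOF);
    printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}
```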
Plain `char` may be either signed or unsigned.

If it's unsigned, the value of `single_byte` will be either the value of the character that was just read (represented as an `unsigned char` and trivially converted to plain `char`), or the result of converting `EOF` to `char`. Typically `EOF` is -1, and the result of the conversion will be `CHAR_MAX`, or 255. You won't be able to distinguish between `EOF` and an actual input value of 255 -- and since `/dev/urandom` returns all byte values with equal probability (and never runs dry), you'll see a `0xff` byte sooner or later.
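Here's a minimal sketch of that collision, assuming `EOF == -1` (I use `unsigned char` explicitly to simulate the unsigned-`char` scenario regardless of what your compiler does):

```c
#include <stdio.h>

int main(void)
{
    unsigned char from_data = 0xff;               /* a byte actually present in the input */
    unsigned char from_eof  = (unsigned char)EOF; /* -1 converted to unsigned char: 255 */

    /* Both hold 255: once squeezed into a char, the end-of-file marker is
       indistinguishable from a genuine 0xff byte. */
    printf("from_data = %d, from_eof = %d\n", from_data, from_eof);
    return 0;
}
```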
But that won't terminate your input loop. Your comparison `(single_byte == EOF)` will never be true; since `single_byte` is of an unsigned type in this scenario, it can never be equal to `EOF`. You'll have an infinite loop, even when reading from a finite file rather than from an unlimited device like `/dev/urandom`. (You could have written `(single_byte == (char)EOF)`, but of course that would not solve the underlying problem.)
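The integer promotions are what doom the test; a sketch, again with an explicit `unsigned char`:

```c
#include <stdio.h>

int main(void)
{
    unsigned char single_byte = (unsigned char)EOF; /* 255 */

    /* In the comparison, single_byte is promoted to int, giving 255.
       EOF is -1, so the test is false -- as it is for every value an
       unsigned char can hold. */
    if (single_byte == EOF)
        puts("unreachable");
    else
        puts("255 != -1, so the loop never sees EOF");
    return 0;
}
```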
Since your loop does terminate, we can conclude that plain `char` is signed on your system.
If plain `char` is signed, things are a little more complicated. If you read a character in the range 0..127, its value will be stored in `single_byte`. If you read a character in the range 128..255, the `int` value is converted to `char`; since `char` is signed and the value is out of range, the result of the conversion is implementation-defined. For most implementations, that conversion will map 128 to -128, 129 to -127, ..., 255 to -1. If `getchar()` returns `EOF`, which is (typically) -1, the conversion is well defined and yields -1. So again, you can't distinguish between `EOF` and an input byte of `0xff`, which is stored as -1.
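A sketch of that, assuming the typical wraparound behavior (the `(char)0xff` result is implementation-defined, so yours may differ):

```c
#include <stdio.h>

int main(void)
{
    /* Assumes plain char is signed, as deduced above. */
    char from_ff  = (char)0xff; /* 255 is out of range: implementation-defined,
                                   typically -1 */
    char from_eof = (char)EOF;  /* -1 is in range: well defined, yields -1 */

    /* On most implementations both print -1: a real 0xff input byte and
       EOF collapse into the same char value. */
    printf("from_ff = %d, from_eof = %d\n", from_ff, from_eof);
    return 0;
}
```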
(Actually, as of C99, the conversion can also raise an implementation-defined signal. Fortunately, as far as I know, no implementations actually do that.)
```c
if (single_byte == EOF)
    printf("EOF is implemented in terms of 0x%x.\n", single_byte);
```
Again, this condition will be true either if `getchar()` actually returned `EOF` or if you just read a character with the value `0xff`.

The `%x` format requires an argument of type `unsigned int`. `single_byte` is of type `char`, which will almost certainly be promoted to `int`. Now you can print an `int` value with an `unsigned int` format if the value is within the representable range of both types. But since `single_byte`'s value is -1 (it just compared equal to `EOF`), it's not in that range. `printf`, with the `"%x"` format, assumes that the argument is of type `unsigned int` (this isn't a conversion). And `0xffffffff` is the likely result of taking a 32-bit `int` value of -1 and assuming that it's really an `unsigned int`.
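If you really want to print the bit pattern, do the conversion yourself rather than relying on `printf` to reinterpret an out-of-range `int`; converting to `unsigned int` is well defined:

```c
#include <stdio.h>

int main(void)
{
    int single_byte = -1;

    /* -1 converted to unsigned int is UINT_MAX: 0xffffffff when int has
       32 bits. The explicit cast makes the argument type match %x. */
    printf("0x%x\n", (unsigned int)single_byte);
    return 0;
}
```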
And I'll just note that storing the result of `getchar()` in an `int` object would have been a whole lot easier than analyzing what happens when you store it in a `char`.
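For completeness, the usual idiom, with the result kept in an `int` so `EOF` stays distinct from every possible byte value:

```c
#include <stdio.h>

int main(void)
{
    int c; /* int, not char: can hold every unsigned char value *and* EOF */

    while ((c = getchar()) != EOF) {
        /* c is in the range 0..UCHAR_MAX here; process the byte. */
        putchar(c);
    }
    return 0;
}
```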