> I know the following code is broken -- `getchar()` returns an `int`, not a `char` --

Good!

```c
char single_byte = getchar();
```
This is problematic in more than one way.
I'll assume `CHAR_BIT == 8` and `EOF == -1`. (We know `EOF` is negative and of type `int`; `-1` is a typical value -- and in fact I've never heard of it having any other value.)
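If you want to check what your own implementation does, a quick sketch like this (nothing beyond `<limits.h>` and `<stdio.h>`) prints all three relevant facts:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* Report the implementation's actual values for the assumptions above. */
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("EOF      = %d\n", EOF);
    printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}
```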
Plain `char` may be either signed or unsigned.

If it's unsigned, the value of `single_byte` will be either the value of the character that was just read (represented as an `unsigned char` and trivially converted to plain `char`), or the result of converting `EOF` to `char`. Typically `EOF` is -1, and the result of the conversion will be `CHAR_MAX`, or 255. You won't be able to distinguish between `EOF` and an actual input value of 255 -- and since `/dev/urandom` returns all byte values with equal probability (and never runs dry), you'll see a `0xff` byte sooner or later.
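Here's a minimal sketch of that collision, assuming `EOF == -1` (I use `unsigned char` explicitly to simulate the unsigned-`char` scenario regardless of what your compiler does):

```c
#include <stdio.h>

int main(void)
{
    unsigned char from_data = 0xff;               /* a byte actually present in the input */
    unsigned char from_eof  = (unsigned char)EOF; /* -1 converted to unsigned char: 255 */

    /* Both hold 255: once squeezed into a char, the end-of-file marker is
       indistinguishable from a genuine 0xff byte. */
    printf("from_data = %d, from_eof = %d\n", from_data, from_eof);
    return 0;
}
```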
But that won't terminate your input loop. Your comparison `(single_byte == EOF)` will never be true; since `single_byte` is of an unsigned type in this scenario, it can never be equal to `EOF`. You'll have an infinite loop, even when reading from a finite file rather than from an unlimited device like `/dev/urandom`. (You could have written `(single_byte == (char)EOF)`, but of course that would not solve the underlying problem.)
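The integer promotions are what doom the test; a sketch, again with an explicit `unsigned char`:

```c
#include <stdio.h>

int main(void)
{
    unsigned char single_byte = (unsigned char)EOF; /* 255 */

    /* In the comparison, single_byte is promoted to int, giving 255.
       EOF is -1, so the test is false -- as it is for every value an
       unsigned char can hold. */
    if (single_byte == EOF)
        puts("unreachable");
    else
        puts("255 != -1, so the loop never sees EOF");
    return 0;
}
```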
Since your loop does terminate, we can conclude that plain `char` is signed on your system.
If plain `char` is signed, things are a little more complicated. If you read a character in the range 0..127, its value will be stored in `single_byte`. If you read a character in the range 128..255, the `int` value is converted to `char`; since `char` is signed and the value is out of range, the result of the conversion is implementation-defined. For most implementations, that conversion will map 128 to -128, 129 to -127, ..., 255 to -1. If `getchar()` returns `EOF`, which is (typically) -1, the conversion is well defined and yields -1. So again, you can't distinguish between `EOF` and an input byte of `0xff`, which is stored as -1.
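A sketch of that, assuming the typical wraparound behavior (the `(char)0xff` result is implementation-defined, so yours may differ):

```c
#include <stdio.h>

int main(void)
{
    /* Assumes plain char is signed, as deduced above. */
    char from_ff  = (char)0xff; /* 255 is out of range: implementation-defined,
                                   typically -1 */
    char from_eof = (char)EOF;  /* -1 is in range: well defined, yields -1 */

    /* On most implementations both print -1: a real 0xff input byte and
       EOF collapse into the same char value. */
    printf("from_ff = %d, from_eof = %d\n", from_ff, from_eof);
    return 0;
}
```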
(Actually, as of C99, the conversion can also raise an implementation-defined signal. Fortunately, as far as I know, no implementations actually do that.)
```c
if (single_byte == EOF)
    printf("EOF is implemented in terms of 0x%x.\n", single_byte);
```
Again, this condition will be true either if `getchar()` actually returned `EOF` or if you just read a character with the value `0xff`.

The `%x` format requires an argument of type `unsigned int`. `single_byte` is of type `char`, which will almost certainly be promoted to `int`. Now you can print an `int` value with an `unsigned int` format if the value is within the representable range of both types. But since `single_byte`'s value is -1 (it just compared equal to `EOF`), it's not in that range. `printf`, with the `"%x"` format, assumes that the argument is of type `unsigned int` (this isn't a conversion). And `0xffffffff` is the likely result of taking a 32-bit `int` value of -1 and assuming that it's really an `unsigned int`.
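If you really want to print the bit pattern, do the conversion yourself rather than relying on `printf` to reinterpret an out-of-range `int`; converting to `unsigned int` is well defined:

```c
#include <stdio.h>

int main(void)
{
    int single_byte = -1;

    /* -1 converted to unsigned int is UINT_MAX: 0xffffffff when int has
       32 bits. The explicit cast makes the argument type match %x. */
    printf("0x%x\n", (unsigned int)single_byte);
    return 0;
}
```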
And I'll just note that storing the result of `getchar()` in an `int` object would have been a whole lot easier than analyzing what happens when you store it in a `char`.
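For completeness, the usual idiom, with the result kept in an `int` so `EOF` stays distinct from every possible byte value:

```c
#include <stdio.h>

int main(void)
{
    int c; /* int, not char: can hold every unsigned char value *and* EOF */

    while ((c = getchar()) != EOF) {
        /* c is in the range 0..UCHAR_MAX here; process the byte. */
        putchar(c);
    }
    return 0;
}
```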