confusion about int, char, and EOF in C

Question

I'm learning from K&R's classic book, The C Programming Language (2nd edition). Here's an example from page 17:

#include <stdio.h>
/* copy input to output*/
int main(void)
{
    int c;
    // char c works as well!!
    while ((c = getchar()) != EOF)
        putchar(c);
    return 0;
}

The book states that c is declared int so that it can hold EOF, which turns out to be -1 on my Windows machine with GCC and which can't be represented by a char. However, when I tried char c, it worked with no problem. Curious, I tried some more:

#include <stdio.h>
int main(void) {
    int  a = EOF;
    char b = EOF;
    char e = -1;
    printf("%d %d %d %c %c %c \n", a, b, e, a, b, e);
    return 0;
}

The output is -1 -1 -1 with no characters displayed. (According to the ASCII table, the %c conversions here should display an nbs (no-break space), but it's invisible.)

So how can a char be assigned EOF without any compiler error?

Moreover, given that EOF is -1, are both b and e above stored as FF in memory? That shouldn't be the case, otherwise how could the compiler distinguish EOF from nbs?

Update:

Most likely EOF (0xFFFFFFFF) is cast to the char 0xFF, but in (c = getchar()) != EOF the left-hand side 0xFF is promoted to the int 0xFFFFFFFF before the comparison, so c can be either an int or a char.

In this case EOF happens to be 0xFFFFFFFF, but in theory EOF could be any value that needs more than 8 bits to represent correctly, with the leftmost bytes not necessarily being FFFFFF; in that case the char c approach would fail.

Reference: K&R The C Programming Language 2e

It seems to me that -1 fits just fine into a signed 8-bit integer (char). Can you post the entire statement? `EOF can't be represented by char` seems wrong to me. Also, ASCII only covers 0-127; nbs is part of *extended ASCII*. — x4rf41, Sep 22 '15 at 15:30
possible duplicate of [Using int for character types when comparing with EOF](http://stackoverflow.com/questions/8464030/using-int-for-character-types-when-comparing-with-eof) — cadaniluk, Sep 22 '15 at 15:32
You need some way of distinguishing between `0xFF` and `EOF`, otherwise it would be impossible for C to work with binary files that contain bytes with a value of `0xFF`. This is why functions like `getchar()` return integer values. — r3mainer, Sep 22 '15 at 15:32
The problem will come, when there is `0xFF` file data. Then as `char` you won't distinguish it from `EOF`. `getchar` returns `int` type for a very good reason. — Weather Vane, Sep 22 '15 at 15:33
Did you say the output of `printf("%d %d %d %c %c %c \n", a, b, e, a, b, e);` is "-1 -1 -1"?? Edit: Oh, I guess you did; the -1 chars didn't translate into anything displayable. — Peter - Reinstate Monica, Sep 22 '15 at 15:47
"according to ASCII table for %c, c here there should be a nbs(no-break space)" is incorrect. There is no "nbs(no-break space)" in [ASCII](https://en.wikipedia.org/wiki/ASCII). Likely you are referencing some other table that has some ASCII similarities. And where is `c` in your post? — chux - Reinstate Monica, Sep 22 '15 at 18:59
"most likely EOF 0xFFFFFFFF is cast to char 0xFF but in (c = getchar()) != EOF the the LHS 0xFF is int promoted to 0xFFFFFFFF before comparison so type of c can be either int or char." is quite messed up. Better to think of this in terms of values and not bit patterns. — chux - Reinstate Monica, Nov 01 '22 at 14:55

WedaPashi · Accepted Answer · score 5 · 2015-09-22T15:55:26.567

EOF and 0xFF are not the same, so the compiler has to distinguish between them. The man page for getchar() says that it returns the character read as an unsigned char cast to an int, or EOF on end of file or error.

Your while((c = getchar()) != EOF) is expanded to

((unsigned int)c != (unsigned int)EOF)

edited Sep 22 '15 at 15:55

answered Sep 22 '15 at 15:34

WedaPashi

Thanks! So in `char c = getchar()` is the returned `int` value truncated so only rightmost 8 bits are assigned to `c`? – mzoz Sep 22 '15 at 15:37
EOF is defined as an int; it is not 0xff, it is actually 0xFFFFFFFF (-1 as a 32-bit int) (usually). But if you cast 0xFFFFFFFF to char it will be 0xFF. So yes. – x4rf41 Sep 22 '15 at 15:40
Indirectly. Casting 0xFFFFFFFF to `char` would make it 0xFF. – WedaPashi Sep 22 '15 at 15:41
Alright so in `char c` and `(c = getchar()) != EOF` here `c` is first assigned with `0xFF` and then extended again to `0xFFFFFFFF` to compare with `EOF` so the whole thing works as `int c` am I right?? – mzoz Sep 22 '15 at 15:48
`c` is converted to `int` before comparison thanks to the integer promotion rules. – WedaPashi Sep 22 '15 at 15:52
Hm. I do not think that anything is necessarily cast/converted to unsigned int. `getchar()` returns a signed int. Thus the assignment `c = getchar()` converts a signed int to a char (which may or may not be signed). The result of the assignment has type and value of the left side, i.e. type char, the value is "unchanged" as far as the bit pattern goes (because we are guaranteed to not have an overflow). Now `(c = getchar()) != EOF` is a comparison between char and int. – Peter - Reinstate Monica Sep 22 '15 at 16:17
... The conversions for the comparison now depend on the signedness of char. If char is signed, the left side of the comparison is promoted to signed int, which EOF already is. If char is unsigned, I'm afraid both are promoted to unsigned int, which makes EOF really big while a char with the value 0xff stays 255, so that they would compare unequal, perhaps surprisingly, at least to me. Am I wrong? – Peter - Reinstate Monica Sep 22 '15 at 16:18
@Peter: No, you are right about the conversions for comparison. Though I tend to disagree about the first statement wherein you said that nothing is cast/converted to unsigned int. The balancing rules of C implicitly convert the `int` to `unsigned int`, because each operand must have the same type? – WedaPashi Sep 22 '15 at 16:21
Both must have the same type, but none has unsigned int in the beginning. Certainly if char is signed there is no unsigned int promotion (why should there be one?). If char is unsigned I'm less certain -- the rules changed and it may be that even then the char will be promoted to int (since int can hold all unsigned char values). In that case too, the comparison would simply be performed between two ints. – Peter - Reinstate Monica Sep 22 '15 at 16:25
@Peter You are wrong about "If char is unsigned I'm afraid both are promoted to unsigned int ..." A `char` is promoted to `int` if `CHAR_MAX <= INT_MAX`, which is almost _always_ the case. (Else it is promoted to `unsigned`.) This answer's "is expanded to `((unsigned int)c != (unsigned int)EOF)`" is certainly wrong too. (I see the improvement in http://stackoverflow.com/questions/32720934/confusion-about-int-char-and-eof-in-c#comment53286564_32721076) – chux - Reinstate Monica Sep 22 '15 at 19:09
@chux-ReinstateMonica, do you mean it should be `signed int` instead of `unsigned int`? I had the same question as OP but haven't got to data types yet, so I'm not sure. Also, the link in your comment redirects to the comment before yours, did you mean to link to another post? – user51462 Nov 01 '22 at 07:01
@user51462 When `CHAR_MAX <= INT_MAX` (very common), `int c; ... while ((c = getchar()) != EOF)` is sufficient. Otherwise more code needed. – chux - Reinstate Monica Nov 01 '22 at 14:52

score 2 · Answer 2 · edited Sep 22 '15 at 16:06

This code works because you're using signed chars. If you look at an ASCII table you'll find two things. First, there are only 128 values (0 through 127), which take seven bits to represent, so in an 8-bit char the top bit is left over to serve as the sign bit. Second, EOF is not in this table, so the implementation is free to define it as it sees fit (the C standard only requires it to be a negative int).

Converting a char to an int is allowed by the compiler because you're going from a smaller type to a larger one: int can represent any value a char can represent as long as CHAR_MAX <= INT_MAX, which holds on virtually every platform.

Note also that 0xFF is equal to 255 when interpreted as an unsigned char and -1 when interpreted as a signed char:

0b11111111

However, when represented as a 32-bit integer, it looks very different:

255 : 0b00000000000000000000000011111111
-127: 0b11111111111111111111111110000001

edited Sep 22 '15 at 16:06

WedaPashi

answered Sep 22 '15 at 15:34

alexgolec

Hr-rmp, 0xFF is equal to 255. – Peter - Reinstate Monica Sep 22 '15 at 15:44
OK, I won't edit further without commenting. 0xFF == 255, and 0xFF is -1 (as 8-bit signed), not -127. Can you fix that? – x4rf41 Sep 22 '15 at 15:45
I fixed it completely now :D. Seeing that the last 8 bits are equal should really explain the problem. – x4rf41 Sep 22 '15 at 15:47
Your edit defeats the purpose of my answer. I wanted to show that 255 and -127 are not equivalent in 32 bit 2s complement. -1 has no place here. – alexgolec Sep 22 '15 at 15:50
@Alex I think the problem is that one has to understand that `(char)0xFF == -1` and `(int)0xFFFFFFFF == -1` are not equal, but if you cast `(char)0xFFFFFFFF` they are (bitwise). The `==` operator will return true, though. But well, your post, your choice. – x4rf41 Sep 22 '15 at 15:52

score 0 · Answer 3 · answered Nov 01 '22 at 15:14

Think values, not bit patterns.

Recall char is either signed or unsigned. char has the same range as signed char or unsigned char.

it's stated in the book that int c is used to hold EOF, which turns out to be -1 in my Windows machine with GCC and can't be represented by char.

So how can char be assigned with EOF without any compiler error?

"and can't be represented by char" is amiss. EOF is some negative int value, very often -1. When char is signed, char b = EOF; is just fine. It is like char b = -1; and b has the value of -1.

When char is unsigned, char b = EOF; as part of the initialization converts the value of EOF to char by wrapping it modulo UCHAR_MAX + 1, and then assigns. With the usual EOF of -1, b ends up with the maximum char value, CHAR_MAX, which in this case is the same as UCHAR_MAX, often 255.

Deeper

getchar() returns an int in the unsigned char range or the negative EOF. This is true if char is signed or unsigned.

To reliably distinguish the typical 257 different possible return values of getchar() (256 byte values plus EOF), save the result in an int.
