int c;

while ((c = getchar()) != EOF)
    putchar(c);

"This value is called EOF, for "end of file". We must declare c to be a type big enough to hold EOF in addition to any possible char. Therefore we use int."

Correct me if I am wrong:

  • (signed) char = [-128, +127]
  • unsigned char = [0, 255]
  • EOF = -1

When I replace int with char in the above program, it seems to work as intended, but after some research I found out that it doesn't, because the variable c (now a char) cannot store -1, i.e. EOF.

I ran it anyway and tried to crash it. I tried entering a negative number such as -1, but that didn't work; I believe that is because it is read as two separate characters, - and 1. I also tried ÿ, which is the character with ASCII value 255 according to http://ascii-code.com/. So for what input will the above program (using char instead of int) crash?

(For information, I am using 64-bit Fedora Linux.)

spirosbax
  • Why do you think it will crash? – melpomene Jul 29 '16 at 14:25
  • `Correct me if i am wrong:` - you are wrong. Swap the `signed` and `unsigned` ranges. And as Olaf said. – Eugene Sh. Jul 29 '16 at 14:38
  • `SCHAR_MIN`, the lower bound of `signed char`, is typically `-128`, not `-127`. That's for 2's-complement implementations with `CHAR_BIT==8`. Other values are possible. Plain `char` may be either signed or unsigned; it has the same range and representation as either `signed char` or `unsigned char`, but it's still a distinct type. – Keith Thompson Jul 29 '16 at 14:40
  • You're wrong about `char` and `unsigned char` - the possible values of `unsigned char` are 0 - 255, while (signed) `char` can hold values of -128 through 127. – Bob Jarvis - Слава Україні Jul 29 '16 at 14:41
  • @KeithThompson: If we swap the `signed` and `unsigned` ranges, these are the minimum ranges. Still not sure what the problem is for OP. – too honest for this site Jul 29 '16 at 14:42
  • `ÿ` is not the character with the value `-1` in most encodings. This only applies to ISO 8859-1 which isn't commonly used today. – fuz Jul 29 '16 at 15:17

2 Answers


It has been explained in other answers before, but sometimes it is harder to find the duplicate than to give the answer.

The plain char type can be signed or unsigned.

The function getchar() returns either EOF or a character value; in the words of the standard, it "…obtains that character as an unsigned char converted to an int" (the quote is from the specification of fgetc(), but it applies to getchar() too).

If you have an unsigned plain char type, then the assignment will generate a value 0..255 which will then be promoted to int for the comparison with EOF, and since none of the values 0..255 is negative, the test will always fail — and the loop won't stop until you terminate the program by some other means (interrupt, reboot, …).
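The unsigned case can be reproduced on any machine by spelling out `unsigned char` explicitly. This is only a sketch: `loop_continues_unsigned` is a hypothetical helper, not part of the program in the question, and it assumes an implementation where `int` is wider than `char` (true on mainstream platforms).

```c
#include <stdio.h>   /* EOF */

/* Hypothetical helper: evaluates the loop test `(c = value) != EOF`
 * as it would behave if c were an unsigned char.
 * Returns 1 if the loop would keep running, 0 if it would stop. */
int loop_continues_unsigned(int getchar_result)
{
    unsigned char c = (unsigned char)getchar_result; /* EOF wraps to UCHAR_MAX */
    int promoted = c;                /* promotion to int: never negative */
    return promoted != EOF;          /* EOF is negative, so this is always 1 */
}
```

Passing `EOF` itself still makes the helper report that the loop continues, which is the infinite loop described above.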

If you have a signed plain char type, then the assignment will treat both one valid character (often ÿ — U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS, if you are using a single-byte code set such as ISO 8859-15) and EOF as marking EOF, so the loop may terminate prematurely on some files.
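The signed case can be sketched the same way with an explicit `signed char`. The helper name `loop_continues_signed` is made up for illustration, and the result relies on two assumptions that hold on typical Linux systems but are not guaranteed by the standard: `EOF` is `-1`, and converting 255 to `signed char` yields `-1` (the conversion is implementation-defined).

```c
#include <stdio.h>   /* EOF */

/* Hypothetical helper: evaluates the loop test `(c = value) != EOF`
 * as it would behave if c were a signed char.
 * Returns 1 if the loop would keep running, 0 if it would stop. */
int loop_continues_signed(int getchar_result)
{
    /* Implementation-defined for values above SCHAR_MAX;
     * on common two's-complement systems 0xFF becomes -1. */
    signed char c = (signed char)getchar_result;
    int promoted = c;                /* promotion to int keeps the sign */
    return promoted != EOF;          /* 0 for both EOF and the byte 0xFF */
}
```

Under those assumptions the helper cannot tell the data byte 0xFF apart from a real end of file, while ordinary characters such as `'a'` keep the loop running.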

So, depending on the machine, the loop:

char c;

while ((c = getchar()) != EOF)
    ;

may either be an infinite loop or it may terminate before EOF for some data files. Neither is correct behaviour — and neither behaviour is a crash. (The code in the question won't crash.) Changing the type of c to int fixes both problems reliably and portably.
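For comparison, here is a minimal sketch of the portable version with `int c`, wrapped in a function so it can be tried on any stream. The name `copy_stream` and the byte counter are illustrative additions, not something from the question or the book.

```c
#include <stdio.h>

/* Copies every byte from `in` to `out` and returns how many were copied.
 * `copy_stream` is an illustrative name, not taken from the question. */
long copy_stream(FILE *in, FILE *out)
{
    int c;            /* int keeps EOF distinct from every byte value */
    long count = 0;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        count++;
    }
    return count;
}
```

Calling `copy_stream(stdin, stdout)` behaves like the loop in the question; a 0xFF byte in the input is copied like any other byte instead of ending the loop.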

Note that if you are working with a UTF-8 locale, you will not generate the hex 0xFF byte; that is not a valid byte in UTF-8 (U+00FF is encoded as two bytes 0xC3 0xBF in UTF-8).
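A small check of that claim, as a sketch: the array below holds the two-byte UTF-8 encoding of U+00FF mentioned above, and `contains_byte_ff` is a hypothetical helper that scans a buffer for the byte 0xFF.

```c
#include <stddef.h>

/* U+00FF (ÿ) encoded as UTF-8: two bytes, neither of which is 0xFF. */
const unsigned char y_diaeresis_utf8[] = { 0xC3, 0xBF };

/* Hypothetical helper: returns 1 if any of the n bytes equals 0xFF. */
int contains_byte_ff(const unsigned char *bytes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] == 0xFF)
            return 1;
    }
    return 0;
}
```

For the UTF-8 bytes the helper returns 0, whereas a single-byte encoding that stores ÿ as 0xFF would make it return 1.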

Jonathan Leffler
  • Very nice answer. Another benefit of using `int c` involves subsequent use of `is...()` functions like `isspace()`, which expect a value in the range of `unsigned char` or `EOF`. Code can use `isspace(c)`. With `char c`, code should use `isspace((unsigned char)c)` to avoid UB. – chux - Reinstate Monica Jul 29 '16 at 16:12

The reason it may fail is that in C, plain char isn't specified to be signed or unsigned. It can work well on your machine, but it can fail on others. Also, the getchar() function returns an int value, so you should use an int variable to hold the return value.

Mikhail