
Here is a code fragment from "The C Programming Language" by Kernighan and Ritchie.

#include <stdio.h>
/* copy input to output; 2nd version */
main()
{
   int c;
   while ((c = getchar()) != EOF) {
       putchar(c);
   }
}

Justification for using int c instead of char c:

...We can't use char since c must be big enough to hold EOF in addition to any possible char.

I think using int instead of char is only justifiable if c is modified with unsigned, because a signed char won't be able to hold the value of EOF, which is -1. When I wrote this program, char c was interpreted as signed char c, and therefore I had no problem.
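To show why that only appears to work, here is a minimal sketch (my illustration, not from the book): with char c the loop misbehaves with either choice of signedness: it either mistakes a legitimate 0xFF byte for EOF, or never sees EOF at all.

#include <stdio.h>

/* Illustration only: store getchar()'s result in a char on purpose.
   Where plain char is signed, an input byte of 0xFF becomes -1 after the
   assignment, so the loop treats it as EOF and stops even though the
   stream has not ended. Where plain char is unsigned, the comparison
   with the negative EOF is never true and the loop never terminates. */
int main(void)
{
    char c;   /* deliberately wrong: should be int */

    while ((c = getchar()) != EOF)
        putchar(c);

    return 0;
}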

Were char variables previously unsigned by default? And if so, then why did they alter it?

  • *I think using int instead of char is only justifiable if c is modified with unsigned because an signed char won't be able to hold the value of EOF which is -1.* It has nothing to do with signed or unsigned character. If `getchar()` returned a `char` instead of an `int`, there would be no difference between the `char` with a value of `-1` or `EOF`. – Andrew Henle Sep 06 '20 at 12:57

4 Answers

1

The C Standard does not define whether char is signed or unsigned; that's why we also have signed char and unsigned char. This was the case in K&R C and is still the case in C18. But it is not really relevant to your actual question: we simply need to use int here because we need a type that can hold more values than char can, so that one of them can be used to signal EOF.
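As a rough illustration of that counting argument (a sketch of mine, not part of the original answer): getchar() has to be able to report every possible byte value plus one extra out-of-band value, and char has no room for the extra one on a typical 8-bit-byte platform.

#include <stdio.h>
#include <limits.h>

/* Sketch: getchar() must be able to report UCHAR_MAX + 1 distinct byte
   values plus one more, distinct, value for EOF. A char (signed or
   unsigned) has exactly UCHAR_MAX + 1 representable values on the usual
   8-bit-byte platforms, so there is no spare value left; int has plenty. */
int main(void)
{
    printf("CHAR_BIT      = %d\n", CHAR_BIT);
    printf("byte values   = %d (0..%d)\n", UCHAR_MAX + 1, UCHAR_MAX);
    printf("values needed = %d (every byte plus EOF)\n", UCHAR_MAX + 2);
    printf("EOF           = %d\n", EOF);
    return 0;
}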

Peter
  • But we can simply use char, as signed char can hold EOF. –  Sep 06 '20 at 12:39
  • 2
    @AritraChakrabary: The Standard does not specify (to my knowledge) that `EOF` _has_ to be `-1`. Of course it does make sense for it to be this way and reserve the return values `0-255` for valid characters. But this is entirely independent from whether or not `char` is signed or unsigned. If it was signed, it still couldn't represent more than 256 values. – Peter Sep 06 '20 at 12:43
  • @peter a char can hold more than 256 values; it is just that it cannot hold *all possible* char values *and* a distinct EOF – Antti Haapala -- Слава Україні Sep 06 '20 at 14:20
  • @AnttiHaapala: How would you propose to represent more than 256 distinct values using one byte? (I'm aware that the C standard permits non-8-bit bytes, but that is not exactly common.) – Peter Sep 06 '20 at 14:31
  • @Peter I was thinking EOF is -1, but I was wrong; from that perspective, though, what I said was correct. A signed char will be able to hold EOF, as its range will be -127 to 127. –  Sep 08 '20 at 16:12
1

I think using int instead of char is only justifiable if c is modified with unsigned because a signed char won't be able to hold the value of EOF which is -1.

Who says that EOF is -1? It is specified to be negative, but it doesn't have to be -1.

In any case, you're missing the point. Signedness notwithstanding, getchar() needs to return a type that can represent more values than char can, because it needs to provide, in one way or another, for every char value, plus at least one value that is distinguishable from all the others, for use as EOF.
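A sketch of the idiom this implies (my example, not the answer's): keep the result in an int for the EOF test, and only treat it as a byte value once you know it is not EOF.

#include <stdio.h>

/* Sketch: the int holds either one of the 0..UCHAR_MAX byte values or
   the out-of-band EOF. Only after the EOF test is it safe to narrow the
   value back down to a byte. */
int main(void)
{
    int ch;                       /* wide enough for every byte plus EOF */
    unsigned long count = 0;

    while ((ch = getchar()) != EOF) {
        unsigned char byte = (unsigned char)ch;   /* ch is 0..UCHAR_MAX here */
        putchar(byte);
        count++;
    }
    fprintf(stderr, "%lu bytes copied\n", count);
    return 0;
}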

Were char variables previously unsigned by default? And if so, then why did they alter it?

No. But in C89 and pre-standard C, functions could be called without having first been declared, and the expected return type in such cases was int. This is among the reasons that so many of the standard library functions return int.
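A small sketch of that historical rule (compiled as C89; modern compilers will warn about or reject it): with no declaration in scope, the call is accepted and assumed to return int.

/* C89 sketch: no #include <stdio.h> and no prototype. Pre-standard C and
   C89 implicitly declare an undeclared function as returning int, which
   is one reason so many standard library functions were designed around
   an int return value. C99 removed implicit declarations. */
int main(void)
{
    int c;

    c = getchar();   /* implicitly treated as: int getchar(); */
    return c < 0;    /* a negative result means end-of-file or error */
}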

John Bollinger
  • Can you please specify what the range of EOF is, then? –  Sep 06 '20 at 12:55
  • @AritraChakrabary, the standard specifies only that `EOF` expands to an integer constant expression with type `int` and a negative value. Assuming anything more specific about it reduces the portability of your program. – John Bollinger Sep 06 '20 at 13:00
0

Values and representations aside, the main reason we don’t use char to represent EOF is that it isn’t a character - it is a condition signaled by the input function when it reaches the end of the stream. It is logically a different entity from a character value.
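A small sketch along those lines (mine, not part of the answer): because EOF reports a stream condition rather than a character, you can ask the stream after the loop why the input stopped.

#include <stdio.h>

/* Sketch: EOF signals a condition of the stream, not a character. Once
   the loop stops, feof() and ferror() tell us whether the stream really
   reached end-of-file or hit a read error. */
int main(void)
{
    int c;

    while ((c = getchar()) != EOF)
        putchar(c);

    if (feof(stdin))
        fprintf(stderr, "end of input reached\n");
    else if (ferror(stdin))
        fprintf(stderr, "read error\n");

    return 0;
}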

John Bode
0

Why int is needed is discussed in the other answers.

As for char being unsigned: I believe that is actually the more common choice among architectures, and there are more devices in existence built on those architectures. But because one architecture (x86-16/32) has been popular among newbie programmers, this is often overlooked. x86 is peculiar in that it has move-with-sign-extension instructions and signed memory operands, which is why signed char makes more sense there, whereas on the majority of other architectures unsigned char is the more efficient one.
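For reference, a small sketch (not from the answer) that reports which choice a given implementation made, using only <limits.h>:

#include <stdio.h>
#include <limits.h>

/* Sketch: plain char's signedness is implementation-defined; CHAR_MIN
   tells us which choice the implementation made. */
int main(void)
{
#if CHAR_MIN < 0
    printf("plain char is signed   (range %d..%d)\n", CHAR_MIN, CHAR_MAX);
#else
    printf("plain char is unsigned (range %d..%d)\n", CHAR_MIN, CHAR_MAX);
#endif
    return 0;
}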