#include <stdio.h>

int main()
{
    int c = getchar();

    while (c != EOF) {
        putchar(c);
        c = getchar();
    }

    return 0;
}

The problem is distinguishing the end of input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file". We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int.

The code and passage above are from 'The C Programming Language' book. I have three questions. Firstly, why do I get the output ^\Quit (core dumped) when I press the Ctrl and 4 keys simultaneously while the above program runs? I'm using a GNU/Linux machine.

Secondly, I wrote a program like this :

#include <stdio.h>

int main()
{
    printf("The part before EOF\n");
    putchar(EOF);
    printf("The part after EOF\n");
}

Then I compiled this as 'eof.out', changed int c = getchar(); in the program from the book into char c = getchar();, saved it, and then compiled that program as 'copy.out'. When I run the command ./eof.out | ./copy.out in the terminal, the output I get is:

The part before EOF

Meaning the program 'copy.out' worked correctly, since it didn't print the second printf. But the passage above from the book indicates that there should've been some kind of failure since I changed the int into char, so what happened?

Thirdly, when I change the char c = getchar(); into double c = getchar(); and run the command ./eof.out | ./copy.out the output I get is:

The part before EOF
�The part after EOF

Why didn't putchar(EOF); stop copy.out? Doesn't a double have more bytes than both int and char? What is happening?

hansoko
  • See [this](https://stackoverflow.com/questions/850163/how-can-one-send-a-ctrl-break-to-a-running-linux-process) and [this](https://unix.stackexchange.com/questions/226327/what-does-ctrl4-and-ctrl-do-in-bash) regarding Ctrl + 4. It's basically equivalent to Ctrl + Break in Windows. – Super-intelligent Shade Jul 30 '23 at 14:39
  • `putchar(EOF)` writes a single byte, which is (almost certainly, but may vary depending on platform) 0xff. `getchar` reads that byte as the integer 255, and `double y = getchar()` thus assigns the value 255 to y. But 255 != EOF. – William Pursell Jul 30 '23 at 14:49
  • Re “the passage above from the book indicates that there should've been some kind of failure since I changed the `int` into `char`”: The passage from the book does not say there should be some kind of failure. It says `EOF` is “a value that cannot be confused with any real character”; it does not say you cannot convert `EOF` to a `char`. If your C implementation uses an unsigned `char` type, the conversion wraps the value modulo 2^N, where N is the number of bits in a `char`, usually eight, so modulo 256. For example, −1 maps to 255… – Eric Postpischil Jul 30 '23 at 14:49
  • … If your C implementation uses a signed `char`, the conversion is implementation-defined. – Eric Postpischil Jul 30 '23 at 14:49
  • ASCII code points range 0..127. On most (all?) platforms `char` can hold values -128..127. On most (all?) platforms EOF is defined as -1. So, `char` will work just fine in your program. – Super-intelligent Shade Jul 30 '23 at 14:58
  • @EricPostpischil, the book is technically wrong if you take its claim out of context, or if you interpret "cannot" as precluding programmer error. But when that claim is understood as a statement about the behavior of `getchar()`, it is absolutely correct. If called when there is in fact a character available to return, `getchar()` / `getc()` / `fgetc()` return that character *as an `unsigned char`* (converted to `int`), whereas `EOF` is guaranteed to expand to a negative integer. The former is always distinguishable from the latter. – John Bollinger Jul 30 '23 at 15:03
  • Your three questions are not really related to each other, so they should probably be in separate posts. The first one in particular is about your operating system's terminal interface and doesn't have anything to do with the C programming language itself. – Nate Eldredge Jul 30 '23 at 15:07
  • @JohnBollinger: This was discussed on Stack Overflow years ago. In common C implementations, `EOF` cannot be confused with a character return from `getchar`. But if `char` and `int` are the same width, e.g., both 16 bits, the `unsigned char` that `getchar` returns is automatically converted to its `int` return type, so 65535 would be converted to −1 (assuming wrapping), and it would not be possible to distinguish the character 65535 from the `EOF` value solely by the `getchar` return value. (Testing `feof` could do it.) – Eric Postpischil Jul 30 '23 at 15:16
  • @JohnBollinger: See [here](https://stackoverflow.com/questions/3860943/can-sizeofint-ever-be-1-on-a-hosted-implementation) and [here](https://stackoverflow.com/questions/8134054/what-is-the-output-of-fgetc-under-the-special-case-where-int-width-char-bit). – Eric Postpischil Jul 30 '23 at 16:54
  • Acknowledged, @EricPostpischil. I do think that this is largely a theoretical possibility, not a practical one, because even if `char` is the same size as `int` on some implementation -- as driven, most likely, by the target architecture -- that implementation does not need to recognize *characters* corresponding to all values in the range of type `char`. Indeed, this issue is a good reason for such an implementation not to do so. But you're right, in principle, it could happen. – John Bollinger Jul 30 '23 at 18:44
  • `double c = getchar();` - what madness is this? – Paul Sanders Jul 30 '23 at 18:50

1 Answer


getchar and putchar work with unsigned char values, not char values, so declaring c to have type char causes the valid character code 255 to be confused with EOF.

To simplify the explanation, this answer assumes a common C implementation, except where stated: char is signed and eight bits wide, EOF is −1, and conversions to signed integer types wrap modulo 2^w, where w is the width of the type in bits. The C standard permits some variation here, but these assumptions are typical of common C implementations and match the behavior reported in the question.
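
As a quick sanity check, here is a minimal sketch (not part of the original programs) that prints what these properties look like on your implementation; the values in the comments are the typical ones, not guarantees:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("EOF = %d\n", EOF);           /* commonly -1 */
    printf("CHAR_BIT = %d\n", CHAR_BIT); /* commonly 8 */
    printf("char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}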

Consider this code for eof.c from the question:

#include <stdio.h>

int main()
{
    printf("The part before EOF\n");
    putchar(EOF);
    printf("The part after EOF\n");
}

When this program executes putchar(EOF), what happens is:

  • putchar converts EOF to unsigned char. This is specified in C 2018 7.21.7.3 (by way of 7.21.7.7 and 7.21.7.8).
  • Converting −1 to unsigned char yields 255, because conversion to an unsigned eight-bit integer type wraps modulo 256, and −1 + 256 = 255.
  • The character code 255 is written to standard output, as the sketch after this list illustrates.
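
Here is a minimal sketch of that conversion in isolation, assuming as above that EOF is −1 and char is eight bits:

#include <stdio.h>

int main(void)
{
    unsigned char byte = EOF; /* the same conversion putchar applies to its argument */
    printf("%d\n", byte);     /* prints 255: -1 wrapped modulo 256 */
    return 0;
}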

… changed int c = getchar(); in the program from the book into char c = getchar();, saved it, and then compiled the program as 'copy.out'. When I run the command ./eof.out | ./copy.out in the terminal, the output I get is:

The part before EOF

With char c, what happens when the byte 255 is read and c = getchar() is evaluated is:

  • getchar returns 255. Note that it returns the character code as an unsigned char value, per C 2018 7.21.7.1 (by way of 7.21.7.5 and 7.21.7.6).
  • To assign 255 to c, 255 is converted to the char type. Per the assumption above, this wraps modulo 256, producing −1.

−1 is the value of EOF, so c != EOF is false, so the loop ends, and the program exits.
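
A minimal sketch of that wrap-around, under the same assumptions (signed eight-bit char, EOF defined as −1); in general, converting 255 to a signed char is implementation-defined:

#include <stdio.h>

int main(void)
{
    char c = 255; /* implementation-defined conversion; wraps to -1 on common implementations */

    if (c == EOF) /* c is promoted to int, yielding -1, which equals EOF */
        puts("The character code 255 stored in a char compares equal to EOF.");
    return 0;
}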

Why didn't putchar(EOF); stop copy.out? Doesn't a double have more bytes than both int and char? What is happening?

With double c, the value assigned to c is the value returned from getchar; there is no change due to the destination type being unable to represent all the values getchar returns. When getchar returns the valid character code 255, c is set to 255, and the loop continues. When getchar returns the code −1 for end-of-file, c is set to −1, and the loop exits.
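
A sketch under the same assumptions: a double represents 255 exactly, so nothing wraps and the comparison with EOF behaves as intended:

#include <stdio.h>

int main(void)
{
    double c = 255;           /* 255 is represented exactly; no wrapping occurs */
    printf("%d\n", c != EOF); /* prints 1: 255 remains distinct from EOF (-1) */
    return 0;
}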

… the book indicates that there should've been some kind of failure since I changed the int into char

The passage from the book does not say there should be some kind of failure. It says EOF is “a value that cannot be confused with any real character”; it does not say you cannot convert EOF to a char. If your C implementation uses an unsigned char type, the conversion wraps the value modulo 2^w, where w is the number of bits in a char, usually eight, so modulo 256. For example, −1 maps to 255. If your C implementation uses a signed char, the conversion is implementation-defined. So your eof.c program does not output an end-of-file indication when putchar(EOF) is evaluated. Instead, it outputs the character code 255.

Eric Postpischil