I know this has been discussed before, but I want to make sure I understand correctly what is happening in this program, and why. On page 20 of Kernighan and Ritchie's textbook, The C Programming Language, we see this program:

    #include <stdio.h>

    int main()
    {
        int c;

        c = getchar();

        while (c != EOF) {
            putchar(c);
            c = getchar();
        }

        return 0;
    }

When executed, the program reads each character keyed in and prints them out in the same order after the user hits enter. This process is repeated indefinitely unless the user manually exits out of the console. The sequence of events is as follows:

  1. The getchar() function reads the first character keyed in and assigns its value to c.

  2. Because c is an integer type, the character value that getchar() passed to c is promoted to its corresponding ASCII integer value.

  3. Now that c has been initialized to some integer value, the while loop can test to see if that value equals the End-Of-File character. Because the EOF character has a macro value of -1, and because none of the characters that are possible to key in have a negative decimal ASCII value, the condition of the while loop will always be true.

  4. Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.

  5. The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop. If the user only keys in one character before execution, then the program reads the <return> value as the next character and prints a new line and waits for the next input to be keyed in.

Is any of this remotely correct?

Jonathan Leffler
Nicholas Cousar
  • There is no such thing as "EOF character". EOF, as you point out, is the integer value -1, which is different from any possible character value. getchar() returns an int, which is either a character or EOF, which terminates the program. How you cause the "EOF" condition on your console is OS-dependent. Of course, if your input is redirected from a file, then it will terminate naturally with an actual EOF condition. – Lee Daniel Crocker Sep 10 '18 at 23:58
  • Is that why c has to be declared as an integer and not a character? So that the expression c != EOF can be evaluated? – Nicholas Cousar Sep 11 '18 at 00:03
  • The thing with `after the user hits enter` is related to standard input buffering. It's line buffered: what the user inputs is stored in a buffer by the operating system and flushed to the program only after the user presses Enter (or signals EOF). – KamilCuk Sep 11 '18 at 00:05
  • 1. yes, 2.no, 3.no, 4. yes, 5. yes – M.M Sep 11 '18 at 00:37
  • Note that at the terminal, you can type a character which can end up being interpreted as indicating EOF. On Unix, that's normally Control-D; on Windows, Control-Z. However, that character is not EOF; it can appear in a file and (on Unix, at least) it is simply another valid character. When typed at the terminal, the terminal driver makes any waiting input available to the program(s) reading from the terminal. If there's no data waiting, then it indicates 0 bytes are available, and that is what triggers `getchar()` etc to treat it as EOF. You reach EOF in a regular file when read returns 0. – Jonathan Leffler Sep 11 '18 at 00:39

2 Answers

1

Yes, you've basically got it. But it's even simpler: getchar and putchar return and accept int types respectively already. So there's no type promotion happening. You're just taking in characters and sending them out in a loop until you see EOF.

Your intuition about why those should be int and not some char form is likely correct: the int type allows for a sentinel EOF value that is outside the value range of any possible character value.
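To make the sentinel argument concrete, here is a minimal sketch of what happens when the value returned by getchar() is squeezed into a character type before the comparison. The helper names are hypothetical, and the signed case assumes the common situation where EOF is -1 and an out-of-range conversion to signed char wraps around, which the C standard does not guarantee.

```c
#include <stdio.h>

/* Hypothetical helpers: each takes a value that getchar() could return
 * and reports whether a variable of the given type would compare equal
 * to EOF after storing it. */

/* int keeps every byte value (0..UCHAR_MAX) distinct from EOF. */
int matches_eof_as_int(int returned)
{
    int c = returned;
    return c == EOF;
}

/* unsigned char can never hold a negative value, so once EOF is stored
 * in it the comparison is always false and the loop would never end. */
int matches_eof_as_unsigned_char(int returned)
{
    unsigned char c = (unsigned char)returned;
    return c == EOF;
}

/* signed char: on typical implementations the byte 0xFF becomes -1,
 * which is indistinguishable from EOF when EOF is -1, so the loop
 * could stop early on perfectly valid input. (Assumption: this
 * conversion is implementation-defined.) */
int matches_eof_as_signed_char(int returned)
{
    signed char c = (signed char)returned;
    return c == EOF;
}
```

Only the int version tells a real 0xFF byte apart from the end of input, which is why the book declares c as int.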

(The K&R stdio functions are very old at this point; they don't know about Unicode and the like, and some of the underlying design rationales are, if not murky, just not relevant anymore. Not a lot of practical code these days would use these functions. That book is excellent for a lot of things, but the code examples are fairly archaic.)

(Also, fwiw, your question title refers to "copying a file", which you can still do this way, but there are more canonical ways.)

Ben Zotto
  • Is there a way to see the value EOF printed after all user input has been read, or some kind of function that outputs -1 when you are done entering the inputs? – Nicholas Cousar Sep 11 '18 at 00:16
  • @NicholasCousar: I don't quite follow-- in what context do you mean? `getchar` should indeed spit out `EOF` (-1) when it's done sending you its input. The way that it understands "input is done" depends on how it's getting input. If you're piping it in from the command line, the shell will indicate the termination for you. If you're letting it sit there accepting keyboard typed input, you'll have to manually tell it you're done inputting, see this question and answers (incl the comments on them) which may be helpful https://stackoverflow.com/questions/21364313/signal-eof-in-mac-osx-terminal – Ben Zotto Sep 11 '18 at 00:19
  • EOF is not a thing, it's an event. getchar() notifies you of that event by returning -1. – Lee Daniel Crocker Sep 11 '18 at 00:26
0

Well, it is correct in idea but not in the details, and that's where the devil is.

  • The getchar() function reads the first character from standard input and returns it as an unsigned char promoted to int (or the special EOF value if no character was read).

  • The return value is assigned to c, which is of type int (as it should be: if c were a char, strange things could happen).

  • Now that c has been assigned some integer value, the while loop can test to see if that value equals the value of the EOF macro.

  • Because the EOF macro has an implementation-specified negative value, and because the characters were converted to unsigned char and promoted to int, none of them has a negative value (at least not on any system that you'd meet as a novice). The condition of the while loop will therefore be true until the End-of-File condition happens or an error happens when reading standard input.

  • Once the program verifies that c != EOF is true, the putchar() function is called, which outputs the character value contained in c.

  • The getchar() is called again so it reads the next input character and passes its value back to the start of the while loop.

  • The standard input, if it is connected to a terminal device, is usually line-buffered, meaning that the program does not receive any of the characters on the line until the user has completed the line and hit the Enter key.
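The steps above can be sketched as a small function. This is one possible shape rather than the book's code: copy_stream is a hypothetical name, getc/putc stand in for getchar/putchar so that any stream can be passed, and returning -1 on error is a convention chosen only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out` one character at a time.
 * Returns the number of characters copied, or -1 if reading or
 * writing failed. */
long copy_stream(FILE *in, FILE *out)
{
    long count = 0;
    int c; /* int, so that EOF stays distinct from every character */

    /* getc returns the character as an unsigned char converted to int,
     * or EOF on end-of-file or on a read error. */
    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1; /* write error */
        count++;
    }

    /* EOF alone does not say why the loop stopped; ferror tells a
     * read error apart from a genuine end-of-file condition. */
    if (ferror(in))
        return -1;

    return count;
}
```

Calling copy_stream(stdin, stdout) from main should behave like the program in the question, with the extra check that separates a read error from end of input.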

Instead of ASCII, we speak of the execution character set, which nowadays might often be individual bytes of UTF-8 encoded Unicode characters. EOF is negative in binary too; we do not need to think about "its decimal value". The char and unsigned char types are numbers too, and character constants are of type int, i.e. on systems where the execution character set is compatible with ASCII, writing ' ' will be the same thing as writing 32, though of course clearer to those who don't remember ASCII codes.
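A few of these points can be checked directly. The following is a small sketch with hypothetical helper names; the comparison with 32 assumes an ASCII-compatible execution character set, as the paragraph above does.

```c
#include <stdio.h>

/* In C (unlike C++), a character constant such as ' ' has type int. */
int char_constant_has_int_size(void)
{
    return sizeof ' ' == sizeof(int);
}

/* Assumption: ASCII-compatible execution character set. */
int space_equals_32(void)
{
    return ' ' == 32;
}

/* Mirrors what getchar() does with a byte it has read: the byte is
 * viewed as an unsigned char and then converted to int, so the result
 * is never negative, whatever the signedness of plain char. */
int as_getchar_would_return(char raw)
{
    return (unsigned char)raw;
}
```

For a byte such as 0xE9 (one byte of a UTF-8 sequence, for instance), as_getchar_would_return yields 233 regardless of whether plain char is signed, so it can never collide with the negative EOF.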

Finally, C is very strict about the meaning of initialization. It is the setting of the initial value into a variable when it is declared.

int c = getchar();

has an initialization.

int c;
c = getchar();

has c uninitialized, and then assigned a value. Knowing the distinction makes it easier to understand compiler error messages when they refer to initialization or assignment.