
I'm reading Programming in C - A Tutorial by Brian Kernighan and on page 5 he suggests that

main( ) {
    char c;
    while( (c=getchar( )) != '\0' )
        putchar(c);
}

can be simplified to

main( ) {
    while( putchar(getchar( )) != '\0' ) ;
}

with the only difference being that the last '\0' is printed in the 2nd one.

However, when I compile this with '\0' replaced by EOF and pipe in a string ($ printf "abc" | ./a.out), the program goes into an infinite loop printing 0xff bytes.

If I change it to while( putchar(getchar( )) != 'd' ) ; and run $ printf "abcde" | ./a.out, it successfully prints up to and including d and then exits.

Why does it go into an endless loop instead of printing abc(EOF) and exiting?

florosus
  • Dude - EOF <> '\0'! If you look in stdio.h, you'll see it's -1. Q: Does everything work if you just copy the examples verbatim? – paulsm4 Dec 13 '21 at 23:59
  • I tried it with both '\0' and -1 (no quotes). As soon as it hits EOF (^D, end of file descriptor, end of string, etc.), it starts spamming 0xff. – florosus Dec 14 '21 at 00:04
  • Alternative code: `int main( ) { int c; while( (c=getchar( )) != EOF ) putchar(c); }` – chux - Reinstate Monica Dec 14 '21 at 00:35

3 Answers


According to §7.21.7.8 of the ISO C11 standard, the function putchar will convert its int argument to an unsigned char and write that character, and also return that character (not the original unconverted int argument).

Calling putchar( EOF ) will therefore return (unsigned char)EOF, which cannot be equal to EOF (because EOF is required to be a negative value).

For this reason, in the line

while( putchar( getchar() ) != EOF )

the loop condition is always true unless putchar itself reports a write error, so you get an infinite loop.

Note that EOF is not an actual character, but simply a special macro constant to indicate an error or end-of-file condition.

Andreas Wenzel

EOF is a macro with a negative, implementation-defined value (very commonly -1). The man page for putchar(3) says that putchar() writes the character given as its argument, cast to an unsigned char, to stdout, and returns the character written as an unsigned char cast to an int, or EOF on error.

When getchar() returns EOF at the end of stdin, putchar() casts this value to unsigned char, writes it to stdout, and returns the unsigned char value cast to an int. So if EOF is, for example, -1, putchar() in your example returns 255 (-1 cast to unsigned char, then cast to int). If you replace '\0' with EOF in the while loop, you get an infinite loop because 255 != EOF.

trdo

That's a very old tutorial; you can tell because it doesn't specify a return type for main. But the real problem with it is that it loops until getchar returns 0 (also known as '\0'), but in fact getchar will return EOF at the end of the file, and that is equal to -1 on my system (and possibly yours too).

Apparently, back when that tutorial was written getchar would return 0 at the end of the file. That was not very good behavior though, because it makes it tricky to process binary data, which usually contains many 0 bytes. At some point the language was changed, and getchar was changed to return an int instead of a char, and to return EOF at the end of the file.

David Grayson
  • There is a secondary problem: putchar(getchar()) == EOF only happens if putchar() hits an error; otherwise it masks its argument down to character size, so when getchar() returns `-1`, putchar() returns 0xff. – mevets Dec 14 '21 at 00:08
  • @mevets That fixed it! I checked `!= 0xff` and it printed a single 0xff and quit. – florosus Dec 14 '21 at 00:09
  • That didn't fix it; that is the bug; and is why the program *has* to be: `int c; while ((c=getchar()) != EOF) putchar(c);` Note the `int` and the purposeful avoidance of printing a `0xff`. – mevets Dec 14 '21 at 00:11
  • I'm guessing there's no way to just print a clean EOF instead of 0xff using this nested expression, then. – florosus Dec 14 '21 at 00:12
  • `EOF` is not a character (unlike on DOS and CP/M systems, where `^Z` marked end-of-file). EOF is a condition of the data source you are reading from. When it is a file, it means you have already read the whole file. When it is a `tty`-like device, it occurs when the user enters a character that the device-management software interprets as an end-of-input. In `unix`-derived systems, that defaults to `^D`, but can be set to any character via `stty` or `tcsetattr`. – mevets Dec 14 '21 at 00:18