Normally, to indicate EOF to a program attached to standard input on a Linux terminal, I need to press Ctrl+D once if I just pressed Enter, or twice otherwise. I noticed that the patch command is different, though. With it, I need to press Ctrl+D twice if I just pressed Enter, or three times otherwise. (Running cat | patch instead doesn't have this oddity. Also, if I press Ctrl+D before typing any real input at all, the oddity doesn't occur.) Digging into patch's source code, I traced this back to the way it loops on fread. Here's a minimal program that does the same thing:

#include <stdio.h>

int main(void) {
    char buf[4096];
    size_t charsread;
    while((charsread = fread(buf, 1, sizeof(buf), stdin)) != 0) {
        printf("Read %zu bytes. EOF: %d. Error: %d.\n", charsread, feof(stdin), ferror(stdin));
    }
    printf("Read zero bytes. EOF: %d. Error: %d. Exiting.\n", feof(stdin), ferror(stdin));
    return 0;
}

When compiling and running the above program exactly as-is, here's a timeline of events:

  1. My program calls fread.
  2. fread calls the read system call.
  3. I type "asdf".
  4. I press Enter.
  5. The read system call returns 5.
  6. fread calls the read system call again.
  7. I press Ctrl+D.
  8. The read system call returns 0.
  9. fread returns 5.
  10. My program prints Read 5 bytes. EOF: 1. Error: 0.
  11. My program calls fread again.
  12. fread calls the read system call.
  13. I press Ctrl+D again.
  14. The read system call returns 0.
  15. fread returns 0.
  16. My program prints Read zero bytes. EOF: 1. Error: 0. Exiting.

Why does this means of reading stdin have this behavior, unlike the way that every other program seems to read it? Is this a bug in patch? How should this kind of loop be written to avoid this behavior?

UPDATE: This seems to be related to libc. I originally experienced it on glibc 2.23-0ubuntu3 from Ubuntu 16.04. @Barmar noted in the comments that it doesn't happen on macOS. After hearing this, I tried compiling the same program against musl 1.1.9-1, also from Ubuntu 16.04, and it didn't have this problem. On musl, the sequence of events has steps 12 through 14 removed, which is why it doesn't have the problem, but is otherwise the same (except for the irrelevant detail of readv in place of read).

Now, the question becomes: is glibc wrong in its behavior, or is patch wrong in assuming that its libc won't have this behavior?

  • I don't see anything there that would require two EOFs. Is there any code inside the loop that calls `fread()`? – Barmar Oct 05 '18 at 22:39
  • @Barmar Nope. The code literally as I pasted it, with an empty loop body, requires two EOFs when I compile and run it. – Joseph Sible-Reinstate Monica Oct 05 '18 at 22:40
  • Have you tried running it under a debugger to see what `charsread` is? – Barmar Oct 05 '18 at 22:41
  • At minimum, see [Canonical vs non-canonical terminal input](https://stackoverflow.com/questions/358342/canonical-vs-non-canonical-terminal-input). That mentions that hitting the 'EOF' indicator key makes all the buffered input available to `read()`. If there's no buffered input, it makes zero bytes available, and zero bytes read indicates EOF. – Jonathan Leffler Oct 05 '18 at 22:42
  • @JonathanLeffler That explains why you have to type Ctl-D at the beginning of a line to signal EOF. But it doesn't explain why he has to do it twice. – Barmar Oct 05 '18 at 22:43
  • I can't reproduce the problem. I compiled this program on Linux and Mac OS, and in both I only had to type Ctl-d once after Enter. – Barmar Oct 05 '18 at 22:46
  • @Barmar I edited a timeline of events into the question. – Joseph Sible-Reinstate Monica Oct 05 '18 at 22:49
  • @Barmar One other important detail: you need to type some input rather than Ctrl+D immediately, or it works fine. I'll add that too. – Joseph Sible-Reinstate Monica Oct 05 '18 at 22:49
  • I did type some input first, I still couldn't reproduce it. – Barmar Oct 05 '18 at 22:50
  • I also only needed to type Ctl-d twice in the middle of the line, not 3 times. – Barmar Oct 05 '18 at 22:51
  • Oops, I wasn't on Linux when I thought I was testing there. It works correctly on MacOS, but I see the same thing as you on Linux. – Barmar Oct 05 '18 at 22:55
  • Interesting. strace shows the system call `read` is returning zero twice. Still looking at GNU libc source.... – aschepler Oct 05 '18 at 23:01
  • @aschepler It's definitely libc related. See my latest edit. – Joseph Sible-Reinstate Monica Oct 05 '18 at 23:02
  • It's an artifact of the Linux implementation, and how the tty works. The first CTRL+D sends the asdf\n up to your program, but CTRL+D does not actually close stdin. fread() continues and the read() syscall blocks since stdin is not really closed. fread() decides to give up on the next CTRL+D as read() returned 0 and nothing was present in its internal buffer. – nos Oct 05 '18 at 23:04
  • @nos But the same Linux kernel and thus the same tty implementation, but different libc versions, make a difference. – Joseph Sible-Reinstate Monica Oct 05 '18 at 23:05
  • This has to do with the order of the `1` and the `sizeof(buf)` arguments. If you put them the other way round, it doesn't happen. Interesting puzzle to work out what is happening underneath though... – Graeme Oct 05 '18 at 23:05
  • I think it would work correctly if you used `fgets()` to read line by line. – Barmar Oct 05 '18 at 23:05
  • @JosephSible There's notes in the NEWS file for glibc 2.28 on how stdio functions treat EOF, so that could be related. "All stdio functions now treat end-of-file as a sticky condition. .... It is most likely to affect programs that use stdio to read interactive input from a terminal." – nos Oct 05 '18 at 23:09
  • @Graeme There's a different problem if you swap the order. I put a `printf()` in the loop body, and when I swapped the order I got no output at all. The program just terminated when I typed Ctl-d at the beginning of the second line. – Barmar Oct 05 '18 at 23:14
  • Does it block on `echo hello | ./a.out`? --> it is a terminal thing, or an artifact. – wildplasser Oct 05 '18 at 23:16
  • @wildplasser It won't block on a pipe or file input, because EOF is a permanent condition. – Barmar Oct 05 '18 at 23:17
  • Terminals are special, you can keep reading after EOF is reached. Actually, files also allow this, but something has to write to the file before you call `read()` again, and that won't normally happen unless you delay the loop (that's how tail -f works). – Barmar Oct 05 '18 at 23:18

1 Answer

I've managed to confirm that this is due to an unambiguous bug in glibc versions prior to 2.28 (commit 2cc7bad). Relevant quotes from the C standard:

The byte input/output functions — those functions described in this subclause that perform input/output: [...], fread

The byte input functions read characters from the stream as if by successive calls to the fgetc function.

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream.

(emphasis on "or" mine)
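Taken together, these quotes mean that a conforming fread behaves like a loop over fgetc, so once the end-of-file indicator is set, the next call has to return 0 without consulting the terminal again. Here is a minimal sketch of that "as if" model. `model_fread` is a hypothetical name used only for illustration; it is not how glibc or musl actually implement fread, and it ignores overflow of size * nmemb.

```c
#include <stdio.h>

/* Hypothetical model of the standard's "as if by successive calls to
 * fgetc" wording. Not taken from any real libc. */
size_t model_fread(void *ptr, size_t size, size_t nmemb, FILE *stream) {
    unsigned char *out = ptr;
    size_t total = size * nmemb; /* assumption: no overflow */
    size_t i;

    if (size == 0) {
        return 0;
    }
    for (i = 0; i < total; i++) {
        /* A conforming fgetc returns EOF here whenever the end-of-file
         * indicator is already set, even if more input has arrived. */
        int c = fgetc(stream);
        if (c == EOF) {
            break;
        }
        out[i] = (unsigned char)c;
    }
    /* Only complete elements are counted. */
    return i / size;
}
```

Under this model, the call that comes after a short count returns 0 on its very first fgetc, so no further read from the terminal should be needed.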

The following program demonstrates the bug with fgetc:

#include <stdio.h>

int main(void) {
    while(fgetc(stdin) != EOF) {
        puts("Read and discarded a character from stdin");
    }
    puts("fgetc(stdin) returned EOF");
    if(!feof(stdin)) {
        /* Included only for completeness. Doesn't occur in my testing. */
        puts("Standard violation! After fgetc returned EOF, the end-of-file indicator wasn't set");
        return 1;
    }
    if(fgetc(stdin) != EOF) {
        /* This happens with glibc in my testing. */
        puts("Standard violation! When fgetc was called with the end-of-file indicator set, it didn't return EOF");
        return 1;
    }
    /* This happens with musl in my testing. */
    puts("No standard violation detected");
    return 0;
}

To demonstrate the bug:

  1. Compile the program and execute it
  2. Press Ctrl+D
  3. Press Enter

The exact bug is that if the end-of-file indicator for the stream is set, but the stream is not at end-of-file, glibc's fgetc will return the next character from the stream, rather than EOF as the standard requires.

Since fread is defined in terms of fgetc, this is the cause of what I originally saw. It was previously reported as glibc bug #1190 and was fixed by commit 2cc7bad in February 2018, which landed in glibc 2.28 in August 2018.
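As for writing the loop so that it behaves the same on affected glibc versions, one option is to stop as soon as fread returns a short count, instead of calling it again and waiting for 0. This is a sketch, assuming that a short count always means end-of-file or an error, which is what the C standard specifies for fread. `drain_stream` and its `calls` parameter are hypothetical names, not anything from patch.

```c
#include <stdio.h>

/* Hypothetical helper: reads `in` until end-of-file or error and returns
 * the total number of bytes read. If `calls` is non-NULL, it receives the
 * number of fread calls that were made. */
size_t drain_stream(FILE *in, unsigned *calls) {
    char buf[4096];
    size_t total = 0;
    unsigned n_calls = 0;

    for (;;) {
        size_t n = fread(buf, 1, sizeof buf, in);
        n_calls++;
        total += n;
        /* A short count means fread already hit end-of-file or an error,
         * so stop here rather than calling fread with the indicator set. */
        if (n < sizeof buf) {
            break;
        }
    }
    if (calls != NULL) {
        *calls = n_calls;
    }
    return total;
}
```

Calling `drain_stream(stdin, NULL)` from main on a terminal should then need only one Ctrl+D after Enter, because fread is never called while the end-of-file indicator is already set. The caller can still use feof and ferror afterwards to tell the two stopping conditions apart.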

  • Unfortunately, this bug fix causes regressions in other software, for example [cups-filters](https://github.com/OpenPrinting/cups-filters/issues/58). But we have decided to [keep the fix, at least for now](https://sourceware.org/bugzilla/show_bug.cgi?id=23636). – Florian Weimer Oct 06 '18 at 08:01
  • Yes, this is a very very old, well-known bug glibc inherited from a bug in sysv unix. Most other implementations these days lack the bug, so any software broken by the fix in glibc is also going to be broken on most non-glibc (e.g. BSD) systems. – R.. GitHub STOP HELPING ICE Oct 06 '18 at 13:35
  • Conversely, software such as `hexdump` is broken by the old GNU C library behaviour and works with other C libraries. https://unix.stackexchange.com/q/517064/5132 – JdeBP May 06 '19 at 02:34