I am experiencing a strange problem with the popen and fgets library functions on a Linux system.

Below is a short program that demonstrates the problem. It:

  1. Installs a signal handler for SIGUSR1.
  2. Creates a secondary thread to repeatedly send SIGUSR1 to the main thread.
  3. In the main thread, repeatedly executes a very simple shell command via popen(), gets the output via fgets(), and checks to see if the output is of the expected length.

Intermittently, the output is unexpectedly truncated. Why?

Command-line invocation example:

$ gcc -Wall test.c -lpthread && ./a.out 
iteration 0
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
unexpected length: 0

Details of my machine (the program will also compile and run with this online C compiler):

$ cat /etc/redhat-release
CentOS release 6.5 (Final)

$ uname -a
Linux localhost.localdomain 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

# gcc 4.4.7
$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

# glibc 2.12
$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

The program:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <pthread.h>
#include <errno.h>

void dummy_signal_handler(int signal);
void* signal_spam_task(void* arg);
void echo_and_verify_output();
char* fgets_with_retry(char *buffer, int size, FILE *stream);

static pthread_t main_thread;

/**
 * Prints an error message and exits if the output is truncated, which happens
 * about 5% of the time.
 *
 * Installing the signal handler with the SA_RESTART flag, blocking SIGUSR1
 * during the call to fgets(), or sleeping for a few milliseconds after the
 * call to popen() will completely prevent truncation.
 */
int main(int argc, char **argv) {

    // install signal handler for SIGUSR1
    struct sigaction sa, osa;
    sa.sa_handler = dummy_signal_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, &osa);

    // create a secondary thread to repeatedly send SIGUSR1 to main thread
    main_thread = pthread_self();
    pthread_t spam_thread;
    pthread_create(&spam_thread, NULL, signal_spam_task, NULL);

    // repeatedly execute simple shell command until output is unexpected
    unsigned int i = 0;
    for (;;) {
        printf("iteration %u\n", i++);
        echo_and_verify_output();
    }

    return 0;
}

void dummy_signal_handler(int signal) {}

void* signal_spam_task(void* arg) {
    for (;;)
        pthread_kill(main_thread, SIGUSR1);
    return NULL;
}

void echo_and_verify_output() {

    // run simple command
    FILE* stream = popen("echo -n hello", "r");
    if (!stream)
        exit(1);

    // count the number of characters in the output
    unsigned int length = 0;
    char buffer[BUFSIZ];
    while (fgets_with_retry(buffer, BUFSIZ, stream) != NULL)
        length += strlen(buffer);

    if (ferror(stream) || pclose(stream))
        exit(1);

    // double-check the output
    if (length != strlen("hello")) {
        printf("unexpected length: %u\n", length);
        exit(2);
    }
}

// version of fgets() that retries on EINTR
char* fgets_with_retry(char *buffer, int size, FILE *stream) {
    for (;;) {
        if (fgets(buffer, size, stream))
            return buffer;
        if (feof(stream))
            return NULL;
        if (errno != EINTR)
            exit(1);
        clearerr(stream);
    }
}
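
The comment above main() notes that installing the handler with SA_RESTART prevents the truncation. A minimal sketch of that variant follows; `noop_handler` and `install_restarting_handler` are hypothetical names introduced here for illustration, not part of the program above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Empty handler, playing the same role as dummy_signal_handler above. */
static void noop_handler(int signo) { (void)signo; }

/*
 * Hypothetical helper: installs noop_handler for signo with SA_RESTART set,
 * so that an interrupted read() on the pipe is restarted by the kernel
 * instead of failing with EINTR. Returns 0 on success, -1 on error.
 */
int install_restarting_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* the only change from the original setup */
    return sigaction(signo, &sa, NULL);
}
```

Calling `install_restarting_handler(SIGUSR1)` in place of the sigaction() setup in main() should reproduce the workaround, though it avoids the EINTR path rather than explaining the truncation.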
Josh Johnson
  • I expect the `read` system call is being interrupted by the signal. this seems like a bug in `fgets` unless you can find this behavior documented somewhere. – Dark Falcon Sep 15 '14 at 16:41
  • This has to be some sort of bug in either your kernel (unlikely) or your `libc`. With a few corrections I have run this on OS X and unmodified on RHEL 6 with no problems. – Sergey L. Sep 15 '14 at 17:12
  • Thanks for the info. I'm going to try running this on a few different OS/`glibc` versions and report back. – Josh Johnson Sep 15 '14 at 17:16
  • Same (broken) behavior observed on Fedora 20 with glibc 2.18. – Josh Johnson Sep 15 '14 at 20:39
  • I ran the program under strace and observed that one of the `read` system calls that `fgets` makes is returning the complete "hello" text. That seems to imply that either I am using `fgets` incorrectly or there is a bug in the implementation of `fgets`. – Josh Johnson Sep 15 '14 at 20:46
  • Same deal on Ubuntu 14.04 with eglibc 2.19. – Josh Johnson Sep 15 '14 at 22:03

1 Answer

If an error occurs on a FILE stream while reading with fgets, it's undefined whether some of the bytes read are transferred to the buffer before fgets returns NULL (7.19.7.2 of the C99 spec). So if the SIGUSR1 signal arrives during the fgets call and causes an EINTR, it's possible that some characters may be lost from the stream.

The upshot is that you can't use stdio functions to read/write FILE objects if the underlying system calls might have recoverable error returns (such as EINTR or EAGAIN), as there's no guarantee the standard library won't lose some data from the buffer when that happens. You can claim that this is a "bug" in the standard library implementation, but it is a bug that the C standard allows.
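
One way to act on this is to bypass the stdio buffer and call read() on the pipe's file descriptor, retrying on EINTR. POSIX read() either returns the bytes it transferred or fails with EINTR before transferring any, so a retry loop does not drop data. The following is only a sketch under that assumption; `read_all_retry` and `capture_command` are hypothetical helper names, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: reads from fd until EOF or until cap bytes are
 * stored, retrying whenever read() fails with EINTR.
 * Returns the number of bytes read, or -1 on any other error.
 */
ssize_t read_all_retry(int fd, char *buf, size_t cap) {
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n > 0)
            total += (size_t)n;     /* got data, keep going */
        else if (n == 0)
            break;                  /* EOF */
        else if (errno != EINTR)
            return -1;              /* real error */
        /* EINTR: nothing was transferred, so simply retry */
    }
    return (ssize_t)total;
}

/*
 * Hypothetical helper: runs cmd via popen() and captures its output
 * without going through fgets(). Output beyond cap bytes is not read.
 * Returns the byte count, or -1 on failure.
 */
ssize_t capture_command(const char *cmd, char *buf, size_t cap) {
    FILE *stream = popen(cmd, "r");
    if (!stream)
        return -1;
    ssize_t n = read_all_retry(fileno(stream), buf, cap);
    if (pclose(stream) != 0)
        return -1;
    return n;
}
```

In the question's program, `capture_command("echo -n hello", buffer, sizeof buffer)` would stand in for the popen()/fgets() loop; the result is a byte count rather than a NUL-terminated string.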

Chris Dodd
  • All that C99 §7.19.7.2 says is "On a read error [...] the array contents are indeterminate". Contrast that with the documentation for `fread` (§7.19.8.1), which explicitly states "If an error occurs, the resulting value of the file position indicator for the stream is indeterminate." How do you infer anything about the underlying stdio buffer from the `fgets` documentation? – Josh Johnson Sep 16 '14 at 02:46
  • Furthermore, [POSIX](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fgetc.html#tag_16_140_05) says the following about `fgets` returning `EINTR`: "The read operation was terminated due to the receipt of a signal, and **no data was transferred**." – Josh Johnson Sep 16 '14 at 04:22
  • @JoshuaJohnson: Since the spec doesn't say it's ok, it's implicitly not ok. The POSIX statement is about fgetc, not fgets, and has the note "The functionality described on this reference page is aligned with the ISO C standard. Any conflict between the requirements described here and the ISO C standard is unintentional. This volume of POSIX.1-2008 defers to the ISO C standard" – Chris Dodd Sep 16 '14 at 15:42
  • Chris, thanks for the back and forth. I'm still not convinced :) That statement in POSIX is on the page for `fgetc`, but it also applies to `fgets`. The fgets page says "Errors -- Refer to `fgetc`.". C99 does not mention `EINTR`. I don't know whether or not it is a conflict for POSIX to place a constraint on an error code that C99 doesn't mention. Anyways, I'll take it up on the glibc mailing list and see what they have to say. – Josh Johnson Sep 16 '14 at 16:20