I have a C++ application that includes this function:

int
mySelect(const int fdMaxPlus1,
         fd_set *readFDset,
         fd_set *writeFDset,
         struct timeval *timeout)
{
 retry:
  const int selectReturn
    = ::select(fdMaxPlus1, readFDset, writeFDset, NULL, timeout);

  if (selectReturn < 0 && EINTR == errno) {
    // Interrupted system call, such as for profiling signal, try again.
    goto retry;
  }

  return selectReturn;
}

Normally, this code works just fine. However, in one instance I saw it get into an infinite loop where select() kept failing with errno set to EINTR. In this case, the caller had set the timeout to zero seconds and zero microseconds, meaning don't wait and return the select() result immediately. I thought that EINTR only occurs when a signal handler runs, so why would I keep getting signals over and over again (for over 12 hours)? This is CentOS 5. Once I put this into the debugger to see what was happening, the code returned without EINTR after a couple of iterations. Note that the fd being checked is a socket.

I could add a retry limit to the above code, but I'd like to understand what is going on first.

WilliamKF
  • Are you on Linux? Have you tried looking at [`explain_errno_select`](http://linux.die.net/man/3/explain_select)? – Cornstalks Feb 11 '15 at 20:12
  • Have you done anything fancy with signals or signal handlers anywhere in the code? If you can reproduce it, what does the output look like when you attach the `strace` tool (use `strace -p pid_of_your_program`)? – nos Feb 11 '15 at 20:12
  • You should really look into the strace of your application for spammed signals. Maybe there's a third party library spamming SIGALRM (or something similarly subtle) for some reason you don't know or expect. – moooeeeep Feb 11 '15 at 20:34
  • @nos Thus far it is not reproducible, so I'm going to add a limit on retries. – WilliamKF Feb 11 '15 at 21:18
  • Did you ever figure out why you got this `EINTR`? I am getting the same issue - http://stackoverflow.com/questions/38456803/poll2-on-result-of-pipe2-is-mutating-fd – Noitidart Jul 19 '16 at 11:10
  • @Noitidart No, I just added code to handle the `EINTR` by counting the retries and, after several thousand, inserting a 1000 µs sleep between tries, and then finally, after several thousand more, throwing an error to let the higher-level code reconnect as an alternate means of recovery. – WilliamKF Jul 19 '16 at 22:08
  • Thanks very much for sharing that input! – Noitidart Jul 19 '16 at 23:42
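
The workaround described in the comments above (count the retries, start sleeping between tries, then give up) could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the name `mySelectBounded` and the two thresholds are assumptions, since the comment only says "several thousand".

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>

// Hypothetical thresholds; the comment only says "several thousand".
const int kSpinRetries = 5000;   // retries before sleeping between tries
const int kMaxRetries  = 10000;  // total tries before giving up

int
mySelectBounded(const int fdMaxPlus1,
                fd_set *readFDset,
                fd_set *writeFDset,
                const struct timeval *timeout)
{
  for (int attempt = 0; attempt < kMaxRetries; ++attempt) {
    // Linux select() may overwrite the timeout, so pass a per-attempt copy.
    struct timeval timeoutCopy;
    struct timeval *timeoutArg = NULL;  // NULL means "block indefinitely"
    if (timeout != NULL) {
      timeoutCopy = *timeout;
      timeoutArg = &timeoutCopy;
    }

    const int selectReturn
      = ::select(fdMaxPlus1, readFDset, writeFDset, NULL, timeoutArg);

    if (!(selectReturn < 0 && EINTR == errno)) {
      return selectReturn;  // success, timeout, or an error other than EINTR
    }
    if (attempt >= kSpinRetries) {
      ::usleep(1000);  // back off for 1000 microseconds between later tries
    }
  }
  // Let higher-level code recover, for example by reconnecting.
  throw std::runtime_error("select() kept failing with EINTR");
}
```

The exception type is also an assumption; any error the caller already handles for a broken connection would do.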

1 Answer

On Linux, select(2) may modify the timeout argument (passed by address) to reflect the time not slept. So you should make a fresh copy of it before each call and pass the copy.

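retry:
// Copy the caller's timeout before every attempt; select() may overwrite it.
struct timeval timeoutcopy = *timeout;  // assumes timeout is not NULL
const int selectReturn
  = ::select(fdMaxPlus1, readFDset, writeFDset, NULL, &timeoutcopy);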

(in your code, the timeout is probably zero or very small after a few iterations, or even after the first one)
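
Put together, a version of the wrapper that copies the timeout on every attempt might look like the sketch below. The name `mySelectCopyTimeout` is made up for illustration, and the NULL check is an added assumption so that a caller asking to block indefinitely still works.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <cerrno>
#include <cstddef>

// Retry select() on EINTR, handing each attempt a fresh copy of the timeout
// so the caller's struct timeval is never modified.
int
mySelectCopyTimeout(const int fdMaxPlus1,
                    fd_set *readFDset,
                    fd_set *writeFDset,
                    const struct timeval *timeout)
{
  for (;;) {
    struct timeval timeoutCopy;
    struct timeval *timeoutArg = NULL;  // NULL means "block indefinitely"
    if (timeout != NULL) {
      timeoutCopy = *timeout;
      timeoutArg = &timeoutCopy;
    }

    const int selectReturn
      = ::select(fdMaxPlus1, readFDset, writeFDset, NULL, timeoutArg);

    if (selectReturn < 0 && EINTR == errno) {
      continue;  // interrupted by a signal, try again
    }
    return selectReturn;
  }
}
```

One trade-off of this design: every retry starts over with the full timeout, so under frequent signals the total wait can exceed what the caller asked for.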

BTW, I suggest using poll(2) instead of select (since poll is more C10K-problem friendly).

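BTW, EINTR is not limited to signals with a handler you installed (see signal(7)): on Linux, certain blocking calls can fail with EINTR even without a registered signal handler, when the process is stopped by a stop signal and then resumed with SIGCONT.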

You might use strace to understand the overall behavior of your program.

Basile Starynkevitch