TLDR: In Solaris, if O_NDELAY is set on stdin by a child process, bash exits. Why?

The following code causes interactive bash (v4.3.33) or tcsh (6.19.00) shells to exit after the process finishes running:

#include <fcntl.h>

int main() {
  fcntl( 0, F_SETFL, O_NDELAY );

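// Toggle variant: flips O_NDELAY and leaves the other status flags alone.
//int x = fcntl( 0, F_GETFL );
//fcntl( 0, F_SETFL, x ^ O_NDELAY );  // equivalent to ~(x ^ (~O_NDELAY))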

  return 0;
}

The versions of ksh, csh and zsh we have aren't affected by this problem.

To investigate I ran bash & csh under truss (similar to strace on Linux) like this:

$ truss -eaf -o bash.txt -u'*' -{v,r,w}all bash --noprofile --norc
$ truss -eaf -o csh.txt -u'*' -{v,r,w}all csh -f

After csh finishes running the process it does the following:

fcntl( 0, F_GETFL ) = FWRITE|FNDELAY
fcntl( 0, F_SETFL, FWRITE) = 0

... which gave me an idea. I changed the program to the commented-out code above so that it toggles O_NDELAY instead of always setting it. If I run it twice in a row, bash doesn't exit.

Brian Vandenberg
  • Well, I would certainly not be surprised if the shell gets very confused by having the terminal unexpectedly put in non-blocking mode. Why are you doing that? This seems like a "don't do that, then". – Nate Eldredge Sep 04 '20 at 03:22
  • In particular, it means that when the shell next tries to read the terminal, instead of blocking to wait for input, the `read()` call will fail immediately. There's no particular reason the shell should be designed to handle this situation, and so it probably assumes there was a genuine error and gives up. The shell on Linux may be written in a different way such that this happens not to break it (e.g. waiting for input using `poll()` instead of `read()`). – Nate Eldredge Sep 04 '20 at 03:26
  • Which shell are you using? – Andrew Henle Sep 04 '20 at 14:19
  • Well, *I'm* not the one doing it; it's a 3rd party tool. The shell is bash v 4. – Brian Vandenberg Sep 04 '20 at 16:30

1 Answer

This answer got me started on the right path. The man page for read (in Solaris) says:

When attempting to read a file associated with a terminal that has no data currently available:

* If O_NDELAY is set, read() returns 0
* If O_NONBLOCK is set, read() returns -1 and sets errno to EAGAIN

... so when bash next tries to read stdin, read() returns 0, which bash treats as EOF, and it exits.
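
To make the difference concrete, here is a minimal sketch of how a reader sees the two cases. The names classify_read and probe_empty_pipe are hypothetical, and the probe uses an empty pipe as a stand-in for the terminal, which is an assumption on my part; the man page text quoted above is about terminals.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum read_status { READ_DATA, READ_ZERO, READ_WOULD_BLOCK, READ_ERROR };

/* Classify a single read() call. READ_ZERO means read() returned 0, which
 * the caller cannot tell apart from end-of-file. */
enum read_status classify_read(int fd, void *buf, size_t len) {
    ssize_t n = read(fd, buf, len);
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_ZERO;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;
    return READ_ERROR;
}

/* Hypothetical probe: read from an empty pipe whose read end has `flag`
 * (O_NDELAY or O_NONBLOCK) set. The pipe stands in for the terminal. */
enum read_status probe_empty_pipe(int flag) {
    int p[2];
    char c;
    enum read_status st = READ_ERROR;

    if (pipe(p) == -1)
        return READ_ERROR;
    if (fcntl(p[0], F_SETFL, flag) != -1)
        st = classify_read(p[0], &c, 1);
    close(p[0]);
    close(p[1]);
    return st;
}
```

With O_NONBLOCK the probe should report READ_WOULD_BLOCK, which a shell can recognise and retry. If Solaris treats pipes like terminals here, O_NDELAY would give READ_ZERO instead, and a caller that only checks for a zero return has no way to distinguish that from a real EOF.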

This page indicates O_NDELAY shouldn't be used anymore, instead recommending O_NONBLOCK. I've found similar statements regarding O_NDELAY / FIONBIO for various flavors of UNIX.
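
For comparison, a minimal sketch of requesting non-blocking mode with O_NONBLOCK. It reads the current flags with F_GETFL and ORs the new bit in, rather than overwriting every status flag the way the test program in the question does. set_nonblock is a hypothetical helper name, not something taken from the tool or from bash.

```c
#include <fcntl.h>

/* Turn on O_NONBLOCK for fd while preserving the other file status flags.
 * Returns 0 on success, -1 if either fcntl() call fails. */
int set_nonblock(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

A reader on a descriptor configured this way gets -1 with errno set to EAGAIN when no data is available, so "no data yet" and EOF stay distinguishable.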

As an aside, on Linux O_NDELAY == FNDELAY == O_NONBLOCK, so it's not terribly surprising that I was unable to reproduce the problem there.

Unfortunately, the tool that's doing this isn't one I have the source code for, though from my experimenting I've found ways to work around the problem.

If nothing else, I can write a simple program that removes O_NDELAY as above, then wrap execution of the tool in a shell script that always runs the "fixer" program after it.
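
A rough sketch of that workaround, written as two C functions rather than a shell script. Both names (clear_nonblocking, run_then_fix) are hypothetical, and clearing O_NONBLOCK along with O_NDELAY is my own assumption about what is safe for an interactive shell. The fix works from a separate process for the same reason the problem occurs: file status flags belong to the open file description, which the shell and its children share.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Clear O_NDELAY and O_NONBLOCK on fd, preserving the other status flags.
 * Returns 0 on success, -1 if fcntl() fails. */
int clear_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags & ~(O_NDELAY | O_NONBLOCK)) == -1 ? -1 : 0;
}

/* Hypothetical wrapper: run argv[0] with its arguments, wait for it to
 * finish, then repair stdin. Returns the child's exit status, or -1 if the
 * child could not be started or did not exit normally. */
int run_then_fix(char *const argv[]) {
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    clear_nonblocking(STDIN_FILENO);  /* undo whatever the tool did to stdin */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A main() that passes argv + 1 to run_then_fix and returns its result would give a drop-in launcher for the tool; a main() that only calls clear_nonblocking(STDIN_FILENO) would be the standalone "fixer".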

Brian Vandenberg