
How do you prevent a file descriptor from being copy-inherited across fork() system calls (without closing it, of course)?

I am looking for a way to mark a single file descriptor as NOT to be (copy-)inherited by children at fork(), something like an FD_CLOEXEC-like hack but for forks (so an FD_DONTINHERIT feature, if you like). Has anybody done this? Or has anyone looked into this and can give me a hint on where to start?

Thank you

UPDATE:

I could use libc's __register_atfork

 __register_atfork(NULL, NULL, fdcleaner, NULL)

to close the fds in the child just before fork() returns. However, the FDs are still being copied, so this sounds like a silly hack to me. The question is how to skip dup()-ing the unneeded FDs into the child in the first place.
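The same child-handler idea can be written with the standard `pthread_atfork()`, which glibc implements on top of `__register_atfork()`. This is a minimal sketch: the registry (`dontinherit_fds`, `mark_dontinherit()`, `dontinherit_init()`) is hypothetical, not an existing API, and it assumes descriptors are marked before any other thread may call `fork()`. As the update says, the kernel still copies every descriptor; the handler only closes the marked ones in the child afterwards, and it only runs for `fork()` itself (older glibc versions may need `-pthread` to link).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <unistd.h>

#define MAX_DONTINHERIT 64   /* hypothetical fixed capacity */

/* Hypothetical registry of descriptors that should not survive into a
   fork()ed child. Not thread-safe: marking is assumed to finish before
   any other thread can call fork(). */
static int dontinherit_fds[MAX_DONTINHERIT];
static int dontinherit_count;

/* Child handler: runs in the child after fork(). The descriptors were
   already copied by the kernel; this only drops the child's copies.
   close() is async-signal-safe, so this is usable after fork() from a
   multithreaded parent. */
static void close_dontinherit_in_child(void)
{
    for (int i = 0; i < dontinherit_count; i++)
        close(dontinherit_fds[i]);
}

/* Register the handler once; returns 0 on success, like pthread_atfork(). */
int dontinherit_init(void)
{
    return pthread_atfork(NULL, NULL, close_dontinherit_in_child);
}

/* Mark fd so that children created by fork() close it.
   Returns 0 on success, -1 if the table is full. */
int mark_dontinherit(int fd)
{
    if (dontinherit_count == MAX_DONTINHERIT)
        return -1;
    dontinherit_fds[dontinherit_count++] = fd;
    return 0;
}
```

Typical use is `dontinherit_init()` once at startup, then `mark_dontinherit(fd)` for each descriptor that children (including ones forked by a third-party library through libc's `fork()`) should not keep.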

I'm thinking of some scenarios where an fcntl(fd, F_SETFD, FD_DONTINHERIT) would be needed:

  • fork() will copy an event FD (e.g. one from epoll()); sometimes this isn't wanted. For example, FreeBSD marks the kqueue() event FD as being of KQUEUE_TYPE, and FDs of that type are not copied across forks (they are explicitly skipped during the copy; if one wants to use a kqueue from a child, the child must be forked with a shared FD table).

  • fork() will copy 100k unneeded FDs just to create a child that does some CPU-intensive task (suppose the need for that fork() is very unlikely, so the programmer doesn't want to maintain a pool of children for something that normally won't happen).

Some descriptors we want copied (0, 1, 2), some (most of them?) not. I think full FD table duplication exists for historical reasons, but I am probably wrong.

How silly does this sound:

  • patch fcntl() to support the dontinherit flag on file descriptors (not sure if the flag should be kept per-FD or in an FD table fd_set, like the close-on-exec flags are kept)
  • modify dup_fd() in the kernel to skip copying dontinherit FDs, as FreeBSD does for kqueue FDs

Consider the program:

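#include <err.h>
#include <fcntl.h>
#include <pthread.h>    /* pthread_atfork(); older glibc may need -pthread */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>   /* wait() */
#include <time.h>
#include <unistd.h>

/* Number of descriptors to open; override with -DNUMFDS=... at compile time.
   Large values need a raised RLIMIT_NOFILE (e.g. ulimit -n). */
#ifndef NUMFDS
#define NUMFDS 100
#endif

static int fds[NUMFDS];
static clock_t t1;

/* Close the first n descriptors in fds[] (never touches fds[-1]). */
static void cleanup(int n)
{
    while (n-- > 0)
        close(fds[n]);
}

/* "prepare" fork handler: runs in the parent just before fork(). */
static void clk_start(void)
{
    t1 = clock();
}

/* "parent" fork handler: runs in the parent just after fork(). */
static void clk_end(void)
{
    double tix = (double)(clock() - t1);    /* CPU-time ticks */
    double secs = tix / CLOCKS_PER_SEC;
    printf("fork_cost(%d fds)=%fticks(%f seconds)\n", NUMFDS, tix, secs);
}

int main(void)
{
    pid_t pid;
    int i, status;

    /* Portable equivalent of __register_atfork(clk_start, clk_end, NULL, NULL). */
    if (pthread_atfork(clk_start, clk_end, NULL) != 0)
        errx(EXIT_FAILURE, "pthread_atfork failed");

    for (i = 0; i < NUMFDS; i++) {
        fds[i] = open("/dev/null", O_RDONLY);
        if (fds[i] == -1) {
            warn("open_fds");    /* report errno before cleanup() can change it */
            cleanup(i);          /* close only the descriptors already opened */
            exit(EXIT_FAILURE);
        }
    }

    pid = fork();
    if (pid < 0)
        err(EXIT_FAILURE, "fork");
    if (pid == 0) {
        cleanup(NUMFDS);         /* child: close the inherited copies */
        _exit(0);
    }
    wait(&status);
    cleanup(NUMFDS);
    return 0;
}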

Of course, this can't be considered a real benchmark, but anyhow:

root@pinkpony:/home/cia/dev/kqueue# time ./forkit
fork_cost(100 fds)=0.000000ticks(0.000000 seconds)

real    0m0.004s
user    0m0.000s
sys     0m0.000s
root@pinkpony:/home/cia/dev/kqueue# gcc -DNUMFDS=100000 -o forkit forkit.c
root@pinkpony:/home/cia/dev/kqueue# time ./forkit
fork_cost(100000 fds)=10000.000000ticks(0.010000 seconds)

real    0m0.287s
user    0m0.010s
sys     0m0.240s
root@pinkpony:/home/cia/dev/kqueue# gcc -DNUMFDS=100 -o forkit forkit.c
root@pinkpony:/home/cia/dev/kqueue# time ./forkit
fork_cost(100 fds)=0.000000ticks(0.000000 seconds)

real    0m0.004s
user    0m0.000s
sys     0m0.000s

forkit ran on a Dell Inspiron 1520 (Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz, 4GB RAM); load average 0.00.
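`clock()` reports process CPU time at a coarse granularity here (the only non-zero figure above is exactly 10000 ticks, i.e. one 0.01 s step), so a somewhat more direct measurement times the `fork()` call itself with a monotonic wall clock. This is a minimal sketch assuming POSIX `clock_gettime()` with `CLOCK_MONOTONIC`; the helper name `time_fork_ns()` is hypothetical. As in the original test, opening 100000 descriptors first requires raising `RLIMIT_NOFILE`.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: returns the wall-clock time, in nanoseconds, that
   fork() took to return in the parent, or -1 if fork() failed.
   The child exits immediately and is reaped before returning. */
long long time_fork_ns(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    pid_t pid = fork();
    clock_gettime(CLOCK_MONOTONIC, &end);

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                 /* child: do nothing, skip stdio flushing */

    waitpid(pid, NULL, 0);        /* reap the child */
    return (end.tv_sec - start.tv_sec) * 1000000000LL
         + (end.tv_nsec - start.tv_nsec);
}
```

Calling it several times after opening the descriptors and keeping the minimum gives a less noisy figure than a single `clock()` delta.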

edited by 0andriy; asked by user237419
  • Why can't you just close it in the child after you call fork? – sjr Apr 19 '11 at 07:55
  • You could say the same thing about FD_CLOEXEC being useless, since you can close the fd before exec()-ing. A 3rd-party lib is doing the fork()-ing; I'm not ready to mess with that code and branch it only for my own use. – user237419 Apr 19 '11 at 08:18
  • You have to be a bit creative then. How about you fork() and close before the library forks? – sjr Apr 19 '11 at 18:13
  • +1, I could approve your quick solution as creative :) but this may become a pain in the arse; plus, what I'm really trying to get rid of is the copying of unneeded fds into the child, not getting rid of them afterwards. – user237419 Apr 19 '11 at 22:07
  • Why try to get rid of the copy? Are you trying to optimize for speed? Why not let the kernel do that for you? After all, fork is pretty lightweight. – sjr Apr 20 '11 at 16:55
  • Not really lightweight; in fact dup_fd() is pretty heavy. I updated the post with a test. Normally every sane user will do eager fork()s and maintain a pool of children when there are many fds, but there is the case when you fork at a later point in time and have to carry all that load with it. – user237419 Apr 20 '11 at 20:57
  • Hang on, my test is idiotic; preparing a new one :P – user237419 Apr 20 '11 at 21:08
  • FD_CLOEXEC is not useless even if you control all code in your process. You need it if a file descriptor must not be passed to an executed program but is still necessary after execve() fails. – jilles Apr 21 '11 at 23:37
  • Who said FD_CLOEXEC is useless?! – user237419 Apr 22 '11 at 04:13
  • @jilles: Ah I see, my first comment. I didn't mean it to sound that way; I was just comparing the (possible) usefulness of a "dontinherit" flag to FD_CLOEXEC in a nearly identical usage scenario: not passing around file descriptors. – user237419 Apr 22 '11 at 06:18
  • I can provide another reason why this would be a useful feature. In my code I have seen problems where a process closed a TCP port and then tried to open it again immediately but was not allowed because a child process had it open. To fix this I made the child close the FD. However, this still left a race condition, so extra code was required to enable the child to signal the parent that it had closed the FD and for the parent to wait for this signal before proceeding. This blocking code in the parent could be spared if we had an FD_DONTINHERIT flag. – Alex Zeffertt Feb 08 '14 at 16:51
  • @AlexZeffertt Another situation where the same problem appears is in a `make`-like system where multiple threads write shell scripts and then fork()+exec() them. You will get "text file busy" when one fork()ed process happens to inherit a shell script written by another thread. – user239558 Jul 24 '15 at 22:39
  • I can provide yet another case where I missed this feature: I was implementing a UNIX-like pipelining program which creates a new process for every program in the pipe. To connect the processes with each other, I used shared fds inherited by fork. Every new process should inherit not all fds, but only the ones shared with the previous and next process. Coding and error handling become hard when every process inherits every fd and is therefore responsible for closing the fds it doesn't need. Otherwise, fds are kept open and processes block indefinitely waiting for their input fd to be closed. – Akronix Jan 24 '17 at 19:59
  • I'm working on a kernel patch to try to add this feature as @sysfault has suggested. – Akronix Jan 24 '17 at 20:01
  • @Akronix: If you do, please support a flag to `open()` and leave out the `fcntl()` entirely. – Ben Voigt Jan 25 '17 at 00:31
  • @BenVoigt and also patch socket, pipe, etc. with a DONT_INHERIT flag? I'd stick to fcntl(). – user237419 Jan 25 '17 at 10:04
  • @Akronix are you going to try submitting the patch upstream? Let us know if you need help with the initial specs, coding or patch submission. Good luck! – user237419 Jan 25 '17 at 10:04
  • @sysfault: The `fcntl` approach to `FD_CLOEXEC` is defective (read its own man page). If setting the flag at time of creation requires changing multiple functions, that's unfortunate, but necessary. – Ben Voigt Jan 25 '17 at 14:14
  • I mostly agree with sysfault. Anyway, I'm going to write a first tentative patch and submit it to the kernel mailing list. Then the list can get the idea and discuss the technical implementation. Thank you for your offer, @sysfault. I'll let you know how it goes. – Akronix Jan 25 '17 at 18:54

3 Answers


If you fork with the purpose of calling an exec function, you can use fcntl with FD_CLOEXEC to have the file descriptor closed once you exec:

int fd = open(...);
fcntl(fd, F_SETFD, FD_CLOEXEC);

Such a file descriptor will survive a fork but not functions of the exec family.
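As the comments on this answer point out, setting the flag with a separate `fcntl()` call leaves a window in which another thread can `fork()`+`exec()` and leak the descriptor; `O_CLOEXEC` sets it atomically at `open()` time. A minimal sketch, with hypothetical helper names `open_cloexec()` and `set_cloexec()`; the second one is a read-modify-write fallback for descriptors created without `O_CLOEXEC`, and it is still racy in a multithreaded program.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open path read-only with close-on-exec set atomically, so no other
   thread can fork()+exec() between open() and a later fcntl(). */
int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Fallback for an existing descriptor: read the current descriptor flags
   and add FD_CLOEXEC, preserving any other FD_* flags.
   Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```

Either way the descriptor, flag included, is still copied into a `fork()`ed child; it is only closed when that child calls a function of the exec family.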

answered by zneak
  • `FD_CLOEXEC` is mentioned in the question, which also explains why it isn't applicable to this scenario (`exec` isn't being called). Besides, `fcntl` isn't the best way to set the close-on-exec flag. – Ben Voigt Jul 03 '13 at 16:36
  • @BenVoigt what do you mean by "`fcntl` isn't the best way to set the close-on-exec flag"? – Akronix Jan 25 '17 at 00:24
  • @Akronix: What happens if another thread calls `exec()` (mutatis mutandis `fork()`) between this thread calling `open` and `fcntl`? Two-phase initialization cannot solve this problem. This is specifically mentioned as a problem in the `fcntl` and `open` man pages, with the solution being the `O_CLOEXEC` flag to `open()`. – Ben Voigt Jan 25 '17 at 00:29
  • Well, I guess that in specific multithreaded programs you have to care about those situations, as the manual pages say. The issue here is that in my case, for example, I need to let some child inherit an fd and then prevent that fd from being inherited again. In other words, I'd need to be able to set and unset the FD_DONT_INHERIT flag of an fd during program execution. – Akronix Jan 25 '17 at 00:56
  • @Akronix: If you need it in only one child, don't open it until after `fork()`. – Ben Voigt Jan 25 '17 at 14:15
  • @Akronix: Or make a `fork` call that takes an extra parameter -- a list of fds to explicitly inherit no matter what the flag says. But don't design something where you can't control whether `fork()` on another thread sees the flag or not. – Ben Voigt Jan 25 '17 at 14:22

No. Close them yourself, since you know which ones need to be closed.
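When the code calling `fork()` is yours, one way to close them yourself without tracking every descriptor is to close everything except an explicit keep-list in the child right after `fork()`. A minimal sketch; `close_all_except()` is a hypothetical helper. It walks every descriptor number up to the `RLIMIT_NOFILE` soft limit, which is itself slow when that limit is large (Linux 5.9 and later also offer `close_range()` for this), and it does nothing to avoid the kernel-side copy the question is about.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>
#include <unistd.h>

/* Close every descriptor below the RLIMIT_NOFILE soft limit that is not
   listed in keep[0..nkeep). Meant to be called in a child right after
   fork(). Returns the number of descriptors actually closed. */
int close_all_except(const int *keep, size_t nkeep)
{
    struct rlimit rl;
    long max_fd = 1024;                          /* fallback bound */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur != RLIM_INFINITY)
        max_fd = (long)rl.rlim_cur;

    int closed = 0;
    for (long fd = 0; fd < max_fd; fd++) {
        bool wanted = false;
        for (size_t i = 0; i < nkeep; i++) {
            if (keep[i] == fd) {
                wanted = true;
                break;
            }
        }
        if (!wanted && close((int)fd) == 0)
            closed++;
    }
    return closed;
}
```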

answered by Ignacio Vazquez-Abrams
  • "No." is an answer to the question, but "Close them yourself" is not. There are multiple situations where this creates race conditions, as noted in the comments on the original question. – user239558 Jul 24 '15 at 22:41
  • If you embed access to an Oracle DB in your software, Oracle itself forks your process, and the child inherits your open files and sockets. TCP server sockets in particular remain active and prevent closing and reopening them in the main application. There's no place where you can close your own descriptors; only the described fcntl flag might help. – Daniel Alder Aug 09 '16 at 13:39

There's no standard way of doing this to my knowledge.

If you're looking to implement it properly, probably the best way to do it would be to add a system call to mark the file descriptor as close-on-fork, and to intercept the sys_fork system call (syscall number 2 on 32-bit x86) to act on those flags after calling the original sys_fork. (Note that glibc's fork() wrapper actually issues clone(), so that path would need the same treatment.)

If you don't want to add a new system call, you might be able to get away with intercepting sys_ioctl (syscall number 54 on 32-bit x86) and just adding a new command to it for marking a file descriptor close-on-fork.

Of course, if you can control what your application is doing, then it might be better to maintain user-level tables of all file descriptors you want closed on fork and call your own myfork instead. This would fork, then go through the user-level table closing those file descriptors so marked.

You wouldn't have to fiddle around in the Linux kernel then, a solution that's probably only necessary if you don't have control over the fork process (say, if a third party library is doing the fork() calls).
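The user-level table plus `myfork` approach might look like this. A minimal sketch, assuming a single-threaded process and descriptors below a fixed `MAX_TRACKED_FD`; `set_close_on_fork()` and `myfork()` are hypothetical names, not an existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TRACKED_FD 4096   /* assumption: tracked fds stay below this */

/* User-level "close-on-fork" table, indexed by descriptor number. */
static bool close_on_fork[MAX_TRACKED_FD];

/* Mark (on = true) or unmark (on = false) fd.
   Returns 0 on success, -1 if fd is outside the table. */
int set_close_on_fork(int fd, bool on)
{
    if (fd < 0 || fd >= MAX_TRACKED_FD)
        return -1;
    close_on_fork[fd] = on;
    return 0;
}

/* fork() wrapper: in the child, close every marked descriptor before
   returning; the parent's descriptors are left untouched. */
pid_t myfork(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        for (int fd = 0; fd < MAX_TRACKED_FD; fd++)
            if (close_on_fork[fd])
                close(fd);
    }
    return pid;
}
```

A marked descriptor must be unmarked with `set_close_on_fork(fd, false)` when it is closed; otherwise a later descriptor that reuses the same number would also be closed in children.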

answered by paxdiablo
  • Sounds too complicated to be the right way, I think. I updated the main question with 2 other possible solutions. Cheers, diablo! – user237419 Apr 19 '11 at 22:04
  • Thinking of a more general solution: add a new fcntl flag and modify dup_fd() in the kernel (the patch seems trivial) to test against it... does this sound too intrusive? At first sight it seems less work than the syscall/ioctl way. dup_fd() is where the fd copy happens at fork, and this function seems to be used only by the fork() syscall. – user237419 Apr 22 '11 at 08:51