
I'm outlining a program.

Basically, I want to use nftw to traverse a directory tree and perform a task (myexecutable) on its files.

The nftw function takes a function pointer (e.g. fn) as an argument and uses it as a callback.

Now I am planning to use the classic fork-exec pattern inside this callback function, to gain as much speed as possible by distributing the task over multiple instances of the same executable, one per file, running almost simultaneously.

Let's say I want to perform action P on every file, so I define my callback function like this:

static int fn(const char* pathname, const struct stat* st, int tf, struct FTW* ff){
    if(tf == FTW_F){
        if(fork() == 0){
            /* child: replace this process image with myexecutable */
            execl("myexecutable", "myexecutable", pathname, (char *)NULL);
            exit(0);    /* only reached if execl fails */
        }else{
            /* parent: keep walking the tree */
            return 0;
        }
    }
    return 0;   /* non-regular entries: continue the traversal */
}

Also, different instances of myexecutable do not in any way depend on each other or share any resources, and they don't need to communicate with the parent process or vice versa.
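
For completeness, the calling side would look roughly like this; the descriptor limit of 20 and the FTW_PHYS flag are just illustrative choices, not something I'm committed to:

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

/* fn is the callback defined above */
int main(int argc, char *argv[])
{
    const char *root = (argc > 1) ? argv[1] : ".";

    /* walk the tree rooted at `root`, calling fn for every entry */
    if (nftw(root, fn, 20, FTW_PHYS) == -1) {
        perror("nftw");
        return 1;
    }
    return 0;
}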

Can this cause a problem for the calling function?

Should I expect nftw to misbehave or show undefined behavior because of this?

hirad davari
  • execl will replace the running process, so the fact that you are in a callback doesn't matter. – stark Aug 22 '22 at 13:11
  • If you incorporate the logic in `myexecutable` into the program starting `myexecutable`, you could use threads instead. It is often faster to start a thread than to do `fork()` + `exec*()`. – Ted Lyngmo Aug 22 '22 at 13:23
  • It may be safer to call either `_exit` or `_Exit` instead of `exit` if the `execl` call returns in the child process. – Ian Abbott Aug 22 '22 at 13:25
  • You should print an error message and exit with a non-zero status if `execl()` fails. You should probably print an error message when `fork()` fails, rather than just reporting success, as you might well run out of processes on a big file system. You are right not to bother checking the return value from `execl()`: if it succeeds it doesn't return; if it returns, it failed. You probably need to beware of the number of zombie processes this can create. – Jonathan Leffler Aug 22 '22 at 14:12
  • @JonathanLeffler Thank you. I am aware of that problem. The solution I came up with is that the dispatching function will keep track of the child processes and will put a reasonable limit on their number. It will perform a wait(&st) before calling a new fork. – hirad davari Aug 23 '22 at 07:50
  • Rather than `wait()`, consider a loop using `waitpid()` with the `WNOHANG` option. That will collect multiple children (one per iteration) without blocking. – Jonathan Leffler Aug 23 '22 at 10:56
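
Putting the suggestions from these comments together, a rough sketch of a callback that caps the number of concurrent children and reaps finished ones with `waitpid()` and `WNOHANG` could look like this; `MAX_CHILDREN`, `live_children` and `reap_children` are made-up names for illustration, not part of the original code:

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 16          /* arbitrary cap on concurrent children */
static int live_children = 0;

/* Reap finished children; if `block` is non-zero, wait for at least one. */
static void reap_children(int block)
{
    int st;

    while (live_children > 0) {
        pid_t pid = waitpid(-1, &st, block ? 0 : WNOHANG);
        if (pid <= 0)            /* no child ready (0) or error (-1) */
            break;
        live_children--;
        block = 0;               /* only the first wait needs to block */
    }
}

static int fn(const char *pathname, const struct stat *st, int tf, struct FTW *ff)
{
    if (tf != FTW_F)
        return 0;

    /* collect finished children; block only if we are already at the cap */
    reap_children(live_children >= MAX_CHILDREN);

    pid_t pid = fork();
    if (pid == 0) {
        execl("myexecutable", "myexecutable", pathname, (char *)NULL);
        perror("execl");         /* only reached if execl fails */
        _exit(127);
    } else if (pid < 0) {
        perror("fork");          /* e.g. out of processes */
    } else {
        live_children++;
    }
    return 0;
}

After nftw() returns, the caller would still need a final loop over wait() until it reports no remaining children, so the last batch of children doesn't linger as zombies.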

1 Answer


This won't be a problem because you're calling execl immediately after the new process is forked. All process memory, including any state nftw might have, is replaced with that of the new program.

The only change you should make is to call _exit instead of exit in case execl fails. That way the child won't run any atexit handlers or flush stdio buffers, which could otherwise cause issues with open file descriptors and buffered output shared with the parent process.
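
For example, the child branch from your callback with that change applied might look like this (the perror call and the non-zero exit status are extra, following the suggestions in the comments; you'll need <stdio.h> and <unistd.h>):

if (fork() == 0) {
    execl("myexecutable", "myexecutable", pathname, (char *)NULL);
    perror("execl");   /* execl only returns if it failed */
    _exit(127);        /* skip atexit handlers and stdio flushing in the child */
}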

dbush
  • What if I implement "myexecutable" as a function and call it, but make sure it doesn't return, e.g. put an exit at the end? Can that cause a problem? – hirad davari Aug 23 '22 at 07:51