I have a multi-threaded application, and I can telnet/ssh into it. From within the application, I restart one of the init scripts using the custom system() call below. It seems the child process stays active: if I log out of the telnet session, the session hangs, i.e. it cannot log out. This happens only when I restart the script through this system() call. Is there something wrong with my system() function?

#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <sched.h>
#include <syslog.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/wait.h>

int system(const char *command)
{
    int wait_val;
    pid_t pid;
    struct sigaction sa, save_quit, save_int;
    sigset_t save_mask;

    syslog(LOG_ERR, "SJ.. calling this system function\r\n");

    /* Per the standard, system(NULL) only reports whether a shell is available. */
    if (command == 0)
        return 1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    /* sigemptyset(&sa.sa_mask); - done by memset() */
    /* sa.sa_flags = 0; - done by memset() */

    /* Ignore SIGQUIT/SIGINT and block SIGCHLD while waiting for the child. */
    sigaction(SIGQUIT, &sa, &save_quit);
    sigaction(SIGINT, &sa, &save_int);
    sigaddset(&sa.sa_mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &sa.sa_mask, &save_mask);

    if ((pid = vfork()) < 0) {
        perror("vfork fails: ");
        wait_val = -1;
        goto out;
    }
    if (pid == 0) {
        /* Child: restore signal dispositions and mask before exec. */
        sigaction(SIGQUIT, &save_quit, NULL);
        sigaction(SIGINT, &save_int, NULL);
        sigprocmask(SIG_SETMASK, &save_mask, NULL);

        /* Run the command under the normal policy at lower priority. */
        struct sched_param param;
        param.sched_priority = 0;

        sched_setscheduler(0, SCHED_OTHER, &param);
        setpriority(PRIO_PROCESS, 0, 5);

        execl("/bin/sh", "sh", "-c", command, (char *) 0);
        _exit(127);
    }

#if 0
    printf("Waiting for child %d\n", pid);
#endif

    if (wait4(pid, &wait_val, 0, 0) == -1)
        wait_val = -1;

out:
    /* Parent: restore signal dispositions and mask. */
    sigaction(SIGQUIT, &save_quit, NULL);
    sigaction(SIGINT, &save_int, NULL);
    sigprocmask(SIG_SETMASK, &save_mask, NULL);
    return wait_val;
}

Any ideas on how to debug whether this system() call is hanging or not?

  • Why can't you use the *standard* `system` provided in your C library? And why do you use `vfork` (nearly obsolete today; you should use `fork`)? – Basile Starynkevitch Feb 11 '18 at 07:52
  • BTW, if running some `init` script (which is often wrong, since `init` could be *systemd*) I would use `daemon` – Basile Starynkevitch Feb 11 '18 at 07:55
  • the init script restarts the node – dexterous Feb 11 '18 at 08:46
  • @Basile: See my answer and please feel free to elaborate on it; that way anyone looking at it might benefit. This is a general debugging problem that most embedded Linux projects face, where there are plenty of threads under one process. – dexterous Apr 18 '18 at 05:19

1 Answer

I realized this happens because file descriptors are inherited across fork(), and my custom system() is nothing but fork() and exec(). There are plenty of sockets in my application, and those socket file descriptors get inherited by the child process.

My assumption is that the logout hangs because the restarted child still holds copies of those descriptors: even after the parent closes a socket (e.g. the telnet session's socket), the connection stays open as long as the child keeps its inherited copy. I'm not sure of the exact states involved, though.
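To convince myself of the inheritance itself, here is a minimal standalone demo (my own test program, not code from the application) showing that a socket opened in the parent shows up in a fork()+exec() child's fd table:

/* demo.c - a descriptor opened before fork() is visible after exec(). */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>

int main(void)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);   /* inherited by default */
    printf("parent: opened socket fd %d\n", s);

    pid_t pid = fork();
    if (pid == 0) {
        /* The child's fd listing still contains the parent's socket. */
        execl("/bin/sh", "sh", "-c", "ls -l /proc/self/fd", (char *) 0);
        _exit(127);
    }
    waitpid(pid, NULL, 0);
    return 0;
}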

So, here is the interesting link I found -

Call system() inside forked (child) process, when parent process has many threads, sockets and IPC

Solution -

linux fork: prevent file descriptors inheritance

I'm not sure I can do this in a big application where sockets are opened in thousands of places. So, here is what I did.

My Solution -

I created a separate process/daemon that listens for commands from the parent application. The communication is socket-based. Since it is a separate application/daemon, it doesn't inherit anything from the main application, which runs multiple threads and has a lot of open sockets. This worked for me; a rough sketch of the daemon is below.
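For reference, here is a rough sketch of that helper daemon. The socket path and the one-command-per-connection protocol are hypothetical stand-ins; the real application's protocol is more involved, and a real deployment would need to restrict who can connect:

/* cmd_daemon.c - started independently of the main application, so its
 * children inherit none of the application's sockets. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

#define CMD_SOCK "/tmp/app-cmd.sock"   /* hypothetical path */

int main(void)
{
    int srv = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, CMD_SOCK, sizeof(addr.sun_path) - 1);

    unlink(CMD_SOCK);
    if (bind(srv, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(srv, 1) < 0) {
        perror("bind/listen");
        return 1;
    }

    for (;;) {
        int cli = accept(srv, NULL, NULL);
        if (cli < 0)
            continue;
        char cmd[1024];
        ssize_t n = read(cli, cmd, sizeof(cmd) - 1);
        if (n > 0) {
            cmd[n] = '\0';
            /* Safe here: this process has no fds from the main app. */
            system(cmd);
        }
        close(cli);
    }
}

The main application then just connects to CMD_SOCK and writes the command string instead of calling system() itself.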

I believe this problem would also be fixed by marking every descriptor close-on-exec:

fcntl(fd, F_SETFD, FD_CLOEXEC);
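More precisely, the robust idiom is to read the existing fd flags first rather than overwrite them; a small sketch, where set_cloexec() is my own hypothetical helper name:

#include <fcntl.h>

/* Hypothetical helper: mark fd close-on-exec without clobbering
 * any other fd flags. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

On Linux 2.6.27 and later you can also request the flag at creation time, e.g. socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0), which avoids the window between creating the socket and calling fcntl().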

Any comments are welcome here.

Is this a fundamental property of Linux and C, i.e. that all file descriptors are inherited by default? Why does the Linux kernel allow this? What advantage do we get out of it?