
I'm writing a toy shell program for a class. I did all of my coding on my Mac (Darwin 13.4.0) and compiled with gcc <programname> -o <executablename>, and everything seems to run perfectly. When I FTP the source code over to the school's Linux server and compile it again with the exact same command, the code is buggy. In particular, the signal handler installed with sigaction doesn't seem to work properly all the time: it looks as though it isn't reliably catching the SIGCHLD signal. Edit: Actually, what was happening was that the variable I was storing the status in was getting clobbered, so the incorrect status was displayed for foreground processes.

Does anyone have any idea why the change in OS might cause this kind of problem?

Here's what my signal handler code looks like:

void handleSignal(int signal){
  int childid = 0;
  int tempStatus = 0;

  while ( (childid = waitpid(-1, &childStatus, WNOHANG)) > 0) {

    /* Parse the exit status */
    if (WIFEXITED(childStatus)){
      childStatus = WEXITSTATUS(childStatus);
    }

    switch (signal) {

    /* if the signal came from a child */
    case SIGCHLD:

      /* for background processes, alert the user */
      if (childid != foregroundProcess){
        printf("pid %i terminated:", childid);
        showStatus(childStatus);
        fflush(stdout);
      }

      /* for foreground children ending, just set tempStatus, in case */
      /* background children also need to be caught */
      else {
        tempStatus = childStatus;
      }
      break;

    case SIGINT:

      /* If there is a foreground child, send the signal to it; else ignore. */
      if (foregroundProcess){
        kill(foregroundProcess, signal);
      }
      break;

    default:
      printf("Some other signal was received: code %i\n", signal);
      fflush(stdout);
    }
  }
  childStatus = tempStatus;  /* reset child status to foreground status */
}

Edit: Adding the code that registers the signal handler:

struct sigaction sa;
sa.sa_handler = &handleSignal; /*passing function ref. to handler */ 
sa.sa_flags = SA_RESTART;  /* restart interrupted system calls after the handler returns */
sigfillset(&sa.sa_mask); /*block all other signals while handling sigs */

sigaction(SIGINT, &sa, NULL);
sigaction(SIGCHLD, &sa, NULL);
sigaction(SIGTERM, &sa, NULL);
Fish314
  • Do you get any warnings if you compile as per this SO question? http://stackoverflow.com/questions/154630/recommended-gcc-warning-options-for-c – Peter M Nov 23 '15 at 19:50
  • Please clarify what you mean by "seems as though it isn't reliably catching the SIGCHLD signal"? What exact behaviour are you seeing? It doesn't call the handler? It calls the handler with an unexpected signal? Not getting the expected child status? Or something else? – kaylum Nov 23 '15 at 19:54
  • It's probably not the root cause of your problem but you should only call async safe functions in a signal handler and `printf` in particular is not async safe. – kaylum Nov 23 '15 at 20:04
  • Originally I compiled in both places with -pedantic -Wall and it came out clean. I just tried the expanded warning list, and I get different warnings. On the Darwin system (where the code is working) I get just these 4: 3 unused parameters and one implicit conversion from int to long int that changes signedness. On the Linux box I get those plus a bunch of warnings about implicit declarations of sigfillset, sigaction and SA_RESTART. All of those things come from the signal.h header. Is this telling me that the version of Linux I'm using doesn't support those? Or perhaps it's not finding the header? – Fish314 Nov 23 '15 at 20:18
  • It would be nice to see the code that registers the signal handler, and where you expect it to be called – OznOg Nov 23 '15 at 20:44
  • @kaylum, actually it looks like it is catching the SIGCHLD. I moved the waitpid call outside the while loop just to see what it was catching. On both systems, a SIGCHLD signal is being sent (20 on Darwin, 17 on Linux... but those both comply with POSIX). On Darwin, the waitpid returns a legitimate pid of the child process. On Linux it doesn't (it returns -1). That's the crux of the problem right there. Therefore, what I think was happening was that the while loop was not executing at all, and the exit status wasn't being updated or displayed. – Fish314 Nov 23 '15 at 20:49
  • @OznOg: Ask and ye shall receive... or something. But anyways, there it is. – Fish314 Nov 23 '15 at 20:59
  • Well, some progress -- I'm showing errno 10, i.e., no child processes. Which is weird because waitpid(-1, stuff , stuff) should return 0 if there are no children to wait on. And it's even weirder because I HAVE child processes. – Fish314 Nov 23 '15 at 21:23
  • @Fish314 Just to clarify. You have the `waitpid` in a loop and it's called with `WNOHANG`. So of course it is expected to return -1 eventually. So are you saying it is not giving you a successful return for any iteration of the loop? Really, you should only call `waitpid` after checking the signal is `SIGCHLD` and not some other signal. And please update your question proper with the behaviour of your program as described in some of your comments. The more precise you make your description the more likely someone can help. – kaylum Nov 23 '15 at 21:38
  • @kaylum. Yes, I wasn't getting a successful wait on the first iteration. Figured out the issue below, but thanks for the help! And you're right. I have no idea why I put the while loop outside the case SIGCHLD instead of inside it. I find I get stupider as the clock moves past midnight for some reason ;). – Fish314 Nov 23 '15 at 22:23

1 Answer


Disregard. Solved it. Here's what the problem was: I was waiting on the foreground child process in the main program loop, and I was also waiting on it in the signal handler. In the main loop I was calling waitpid with options set to 0, so the call would block until the child finished, and in the handler I was calling it with WNOHANG.

So why did it behave differently in Linux versus Darwin? Here's my theory: On the Darwin box, when the child died, the SIGCHLD signal was being delivered and handled before the waitpid in the parent's main loop caught that the child had died. So the signal handler handled the dying foreground child. Then, when execution returned to the waitpid command in the main loop, there was no child with that pid, and the waitpid function returned with an error of -1. However, I wasn't checking the return value of that waitpid call, which meant that the error was silent to me, and the program operated correctly on the Darwin machine.
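The silent failure described above is easier to spot when the return value of the blocking waitpid is checked. The following is only a sketch, not the code from the shell in question: wait_foreground and spawn_child_exiting_with are hypothetical helper names, and the 128 + signal convention for killed children is an assumption borrowed from common shell behaviour. The _POSIX_C_SOURCE define asks the C library to declare the POSIX functions even in a strict ISO C compile mode.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until one specific child finishes.
 * Returns its exit code, 128 + signal number if it was killed by a
 * signal, or -1 with errno set if waitpid itself failed. */
int wait_foreground(pid_t pid)
{
    int status;
    pid_t r;

    assert(pid > 0); /* this sketch only waits on one specific child */

    /* Retry if a signal handler interrupted the call. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r == -1)
        return -1; /* errno == ECHILD: already reaped, or not our child */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1; /* not expected when options is 0 */
}

/* Hypothetical helper for trying this out: fork a child that exits
 * immediately with the given code. Returns the child's pid, or -1. */
pid_t spawn_child_exiting_with(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;
}
```

Calling wait_foreground twice on the same pid shows the failure mode: the first call returns the exit code, the second returns -1 with errno set to ECHILD, which is the same error a main-loop waitpid reports when a signal handler has already reaped the child.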

On the Linux machine, however, the waitpid in the main program loop executed first, before the signal handler ran. So childStatus would be set (correctly) in the main program loop. Then, when it came time for the signal handler to catch the SIGCHLD signal, the process had already been waited on, so waitpid returned an error (-1, errno 10, i.e. ECHILD). But because of how I handled the child status variable in the signal handler (with childStatus and tempStatus), this situation would clobber my child status, resetting it back to 0. So every foreground child showed an exit status of 0 when run on the Linux machine, but an appropriate exit status on the Mac.

The solution? Change the tempStatus declaration to int tempStatus = childStatus. That way, if childStatus has already been set by a foreground process, the entire loop is skipped and the status persists. If the signal handler is called on behalf of a background process, however, it saves the foreground status, if it exists, displays the background status as it handles the background process, and then restores the foreground status.

Ugly... but it will work well enough for a grade in a couple of hours.

I don't know if any of this will ever help anyone else, but hey, it might. Talk about frustrating!
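For anyone who wants to avoid the double wait altogether, one possible alternative (not the fix used above) is to keep waitpid and printf out of the handler entirely, which also follows the async-signal-safety advice from the comments. In this sketch the handler only sets a flag, and the main loop reaps background children when it sees the flag. All names here (child_event, on_sigchld, install_sigchld_handler, report_child, reap_background) are made up for illustration, and the message format is an assumption. The _POSIX_C_SOURCE define is the usual way to get sigaction and friends declared by glibc in a strict ISO C mode, which may be what the implicit-declaration warnings on the Linux box were about.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler, cleared by the main loop. */
static volatile sig_atomic_t child_event = 0;

/* The handler only records that a child changed state:
 * no waitpid and no printf inside the handler. */
static void on_sigchld(int signo)
{
    (void)signo;
    child_event = 1;
}

/* Returns 0 on success, -1 on failure (same as sigaction). */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigchld;
    sa.sa_flags = SA_RESTART; /* restart interrupted system calls */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Print how a reaped child ended (format is illustrative). */
static void report_child(pid_t pid, int status)
{
    assert(pid > 0); /* only called for a child that was actually reaped */
    if (WIFEXITED(status))
        printf("pid %ld terminated: exit value %d\n",
               (long)pid, WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("pid %ld terminated by signal %d\n",
               (long)pid, WTERMSIG(status));
    fflush(stdout);
}

/* Call from the main loop, outside any signal handler.
 * Returns the number of children reaped. */
int reap_background(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    if (!child_event)
        return 0;
    child_event = 0; /* clear first so a signal arriving now is not lost */

    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        report_child(pid, status);
        reaped++;
    }
    return reaped;
}
```

In a shell built this way, the main loop would wait on the foreground child by pid and call reap_background() just before printing each prompt. Because the handler never calls waitpid, the order in which the signal and the main-loop wait happen no longer affects the reported status. A foreground child's SIGCHLD still sets the flag, but the next reap_background() call simply finds nothing left to reap for it.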

Fish314