20

I have a process that uses sockets, database connections and the like. It is basically a server process relaying between sensor data and a web interface, so it is important that the application terminates gracefully if it is killed.

How do I handle unexpected faults such as segfaults (at least for debugging), as well as kill signals, so that I can close any connections and stop any running threads, and the process does not leave a mess in anything it is using?

rgettman
user623879
  • Keep in mind continuing to run, even to clean up, may be dangerous after a segmentation fault. – icktoofay Sep 11 '11 at 04:05
  • 1
    Also keep in mind that you cannot catch kill signals. – Gabe Sep 11 '11 at 04:09
  • Not sure what *mess* you want to avoid. Killing threads, closing files and freeing memory is typically done by the OS very efficiently, so in most cases a simple exit would do the trick. Is there something specific you worry about? – Soren Sep 11 '11 at 04:10
  • 3
    @gabe - it's better to say you can't catch `SIGKILL`, so as not to confuse people who have never used anything other than the command `kill` from the shell ;) – Brian Roach Sep 11 '11 at 04:12
  • @Soren - catching `SIGTERM` and `SIGINT` is a very common practice to allow you to sanely exit. – Brian Roach Sep 11 '11 at 04:14
  • 4
    @Soren: I don't know the OP's issues, but making sure that files are always in a consistent state, database transactions are rolled back, etc. often requires cleanup that the OS can't handle just by exiting your process. – Gabe Sep 11 '11 at 04:17
  • possible duplicate of [SIGKILL signal Handler](http://stackoverflow.com/questions/3908694/sigkill-signal-handler) – Ben Voigt Sep 11 '11 at 04:25

3 Answers

14

Catching signals is hard. You have to be careful. Your first step is to use `sigaction` to install a signal handler for the desired signals.

  • Choose a set of signals to respond to and choose what they mean for your process. For example, SIGTERM quits, SIGHUP restarts, SIGUSR1 reloads configuration, etc.

  • Don't try to respond to all signals and don't try to "clean up" after a signal that indicates an error in your program. SIGKILL can't be caught. SIGSEGV, SIGBUS, and others like them shouldn't be caught unless you have a VERY good reason. If you want to debug, then raise the ulimit for core dumps: loading a core image into a debugger is far more effective than anything you or I could ever code up. (If you do try to clean up after a SIGSEGV or something like that, realize that the cleanup code might cause an additional SIGSEGV and things could get bad quickly. Just avoid the whole mess and let SIGSEGV terminate your program.)

  • How you handle the signals is tricky. If your application has a main loop (e.g., select or poll) then the signal handler can just set a flag or write a byte to a special pipe to signal the main loop to quit. You can also use siglongjmp to jump out of a signal handler, but this is VERY difficult to get right and usually not what you want.
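As a rough illustration of the "set a flag" approach from the last bullet, here is a minimal sketch. The names `stop_requested`, `on_terminate`, `install_shutdown_handlers` and `shutdown_requested` are made up for this example, and it assumes a POSIX system where only SIGTERM and SIGINT should trigger a clean shutdown.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main loop. volatile sig_atomic_t is
 * the type the C standard guarantees can be safely set from a handler. */
static volatile sig_atomic_t stop_requested = 0;

/* The handler does nothing except record that a signal arrived. */
static void on_terminate(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Install the handler for SIGTERM and SIGINT (hypothetical helper).
 * Returns 0 on success, -1 if sigaction() fails. */
int install_shutdown_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_terminate;
    sigemptyset(&sa.sa_mask);
    /* No SA_RESTART: blocking calls such as read() or accept() then fail
     * with EINTR, which gives the main loop a chance to re-check the flag. */
    sa.sa_flags = 0;

    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    return 0;
}

/* Polled by the main loop between iterations. */
int shutdown_requested(void)
{
    return stop_requested != 0;
}
```

The main loop would then look something like `while (!shutdown_requested()) { /* select or poll, handle I/O */ }`, followed by the ordinary cleanup code (close sockets, roll back transactions, join threads) running in normal program context rather than inside the handler.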

It's hard to recommend something without knowing how your application is structured and what it does.

Also remember that the signal handler itself should do almost nothing. Many functions are not safe to call from signal handlers.
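The "write a byte to a special pipe" variant keeps the handler down to a single `write()`, which is on the POSIX list of async-signal-safe functions. The sketch below is only one way to arrange it; `wake_fd`, `setup_wakeup_pipe` and `wait_for_signal` are hypothetical names, and a real server would add the read end of the pipe to the same `poll` set as its sockets instead of polling it alone.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* wake_fd[0] is polled by the main loop, wake_fd[1] is written by the handler. */
static int wake_fd[2] = { -1, -1 };

static void on_signal(int sig)
{
    int saved_errno = errno;           /* write() may clobber errno */
    unsigned char b = (unsigned char)sig;
    ssize_t r = write(wake_fd[1], &b, 1);
    (void)r;                           /* a full pipe just means a wakeup is already pending */
    errno = saved_errno;
}

/* Create the pipe and install the handler. Returns 0 on success, -1 on error. */
int setup_wakeup_pipe(void)
{
    struct sigaction sa;

    if (pipe(wake_fd) == -1)
        return -1;
    /* Non-blocking on both ends so neither the handler nor the loop can hang. */
    for (int i = 0; i < 2; i++) {
        int fl = fcntl(wake_fd[i], F_GETFL);
        if (fl == -1 || fcntl(wake_fd[i], F_SETFL, fl | O_NONBLOCK) == -1)
            return -1;
    }

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGTERM, &sa, NULL) == -1)
        return -1;
    if (sigaction(SIGINT, &sa, NULL) == -1)
        return -1;
    return 0;
}

/* Wait up to timeout_ms for a signal byte.
 * Returns the signal number, 0 on timeout, or -1 on error. */
int wait_for_signal(int timeout_ms)
{
    struct pollfd pfd = { .fd = wake_fd[0], .events = POLLIN };
    unsigned char b;
    int n;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);  /* poll() is not restarted automatically */

    if (n <= 0)
        return n;
    if (read(wake_fd[0], &b, 1) != 1)
        return -1;
    return b;
}
```

Because the signal shows up as ordinary readable data, the shutdown path runs in the main loop where any function is safe to call.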

Dietrich Epp
10

I sometimes like to get a backtrace on SIGSEGV. The catching part goes like this:

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

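void sig_handler(int);

int main() {
    /* Install the handler before anything can fault. */
    signal(SIGSEGV, sig_handler);
    /* Deliberately dereference NULL to trigger SIGSEGV. */
    int *p = NULL;
    return *p;
}

/* Debugging aid only: fprintf() is not async-signal-safe. */
void sig_handler(int sig) {
    switch (sig) {
    case SIGSEGV:
        fprintf(stderr, "give out a backtrace or something...\n");
        abort();
    default:
        fprintf(stderr, "wasn't expecting that!\n");
        abort();
    }
}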

You do want to be very careful handling these things, e.g. make sure you can't trigger another signal.
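For the "give out a backtrace" part, one possible sketch is below. It assumes glibc, since `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>` are GNU extensions, and it installs the handler with `sigaction` rather than `signal`. The names `on_crash` and `install_crash_handler` are made up for this example. Note that glibc's documentation warns that the first call to `backtrace()` can load libgcc dynamically, which may call `malloc`, so treat this strictly as a debugging aid.

```c
#define _POSIX_C_SOURCE 200809L
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc extensions */
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_crash(int sig)
{
    static const char msg[] = "fatal signal, backtrace follows:\n";
    void *frames[64];
    int n;
    ssize_t r;

    /* write() instead of fprintf(): stdio is not async-signal-safe. */
    r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;

    /* backtrace_symbols_fd() writes straight to the descriptor and does
     * not allocate, unlike backtrace_symbols(). */
    n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);

    /* SA_RESETHAND has already restored the default action, so re-raising
     * terminates the process with the original signal (and a core dump,
     * if core dumps are enabled). */
    raise(sig);
}

/* Install the crash handler for SIGSEGV. Returns 0 on success, -1 on error. */
int install_crash_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_crash;
    sigemptyset(&sa.sa_mask);
    /* One-shot handler; SA_NODEFER lets the re-raised signal through at once. */
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Call `install_crash_handler()` early in `main`. Linking with `-rdynamic` usually makes the printed frames show function names instead of bare addresses.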

daniel
  • 2
    Isn't it easier to use `ulimit` to get core dumps and get the stack trace that way? – Dietrich Epp Sep 11 '11 at 04:26
  • not familiar with that approach, can you elaborate? I only use this stuff for debugging btw, i.e. I make an error and want to see where the signal came from straight away. – daniel Sep 11 '11 at 04:30
  • 3
    You really shouldn't be using `signal`. For years it has been recommended not to do so and to use `sigaction` instead. From `man signal`: *The behavior of signal() varies across Unix versions, and has also varied historically across different versions of Linux. Avoid its use: use sigaction(2) instead.*. I'm not downvoting ... but it's close. You shouldn't recommend anyone to use it. – Brian Roach Sep 11 '11 at 04:31
  • 1
    The `ulimit` command allows you to make it so programs dump core on `SIGSEGV`. A core file is a copy of the memory image of your program, and you can attach a debugger to it and poke around in memory (get stack traces, examine variables), or send the core file to the developer if it's not your program. – Dietrich Epp Sep 11 '11 at 04:33
9

You install signal handlers to catch signals -- however, in 99% of cases you just want to exit and let the Linux OS take care of the cleanup -- it will happily close all files and sockets, free memory and shut down threads.

So unless there is something specific you want to do, like sending a message on the sockets, you should just exit from the process and not try to catch the signal.

Soren
  • I think you are right here... application reliability is more important in my case and doing crazy stuff with signals could be bad. When a segfault happens, if I get core dumps, do they tell me where the segfault occurred? – user623879 Sep 11 '11 at 05:52
  • The call stack tells you where the segfault occurred, assuming the call stack didn't get corrupted -- this question here: http://stackoverflow.com/questions/105659/how-can-one-grab-a-stack-trace-in-c talks about how to get the stack trace – Soren Sep 11 '11 at 16:37
  • Great advice. An exception to this rule is when you need to clean up or change the hardware state when a program shuts down (in embedded systems). – Havok Aug 07 '14 at 00:56
  • Old question, but I still wonder what you meant by the 99%? – Moritz Schmidt May 16 '18 at 01:37
  • What I mean is that unless you are a very seasoned programmer who knows exactly why the particular application should catch the signal, you are likely better off not doing it at all. – Soren May 17 '18 at 09:34
  • When we run a large program under a profiling or memory-checking tool, terminating the process this way produces many reports of memory that was never cleaned up. With this approach, how can we correct those memory errors? – Dig The Code Jan 21 '19 at 10:51
  • @DigTheCode -- this sounds like a new question. Memory checking tools are there to show you memory leaks, and clearly you have not freed the memory you are allocating, so that is what the tool is flagging. Most tools report leaked memory separately from memory that is not lost but was never freed, so in many cases you just have to interpret and understand your reports correctly. – Soren Jan 30 '19 at 23:03