
I'd like to know whether it is possible to catch the SIGSEGV signal in a multithreaded environment, and what the recommended way to do so is. I am particularly interested in handling the SIGSEGV raised by something like *((int *)0) = 0.

Some reading on this topic led me to signal() and sigaction(), which install a signal handler, but neither seems promising in a multithreaded environment. I then tried sigwaitinfo(), receiving the signal in one thread after a prior pthread_sigmask() call that blocks it in the others. This worked when SIGSEGV was raised with raise() inside a thread, or when it was sent to the process by something like kill -SIGSEGV; however, *((int *)0) = 0 still kills the process. My test program is as follows:

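#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void block_signal()
{
        sigset_t set;

        sigemptyset(&set);
        sigaddset(&set, SIGSEGV);

        // pthread_sigmask() is the thread-aware call; sigprocmask() is
        // unspecified in a multithreaded process, so it is not used here.
        if (pthread_sigmask(SIG_BLOCK, &set, NULL)) {
                fprintf(stderr, "pthread_sigmask failed\n");
                exit(EXIT_FAILURE);
        }
}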

void *buggy_thread(void *param)
{
        char *ptr = NULL;
        block_signal();

        printf("Thread %lu created\n", (unsigned long)pthread_self());

        // Sleep for some random time
        { ... }

        printf("About to raise from %lu\n", (unsigned long)pthread_self());

        // Raise a SIGSEGV
        *ptr = 0;

        pthread_exit(NULL);
}

void *dispatcher(void *param)
{
        sigset_t set;
        siginfo_t info;
        int sig;

        sigemptyset(&set);
        sigaddset(&set, SIGSEGV);

        for (;;) {
                sig = sigwaitinfo(&set, &info);
                if (sig == -1)
                        fprintf(stderr, "sigwaitinfo failed\n");
                else
                        printf("Received signal SIGSEGV from %d\n", (int)info.si_pid);
        }
}

int main()
{
        int i;
        pthread_t tid;
        pthread_t disp_tid;

        block_signal();

        if (pthread_create(&disp_tid, NULL, dispatcher, NULL)) {
                fprintf(stderr, "Cannot create dispatcher\n");
                exit(EXIT_FAILURE);
        }

        for (i = 0; i < 10; ++i) {
                if (pthread_create(&tid, NULL, buggy_thread, NULL)) {
                        fprintf(stderr, "Cannot create thread\n");
                        exit(EXIT_FAILURE);
                }
        }

        pause();
}

Unexpectedly, the program dies with a segmentation fault instead of printing the raiser's thread id.

pevik
Eric

3 Answers


Your code does not call sigaction(2), and I believe it should. Read also signal(7) and signal-safety(7). The signal handler (installed through the sa_sigaction field) should do something machine-specific with its siginfo_t (and the ucontext_t passed as its third argument) to skip the offending machine instruction, or mmap the offending address, or call siglongjmp; otherwise, when the handler returns, you'll get the SIGSEGV again, since the offending machine instruction is restarted.

You cannot handle the SIGSEGV in another thread, since synchronous signals (such as SIGSEGV or SIGSYS) are thread-specific (see this answer), so what you are trying to achieve with sigwaitinfo cannot work. In particular, SIGSEGV is directed to the offending thread.

Read also all about Linux signals.

PS. An example of clever SIGSEGV handling is offered by the Ravenbrook MPS garbage collector library, which is no longer maintained (as of May 2019). Notice also the recent, Linux-specific userfaultfd(2) and signalfd(2) system calls.

Basile Starynkevitch
  • Signal disposition is not thread-local. Only signal mask, and in some cases signal delivery, is thread-local. – R.. GitHub STOP HELPING ICE Apr 25 '13 at 01:46
  • Thanks for the useful references. But when I replaced the offending line '*ptr = 0' with 'raise(SIGSEGV)', the above code worked. I don't know why '*ptr = 0' doesn't produce the same result. – Eric Apr 25 '13 at 04:01
  • Because segmentation violations like `*ptr = 0;` are thread-specific, the `SIGSEGV` is sent to the offending thread. This is not the same as `raise(SIGSEGV)` – Basile Starynkevitch Apr 25 '13 at 05:34
  • Actually no, raise() also sends the signal to the calling thread. In a multithreaded program, raise(sig) is equivalent to pthread_kill(pthread_self(), sig). Refer to the man page. – Eric Apr 29 '13 at 02:41

Signal delivery for SIGSEGV caused by a faulting memory access is to the thread that performed the invalid access. Per POSIX (XSH 2.4.1):

At the time of generation, a determination shall be made whether the signal has been generated for the process or for a specific thread within the process. Signals which are generated by some action attributable to a particular thread, such as a hardware fault, shall be generated for the thread that caused the signal to be generated. Signals that are generated in association with a process ID or process group ID or an asynchronous event, such as terminal activity, shall be generated for the process.

The problematic aspect of trying to handle SIGSEGV in a multi-threaded program is that, while delivery and signal mask are thread-local, the signal disposition (i.e. what handler to call) is process-global. In other words, sigaction sets a signal handler for the whole process, not just the calling thread. This means that multiple threads each trying to set up their own SIGSEGV handlers will clobber each other's settings.

The best solution I can propose is to set a global signal handler for SIGSEGV using sigaction, preferably with SA_SIGINFO so you get additional information about the fault, then have a thread-local variable for a handler for the specific thread. Then, the actual signal handler can be:

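#include <signal.h>

/* Set by each thread to its own handler before it does anything that may fault. */
_Thread_local void (*thread_local_sigsegv_handler)(int, siginfo_t *, void *);

/* Process-wide handler, installed once with sigaction() and SA_SIGINFO. */
static void sigsegv_handler(int sig, siginfo_t *si, void *ctx)
{
    thread_local_sigsegv_handler(sig, si, ctx);
}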

Note that this makes use of C11 thread-local storage. If you don't have that available, you can fall back to either "GNU C" __thread thread-local storage, or POSIX thread-specific data (using pthread_key_create and pthread_setspecific/pthread_getspecific). Strictly speaking, the latter are not async-signal-safe, so calling them from the signal handler invokes UB if the illegal access took place inside a non-async-signal-safe function in the standard library. However, if it took place in your own code, you can be sure no non-async-signal-safe function was interrupted by the signal handler, and thus these functions have well-defined behavior (well, modulo the fact that your whole program probably already has UB from whatever it did to generate SIGSEGV...).

R.. GitHub STOP HELPING ICE
  • I did not quite understand what is gained from having thread_local_sigsegv_handler. If I understood correctly, this is a global function pointer (allocated separately for each thread). But how do you get the thread that caused the signal to be raised to call that function? – user1708860 May 24 '14 at 15:18
  • That always happens for synchronously-generated signals. See XSH 2.4.1 *Signal Generation and Delivery*: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_01: *At the time of generation, a determination shall be made whether the signal has been generated for the process or for a specific thread within the process. Signals which are generated by some action attributable to a particular thread, such as a hardware fault, shall be generated for the thread that caused the signal to be generated.* – R.. GitHub STOP HELPING ICE May 24 '14 at 15:26
  • According to https://www.gnu.org/software/libc/manual/html_node/Thread_002dspecific-Data.html pthread_getspecific is async-signal-safe (pthread_setspecific is not), so using POSIX thread-specific data storage should be fine (at least with glibc). On the other hand, I could not find any information regarding whether variables declared __thread (or thread_local in C++11) are actually async-signal-safe. Any link to the documentation regarding this issue would be appreciated. – Sebastian Marsching Jun 30 '16 at 10:18

"Why do you want to catch SIGSEGV ? What will you do after having caught it?"

The most common answer would be: quit/abort. But then, what would be the reason to even deliver this signal to a process instead of just arbitrarily terminating it?

The answer is: because signals, including SIGSEGV, are just exceptions, and it's very important for some applications to, e.g., set the hardware outputs to a "safe mode" or make sure that some important data is left in a consistent state before terminating the process.

There are generally two kinds of segfaults: those caused by write operations and those caused by read operations.

Segfaults caused by read operations are perfectly safe to catch and even to ignore in some cases(1). Failed write operations need more attention and effort to be processed safely (there is a risk of data/memory corruption), but this is also possible (e.g. by avoiding dynamic memory allocation after a segfault).

The problem with "critical signals" (which are delivered to a particular thread, like SIGFPE or SIGSEGV) is that normally the program doesn't "know" the context of the signal, that is, which operation or function triggered it.

There are at least a few possible ways to get that information, for example:

  1. Each thread performs only a single class of small operations, so if it gets a signal, it's easy to tell what happened -> terminate the thread, verify the processed data, etc. -> terminate safely.
  2. Use C exceptions: there are a few ready-to-use solutions; mine is libcxc.

(1) E.g. the famous problem with ESRCH and pthread_kill() issued for a thread which has already exited on its own :)

vtomazzi
  • "Why do you want to catch SIGSEGV ? What will you do after having caught it?" To postpone the whole process termination before other threads finish their current part of the job (which could be important for the logic of the program). I do not see why if one thread accessed foreign memory all the other must unconditionally die at the same time, especially if it does not affect the state of the whole process except that one thread. – Student4K Nov 06 '19 at 18:18