How to catch segmentation fault in Linux?

Question

I need to catch a segmentation fault in a third-party library's cleanup operations. It sometimes happens just before my program exits, and I cannot fix the underlying cause. In Windows programming I could do this with __try / __except. Is there a cross-platform or platform-specific way to do the same? I need this on Linux with gcc.

score 103 · Accepted Answer · edited Dec 05 '21 at 09:54

On Linux we can have these as exceptions, too.

Normally, when your program triggers a segmentation fault, it is sent a SIGSEGV signal. You can set up your own handler for this signal and mitigate the consequences. Of course, you should be really sure that you can recover from the situation. In your case, I think, you should debug your code instead.

Back to the topic. I recently encountered a library (short manual) that transforms such signals to exceptions, so you can write code like this:

try
{
    *(int*) 0 = 0;
}
catch (std::exception& e)
{
    std::cerr << "Exception caught : " << e.what() << std::endl;
}

Works on my x86-64 Gentoo box. It has a platform-specific backend (borrowed from gcc's Java implementation), so it can work on many platforms. It supports only x86 and x86-64 out of the box, but you can get backends from libjava, which resides in the gcc sources.

answered Feb 28 '10 at 08:37 by P Shved · edited Dec 05 '21 at 09:54 by Gulzar

27

+1 for __be sure that you can recover before catching sig segfault__ – Henrik Mühe Feb 02 '14 at 10:29
23

Throwing from a signal handler is a very dangerous thing to do. Most compilers assume that only calls can generate exceptions, and set up unwind information accordingly. Languages that transform hardware exceptions into software exceptions, like Java and C#, are aware that anything can throw; this is not the case with C++. With GCC, you at least need `-fnon-call-exceptions` to ensure that it works–and there is a performance cost to that. There is also a danger that you'll be throwing from a function without exception support (like a C function) and leak/crash later. – zneak Jun 04 '15 at 03:19
1

I agree with zneak. Don't throw from a signal handler. – MM. Aug 12 '15 at 16:31
The library is now in https://github.com/Plaristote/segvcatch, but I couldn't find the manual or compile it. `./build_gcc_linux_release` gives several errors. – alfC Dec 23 '16 at 00:07
Manual link is dead – Gulzar Dec 05 '21 at 09:54
@HenrikMühe what do you mean by "make sure you can"? How can I know? What does "recover" mean? When can't I? – Gulzar Dec 05 '21 at 10:53
@Gulzar, why would you catch an exception if you don't know how to recover from it? If you don't know for sure you can recover from an exception, you simply shouldn't catch it. – wovano Jan 13 '22 at 11:12

score 65 · Answer 2 · edited Dec 07 '16 at 10:09

Here's an example of how to do it in C.

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void segfault_sigaction(int signal, siginfo_t *si, void *arg)
{
    printf("Caught segfault at address %p\n", si->si_addr);
    exit(0);
}

int main(void)
{
    int *foo = NULL;
    struct sigaction sa;

    memset(&sa, 0, sizeof(struct sigaction));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = segfault_sigaction;
    sa.sa_flags   = SA_SIGINFO;

    sigaction(SIGSEGV, &sa, NULL);

    /* Cause a seg fault */
    *foo = 1;

    return 0;
}

answered Mar 12 '10 at 22:40 by JayM · edited Dec 07 '16 at 10:09 by Surajeet Bharati

1

Can I get a stack trace when its signaled? – daisy Dec 09 '13 at 13:20
9

Doing IO in a signal handler is a recipe for disaster. – Tim Seguine Nov 06 '16 at 20:33
How could we combine it with setjmp/longjmp to simulate a C++-style catch block? I tried longjmp'ing from the handler and it didn't work. – ogurets May 12 '17 at 17:15
10

@TimSeguine: that is not true. You just need to make sure you know what you are doing. `signal(7)` lists all async-signal-safe functions that can be used with relatively little care. In the example above it is also completely safe because nothing else in the program is touching `stdout` but the `printf` call in the handler. – stefanct Nov 17 '17 at 11:28
6

@stefanct This is a toy example. Virtually any non-toy program is going to hold the lock on stdout at some point. With this signal handler, the worst that can probably happen is a deadlock on segfault, but that can be bad enough if you currently have no mechanism to kill rogue processes in your use case. – Tim Seguine Nov 17 '17 at 11:50
2

You were talking about I/O in general; without limitation; without giving any rationale at all. There is nothing wrong or dangerous with creating a new file (stream) and outputting information there, neither would it be to write to `STDOUT_FILENO` directly. Thus your comment is misleading and that's what I criticized. – stefanct Nov 24 '17 at 10:50
4

@stefanct You are ignoring context. I didn't say anything about general I/O. But since you bring it up: read and write have synchronization issues. Their use in asynchronous code is non trivial and starting from the basis of a buggy, toy example that basically says "Look how easy this is", is indeed a recipe for disaster. I don't see how you expect someone to magically go from cargo-cult signal handling code to being a domain expert and taking every little thing into account. I wanted to get across the message "DON'T COPY THIS EXAMPLE". If that didn't come across, then that is my only regret. – Tim Seguine Nov 26 '17 at 14:52
9

according to [2.4.3 Signal Actions](http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03), calling printf from within a signal handler which is called as a result of an illegal indirection, whether the program is multithreaded or not is just plain _undefined behavior_ period. – Julien Villemure-Fréchette Nov 22 '18 at 22:42
2

LOL, the "disaster" has already happened. You may as well try to tell the user what happened. Also, 99% of seg faults is an attempt to dereference a NULL pointer, so printing out the values of the relevant pointers (if your handler is localized) is going to work most of the time to identify the specific variable that is causing the problem. Printing this out is key because once software is released you don't have access to a user's machine, so you have to rely on what they tell you. If they tell you they got a message that pointer XYZ was null, that's the key info. – Tyler Durden Jun 19 '21 at 21:55
1

@TylerDurden If I could downvote that comment I would. seg faults from heap corruption are common, and they tend to happen with the heap locked. And guess what? `printf()` tends to use the heap. So a shallow, toy-like attempt to "tell the user what happened" winds up telling the user absolutely nothing and deadlocks the process to boot. Ooof. – Andrew Henle Sep 29 '21 at 13:04
@AndrewHenle First of all, I printf out of exceptions all the time. Secondly, your idea that printf uses the heap is totally wrong, and you would know that if you actually read the source code of a typical printf function. – Tyler Durden Sep 29 '21 at 15:29
5

@TylerDurden *First of all, I printf out of exceptions all the time.* So you have low standards for the code you write, and you publicly attach your name to that fact. I'm not sure why you're proud of that, but OK. *Secondly your idea that printf uses the heap is totally wrong and you would know that if actually read the source code to a typical printf function.* [ORLY?!?!](https://github.com/lattera/glibc/blob/895ef79e04a953cac1493863bcae29ad85657ee1/stdio-common/vfprintf.c#L1025) You just might want to take your own advice about reading source code before posting something. – Andrew Henle Sep 29 '21 at 16:23

score 20 · Answer 3 · edited Dec 05 '21 at 10:11

For portability, one should probably use std::signal from the standard C++ library, but there are many restrictions on what a signal handler can do. Unfortunately, it is not possible to catch a SIGSEGV from within a C++ program without introducing undefined behavior, because the specification says:

it is undefined behavior to call any library function from within the handler other than a very narrow subset of the standard library functions (abort, _Exit, quick_exit, some atomic functions, reinstalling the current signal handler, memcpy, memmove, type traits, std::move, std::forward, and some more).
it is undefined behavior if the handler uses a throw expression.
it is undefined behavior if the handler returns when handling SIGFPE, SIGILL, or SIGSEGV.

This proves that it is impossible to catch SIGSEGV from within a program using strictly standard and portable C++. SIGSEGV is still caught by the operating system and is normally reported to the parent process when a wait family function is called.
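
As a rough illustration of that last point, the sketch below runs a function in a forked child and lets the parent inspect the wait status. Nothing here is from the original answer: run_in_child, crash_for_demo, and finish_cleanly are hypothetical names, and raise(SIGSEGV) merely stands in for a real fault so that the example itself stays well defined.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs fn() in a child process.
 * Returns the number of the signal that killed the child,
 * 0 if the child was not killed by a signal, or -1 on error.
 */
int run_in_child(void (*fn)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: do the risky work, then leave without running atexit handlers. */
        fn();
        _exit(0);
    }

    /* Parent: the kernel reports how the child ended. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Stand-in for code that crashes (assumption for the demo). */
void crash_for_demo(void)
{
    raise(SIGSEGV);
}

/* Stand-in for code that finishes normally. */
void finish_cleanly(void)
{
}
```

With these helpers, run_in_child(crash_for_demo) would be expected to return SIGSEGV while the parent keeps running. Any state the child produced has to be passed back explicitly (pipe, shared memory, file), which is the price of this isolation.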

You will probably run into the same kind of trouble using POSIX signals, because there is a clause in 2.4.3 Signal Actions that says:

The behavior of a process is undefined after it returns normally from a signal-catching function for a SIGBUS, SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(), sigqueue(), or raise().

A word about longjmp. Assuming we are using POSIX signals, using longjmp to simulate stack unwinding won't help:

Although longjmp() is an async-signal-safe function, if it is invoked from a signal handler which interrupted a non-async-signal-safe function or equivalent (such as the processing equivalent to exit() performed after a return from the initial call to main()), the behavior of any subsequent call to a non-async-signal-safe function or equivalent is undefined.

This means that the continuation invoked by the call to longjmp cannot reliably call commonly needed library functions such as printf, malloc, or exit, or return from main, without inducing undefined behavior. As such, the continuation can only perform a restricted set of operations and may only exit through some abnormal termination mechanism.

To put it briefly, catching a SIGSEGV and resuming execution of the program in a portable way is probably infeasible without introducing undefined behavior. Even on a Windows platform, where you have access to structured exception handling, it is worth mentioning that MSDN suggests never attempting to handle hardware exceptions: Hardware Exceptions.

Last but not least, the standard does not require that a SIGSEGV be raised when dereferencing a null (or otherwise invalid) pointer. Indirection through a null or invalid pointer is undefined behavior, which means the compiler may assume your code never attempts it at runtime and is free to apply transformations that elide such undefined behavior. For example, from cppreference,

#include <iostream>

int foo(int* p) {
    int x = *p;
    if(!p)
        return x; // Either undefined behavior above or this branch is never taken
    else
        return 0;
}

int main() {
    int* p = nullptr;
    std::cout << foo(p);
}

Here the true path of the if could be completely elided by the compiler as an optimization; only the else part need be kept. In other words, the compiler infers that foo() will never receive a null pointer at runtime, since that would lead to undefined behavior. Invoking it with a null pointer, you may observe the value 0 printed to standard output and no crash, you may observe a crash with SIGSEGV, or in fact you could observe anything, since no sensible requirements are imposed on programs that are not free of undefined behavior.

SIGSEGV is hardly a hardware exception, though. One could always use a parent-child architecture where the parent is able to detect the case of a child that got killed by the kernel and use IPC to share relevant program state in order to resume where we left off. I believe modern browsers can be seen this way, as they use IPC mechanisms to communicate with that one process per browser tab. Obviously the security boundary between processes is a bonus in the browser scenario. – 0xC0000022L Nov 06 '20 at 11:26

score 8 · Answer 4 · edited Apr 13 '14 at 23:08

C++ solution found here (http://www.cplusplus.com/forum/unices/16430/). Note that this example installs its handler for SIGINT rather than SIGSEGV:

#include <signal.h>
#include <stdio.h>
#include <unistd.h>
void ouch(int sig)
{
    printf("OUCH! - I got signal %d\n", sig);
}
int main()
{
    struct sigaction act;
    act.sa_handler = ouch;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGINT, &act, 0);
    while(1) {
        printf("Hello World!\n");
        sleep(1);
    }
}

answered Dec 23 '12 at 13:17 by revo · edited Apr 13 '14 at 23:08 by chue x

9

I know this is just an example that you didn't write, but doing IO in a signal handler is a recipe for disaster. – Tim Seguine Nov 06 '16 at 20:32
3

@TimSeguine: repeating stuff that is at best very misleading is not a good idea (cf. https://stackoverflow.com/questions/2350489/how-to-catch-segmentation-fault-in-linux#comment81651055_2436368) – stefanct Nov 17 '17 at 11:29
5

@stefanct The precautions necessary in order to use printf safely in a signal handler are not trivial. There is nothing misleading about that. This is a toy example. And even in this toy example it is possible to deadlock if you time the SIGINT right. Deadlocks are dangerous precisely BECAUSE they are rare. If you think this advice was misleading, then stay away from my code, because I don't trust you within a mile of it. – Tim Seguine Nov 17 '17 at 11:54
Again, you were talking about I/O in general here. Instead of pointing out the problem with this actual example, which IS a bad one indeed. – stefanct Nov 24 '17 at 10:52
2

@stefanct If you want to nitpick and ignore the context of the statement, then that is your problem. Who said I was talking about I/O in general? You. I just have a major problem with people posting toy answers to difficult problems. Even in the case you use async safe functions, there is still a lot to think about and this answer makes it seem like it is trivial. – Tim Seguine Nov 26 '17 at 14:43
For discussion only. Say I am trying to write a reentrant function to process data, maybe using third-party code that gives rare, random SIGSEGV errors (never use such a lib in the first place, but I am in a hurry to get the job done). Since this is a low-frequency event, I can use a strategy of trying a few times and hoping it will pass this section (usually this works). In such a case (not a durable solution), I want to handle the segmentation fault and restart my function from where it failed (say vector element 9999). This is probably not possible because you need to read memory? – Kemin Zhou Nov 22 '18 at 18:19
@TimSeguine As a non professional C++ developer, I have no idea what to take of this. I have some production code I have to keep running. If for any reason I get a segmentation fault, what should I do? I do have to catch, it, but don't understand when I can or can't. Care to explain? – Gulzar Dec 05 '21 at 10:27
1

@Gulzar There are a lot of different reasons for segmentation faults to occur, so it is not possible to give a general answer. But the most common reason to encounter one is that there is a bug in your code that causes unsafe memory access. In those cases trying to catch them and recover is not usually possible or advisable because stack or heap corruption has already occurred and the only safe thing to do is die. The simplest advice would be to try to find the bug by running it with valgrind or by recompiling with "address sanitizer" to try to find the bug. Both give hints about where to look – Tim Seguine Dec 05 '21 at 16:26
@KeminZhou in the case you describe, you have no way to really be sure that you don't have memory corruption, so I wouldn't advise that strategy. But on the other hand pretty much anything you decide to do in a SIGSEGV handler is probably going to be undefined behavior anyway. If you are okay with that, then you can probably setjmp and longjmp to interact with error handling code around the iteration in question. That is the only way I can see that might accomplish what you are asking for, but maybe I am missing something obvious because I haven't spent a great deal of time thinking about this. – Tim Seguine Dec 05 '21 at 17:31
@Tim Seguine Agree, for segfaults there is really no standard procedure to follow. As a C++ programmer I encounter this once every few weeks. The solution is always knowing the code well, running under a debugger, guessing where the issue could be, and reading the suspicious section of code carefully to find the fault. Many segfaults are easy to resolve; those that fault at a location because of bad code somewhere else are a little harder to track. – Kemin Zhou Dec 05 '21 at 19:21

score 7 · Answer 5 · edited May 23 '17 at 12:02

Sometimes we want to catch a SIGSEGV to find out if a pointer is valid, that is, if it references a valid memory address. (Or even check if some arbitrary value may be a pointer.)

One option is to check it with isValidPtr() (worked on Android):

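#include <fcntl.h>
#include <unistd.h>

int isValidPtr(const void *p, int len) {
    if (!p) {
        return 0;
    }
    int ret = 1;
    int nullfd = open("/dev/random", O_WRONLY);
    if (nullfd < 0) {
        return 0; /* cannot test; report the pointer as not validated */
    }
    /* write() fails (typically with EFAULT) if the kernel cannot read len bytes at p */
    if (write(nullfd, p, len) < 0) {
        ret = 0; /* Not OK */
    }
    close(nullfd);
    return ret;
}

int isValidOrNullPtr(const void *p, int len) {
    return !p || isValidPtr(p, len);
}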

Another option is to read the memory protection attributes, which is a bit more tricky (worked on Android):

re_mprot.c:

#include <errno.h>
#include <malloc.h>
#include <stdio.h>   /* FILE, fopen, fgetc, fclose */
#include <stdlib.h>  /* strtoul, free */
//#define PAGE_SIZE 4096
#include "dlog.h"
#include "re_mprot.h"

struct buffer {
    int pos;
    int size;
    char* mem;
};

char* _buf_reset(struct buffer*b) {
    b->mem[b->pos] = 0;
    b->pos = 0;
    return b->mem;
}

struct buffer* _new_buffer(int length) {
    struct buffer* res = malloc(sizeof(struct buffer)+length+4);
    res->pos = 0;
    res->size = length;
    res->mem = (void*)(res+1);
    return res;
}

int _buf_putchar(struct buffer*b, int c) {
    b->mem[b->pos++] = c;
    return b->pos >= b->size;
}

void show_mappings(void)
{
    DLOG("-----------------------------------------------\n");
    int a;
    FILE *f = fopen("/proc/self/maps", "r");
    struct buffer* b = _new_buffer(1024);
    while ((a = fgetc(f)) >= 0) {
        if (_buf_putchar(b,a) || a == '\n') {
            DLOG("/proc/self/maps: %s",_buf_reset(b));
        }
    }
    if (b->pos) {
        DLOG("/proc/self/maps: %s",_buf_reset(b));
    }
    free(b);
    fclose(f);
    DLOG("-----------------------------------------------\n");
}

unsigned int read_mprotection(void* addr) {
    int a;
    unsigned int res = MPROT_0;
    FILE *f = fopen("/proc/self/maps", "r");
    struct buffer* b = _new_buffer(1024);
    while ((a = fgetc(f)) >= 0) {
        if (_buf_putchar(b,a) || a == '\n') {
            char*end0 = (void*)0;
            unsigned long addr0 = strtoul(b->mem, &end0, 0x10);
            char*end1 = (void*)0;
            unsigned long addr1 = strtoul(end0+1, &end1, 0x10);
            /* each maps entry covers [addr0, addr1), so the start address is included */
            if ((void*)addr0 <= addr && addr < (void*)addr1) {
                res |= (end1+1)[0] == 'r' ? MPROT_R : 0;
                res |= (end1+1)[1] == 'w' ? MPROT_W : 0;
                res |= (end1+1)[2] == 'x' ? MPROT_X : 0;
                res |= (end1+1)[3] == 'p' ? MPROT_P
                     : (end1+1)[3] == 's' ? MPROT_S : 0;
                break;
            }
            _buf_reset(b);
        }
    }
    free(b);
    fclose(f);
    return res;
}

int has_mprotection(void* addr, unsigned int prot, unsigned int prot_mask) {
    unsigned prot1 = read_mprotection(addr);
    return (prot1 & prot_mask) == prot;
}

char* _mprot_tostring_(char*buf, unsigned int prot) {
    buf[0] = prot & MPROT_R ? 'r' : '-';
    buf[1] = prot & MPROT_W ? 'w' : '-';
    buf[2] = prot & MPROT_X ? 'x' : '-';
    buf[3] = prot & MPROT_S ? 's' : prot & MPROT_P ? 'p' :  '-';
    buf[4] = 0;
    return buf;
}

re_mprot.h:

#include <alloca.h>
#include "re_bits.h"
#include <sys/mman.h>

void show_mappings(void);

enum {
    MPROT_0 = 0, // not found at all
    MPROT_R = PROT_READ,                                 // readable
    MPROT_W = PROT_WRITE,                                // writable
    MPROT_X = PROT_EXEC,                                 // executable
    MPROT_S = FIRST_UNUSED_BIT(MPROT_R|MPROT_W|MPROT_X), // shared
    MPROT_P = MPROT_S<<1,                                // private
};

// returns a non-zero value if the address is mapped (because either MPROT_P or MPROT_S will be set for valid addresses)
unsigned int read_mprotection(void* addr);

// check memory protection against the mask
// returns true if all bits corresponding to non-zero bits in the mask
// are the same in prot and read_mprotection(addr)
int has_mprotection(void* addr, unsigned int prot, unsigned int prot_mask);

// convert the protection mask into a string. Uses alloca(), no need to free() the memory!
#define mprot_tostring(x) ( _mprot_tostring_( (char*)alloca(8) , (x) ) )
char* _mprot_tostring_(char*buf, unsigned int prot);

PS DLOG() is printf() to the Android log. FIRST_UNUSED_BIT() is defined here.

PPS It may not be a good idea to call alloca() in a loop -- the memory may not be freed until the function returns.
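
For comparison, the same /proc/self/maps lookup can be sketched more compactly with fgets() and sscanf(). This is an illustrative alternative rather than the code above: lookup_perms is a hypothetical name, the 512-byte line buffer is an arbitrary choice, and the format string assumes the usual "start-end perms ..." layout of each maps line on Linux.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Looks up addr in /proc/self/maps.
 * Returns 1 and fills perms (e.g. "rw-p") if addr lies inside a mapping,
 * 0 if no mapping contains it, or -1 if the maps file cannot be opened.
 */
int lookup_perms(const void *addr, char perms[5])
{
    char line[512];
    uintptr_t target = (uintptr_t)addr;
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (!f)
        return -1;

    while (!found && fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char p[5];

        /* Each entry begins with "start-end perms"; the range is [start, end). */
        if (sscanf(line, "%lx-%lx %4s", &start, &end, p) == 3
            && target >= start && target < end) {
            for (int i = 0; i < 5; i++)
                perms[i] = p[i];
            found = 1;
        }
    }

    fclose(f);
    return found;
}
```

The result is only a snapshot: another thread may map or unmap the region right after the check, so a positive answer does not guarantee that a later access will succeed.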
