87

What is the correct interpretation of the following segfault messages?

segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]
segfault at 10 ip 00007fa44d78890d sp 00007fff43f6b720 error 4 in libQtWebKit.so.4.5.2[7fa44d2f8000+f6f000]
segfault at 11 ip 00007f2b0022acee sp 00007fff368ea610 error 4 in libQtWebKit.so.4.5.2[7f2aff9f7000+f6f000]
segfault at 11 ip 00007f24b21adcee sp 00007fff7379ded0 error 4 in libQtWebKit.so.4.5.2[7f24b197a000+f6f000]
Marco Bonelli
knorv

4 Answers

111

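This is a segfault caused by reading through a null (or near-null) pointer: error 4 decodes to a user-mode read of an address with no page mapped. It is not an instruction fetch, since bit 4 of the error code is clear.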

If this were a program, not a shared library

Run addr2line -e yourSegfaultingProgram 00007f9bebcca90d (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

Since it's a shared library

You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after-the-fact. Reproduce the problem under gdb.

What the error means

Here's the breakdown of the fields:

  • address (after the at) - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)

  • ip - instruction pointer, i.e. where the code that is trying to do this lives

  • sp - stack pointer

  • error - An error code for page faults; see below for what this means on x86.

    /*
     * Page fault error code bits:
     *
     *   bit 0 ==    0: no page found       1: protection fault
     *   bit 1 ==    0: read access         1: write access
     *   bit 2 ==    0: kernel-mode access  1: user-mode access
     *   bit 3 ==                           1: use of reserved bit detected
     *   bit 4 ==                           1: fault was an instruction fetch
     *   bit 5 ==                           1: protection keys block access
     *   bit 15 ==                          1: SGX MMU page-fault
     */
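
As a rough illustration of the table above, the bits can be decoded with a short Python helper. The names `PF_CHOICES`, `PF_FLAGS` and `decode_pf_error` are made up for this sketch and are not kernel or library APIs; the bit meanings are copied from the comment above.

```python
# Illustrative decoder for x86 page-fault error codes.
# Bits 0-2 always mean one of two things; the higher bits are flags
# that only matter when set. All names here are hypothetical.
PF_CHOICES = {
    0: ("no page found", "protection fault"),
    1: ("read access", "write access"),
    2: ("kernel-mode access", "user-mode access"),
}
PF_FLAGS = {
    3: "use of reserved bit detected",
    4: "fault was an instruction fetch",
    5: "protection keys block access",
    15: "SGX MMU page-fault",
}


def decode_pf_error(code):
    """Return a list of human-readable descriptions for an error code."""
    parts = [names[(code >> bit) & 1] for bit, names in sorted(PF_CHOICES.items())]
    parts += [text for bit, text in sorted(PF_FLAGS.items()) if (code >> bit) & 1]
    return parts


# The dmesg line prints the code in hexadecimal, so parse it with int(text, 16).
print(decode_pf_error(int("4", 16)))
# ['no page found', 'read access', 'user-mode access']
```

For the messages in the question this gives a user-mode read of an unmapped page, which fits a read through a null pointer plus a small offset.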
    
SergA
Charles Duffy
  • 6
    According to http://stackoverflow.com/a/2179464/1100614 , `error` is not the value of `errno`, but an architecture-specific error code for page faults. – Martin von Wittich Feb 04 '15 at 08:45
  • Yes I could figure out as well what ip and sp means but what is the meaning of the number after the at??? – Nils Jul 28 '15 at 16:07
  • 1
    @Nils, that's the address it's trying to page in. Since it's so low here, presumably it's an offset being applied to a NULL pointer. – Charles Duffy Jul 28 '15 at 16:21
  • 7
    "You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after-the-fact." - This is not correct, the base address is actually printed in the error message itself (`7f9beb83a000` in this case). And even if it weren't, since the base address is page-aligned it's usually possible to make a reasonable educated guess. – Benno Mar 16 '18 at 11:31
  • [This link](https://utcc.utoronto.ca/~cks/space/blog/linux/KernelSegfaultMessageMeaning) has more about these messages. And [this link](https://utcc.utoronto.ca/~cks/space/blog/linux/ShuttingUpSegfaultSyslogs) has instructions on how to enable/disable these messages – iuridiniz Sep 17 '19 at 12:16
78

Error 4 means "The cause was a user-mode read resulting in no page being found.". There's a tool that decodes it here.

Here's the definition from the kernel. Keep in mind that 4 means that bit 2 is set and no other bits are set. If you convert it to binary that becomes clear.

/*
 * Page fault error code bits
 *      bit 0 == 0 means no page found, 1 means protection fault
 *      bit 1 == 0 means read, 1 means write
 *      bit 2 == 0 means kernel, 1 means user-mode
 *      bit 3 == 1 means use of reserved bit detected
 *      bit 4 == 1 means fault was an instruction fetch
 */
#define PF_PROT         (1<<0)
#define PF_WRITE        (1<<1)
#define PF_USER         (1<<2)
#define PF_RSVD         (1<<3)
#define PF_INSTR        (1<<4)

Now then, "ip 00007f9bebcca90d" means the instruction pointer was at 0x00007f9bebcca90d when the segfault happened.

"libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]" tells you:

  • The object the crash was in: "libQtWebKit.so.4.5.2"
  • The base address of that object "7f9beb83a000"
  • How big that object is: "f6f000"

If you take the base address and subtract it from the ip, you get the offset into that object:

0x00007f9bebcca90d - 0x7f9beb83a000 = 0x49090D

Then you can run addr2line on it:

addr2line -e /usr/lib64/qt45/lib/libQtWebKit.so.4.5.2 -fCi 0x49090D
??
??:0

In my case it wasn't successful: either the copy I installed isn't identical to yours, or it's stripped.
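
The subtraction above can be scripted. Here is a minimal Python sketch that pulls the fields out of a segfault line and computes the offset to hand to addr2line. The regular expression assumes the exact message layout shown in the question (other kernel versions may print it differently), and `SEGFAULT_RE` and `object_offset` are names invented for this example.

```python
import re

# Assumed layout: "segfault at A ip B sp C error D in OBJ[BASE+SIZE]".
# All numeric fields are hexadecimal without a 0x prefix.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+) in (?P<obj>.+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def object_offset(line):
    """Return (object name, ip - base) for one segfault message."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    return m["obj"], int(m["ip"], 16) - int(m["base"], 16)


line = ("segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 "
        "error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]")
obj, offset = object_offset(line)
print(obj, hex(offset))  # libQtWebKit.so.4.5.2 0x49090d
```

The printed offset is the value to pass to addr2line together with the path of the matching library file.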

Scott Stensland
Tim
  • 13
    This answer shows that the comment "You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after-the-fact" in the accepted answer is wrong. The segfault message itself tells you the base address of the shared library at the time the segfault occurred. – dmr195 Apr 25 '16 at 13:26
  • 4
    I just used the `-fCi` option on my own faulty code and slammed that bug within a minute. Thanks, great help. – hschou Feb 15 '17 at 16:11
  • Sorry can you clarify.. in this case for example? segfault at 7ffe44462000 ip 00007f4ee2211f0e sp 00007ffe44460168 error 6 in libc-2.26.so[7f4ee2093000+1d6000] ### should it be: -fCi 0x17EF0E – Zibri Mar 24 '19 at 14:30
  • This answer is solid gold, found my problem instantly! – Max Fellows Mar 24 '22 at 18:25
11

Let's go to the source -- 2.6.32, for example. The message is printed by the show_signal_msg() function in arch/x86/mm/fault.c if the show_unhandled_signals sysctl is set.

"error" is not an errno nor a signal number, it's a "page fault error code" -- see definition of enum x86_pf_error_code.

"[7fa44d2f8000+f6f000]" is the starting address and size of the virtual memory area where the offending object was mapped at the time of the crash. The value of "ip" should fall within this region. With this info in hand, it should be easy to find the offending code in gdb.
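
As a quick sanity check of that claim, the following Python sketch tests whether each ip from the question lies inside its reported mapping. The helper name `ip_in_mapping` is hypothetical; the numbers are copied from the four messages in the question.

```python
def ip_in_mapping(ip, base, size):
    """True if ip lies in the half-open range [base, base + size)."""
    return base <= ip < base + size


# (ip, base, size) taken from the four segfault lines in the question.
samples = [
    (0x7F9BEBCCA90D, 0x7F9BEB83A000, 0xF6F000),
    (0x7FA44D78890D, 0x7FA44D2F8000, 0xF6F000),
    (0x7F2B0022ACEE, 0x7F2AFF9F7000, 0xF6F000),
    (0x7F24B21ADCEE, 0x7F24B197A000, 0xF6F000),
]

print(all(ip_in_mapping(ip, base, size) for ip, base, size in samples))  # True
```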

sendmoreinfo
  • "Value of ip should fit in this region." That is unclear to me. Does it mean: if ip doesn't fit in this region, the program was wrong? Or: always expect the kernel to specify an address that is within this region (so if it is outside, there is something fishy going on within the kernel)? – Albert van der Horst Dec 05 '15 at 12:09
  • It's the latter. Perhaps the better wording is "expect the value of ip to fit in this region". – sendmoreinfo Dec 06 '15 at 08:48
0

You can fix it with the following steps:

  • Run dmesg

Example:

[4970814.649014] upowerd[46459]: segfault at 8 ip 000055ce91269328 sp 00007fff71b98480 error 4 in upowerd[55ce91248000+39000]
[4970840.152464] upowerd[46512]: segfault at 8 ip 000055c18f8e5328 sp 00007fffa63df280 error 4 in upowerd[55c18f8c4000+39000]

  • Identify the faulting binary or library; here it is upowerd

  • Reinstall it: remove upowerd, then install it again

  • Run dmesg again

Example: the last line will normally show the old binary marked as (deleted):

[4970942.517131] upowerd[47466]: segfault at 8 ip 00005637fd95b328 sp 00007ffeb77c3460 error 4 in upowerd (deleted)[5637fd93a000+39000]

Best regards,

Moustapha Kourouma