
Consider the following assembly program:

bits 64
global _start
_start:
    mov rax, 0x0000111111111111
    add byte [rax*1+0x0], al
    jmp _start

When you assemble this with nasm and link it with ld (on Ubuntu, kernel 5.4.0-48-generic, Ryzen 3900X), you get a segfault:

$ ./segfault-addr
[1]    107116 segmentation fault (core dumped)  ./segfault-addr

When you attach gdb, you can see the address that caused this fault:

(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x111111111111

However, if you set any of the 16 most significant bits to 1 like this:

bits 64
global _start
_start:
    mov rax, 0x0001111111111111
    add byte [rax*1+0x0], al
    jmp _start

You obviously still get a segfault, but now the address is NULL:

(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x0

Why is this happening? Is it caused by gdb, Linux, or the CPU itself?

Is there anything I can do to prevent this behavior?

Peter Cordes
cmpxchg8b
  • The short answer is that x86-64 really only has a 48-bit virtual address space, and addresses outside this range are defined to cause a general protection fault. Unlike a page fault, the CPU does not record the faulting address for a GPF. You'd have to decode the instruction to get it, and the kernel doesn't include code to do that. See https://stackoverflow.com/questions/10360888/identifying-faulting-address-on-general-protection-fault-x86 – Nate Eldredge Oct 11 '20 at 22:01
  • @Nate: you could write that up as an answer, or we could close this as a duplicate of [x86-64 canonical address?](https://stackoverflow.com/q/25852367) and the GPF question you linked. Maybe also [Retrieving memory data with non-canonical-address causes SIGSEGV rather than SIGBUS](https://stackoverflow.com/q/62621661). [Address canonical form and pointer arithmetic](https://stackoverflow.com/q/38977755) has a diagram of canonical address space. – Peter Cordes Oct 11 '20 at 22:08
  • Also [Why do x86-64 systems have only a 48 bit virtual address space?](https://stackoverflow.com/q/6716946) – Peter Cordes Oct 11 '20 at 22:15

1 Answer

It's the difference between canonical and non-canonical addresses, which comes from the fact that x86-64 doesn't have a full 64-bit virtual address space. Your second example is a non-canonical address because it isn't a sign-extended 48-bit value: bits 63 through 48 must all be copies of bit 47, but in 0x0001111111111111 bit 48 is set while bit 47 is clear. (Apparently your machine doesn't have the 5-level paging extension; with it, the limit would be 57 bits instead of 48.) Such addresses can never resolve to a physical memory location.
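As a minimal sketch of that rule in C, here is a check for whether an address is canonical. The helper name is_canonical is my own, and va_bits is assumed to be 48 for 4-level paging or 57 for 5-level paging (LA57):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: an address is canonical when bits 63..(va_bits-1)
   are all zeros or all ones, i.e. the upper bits are a sign extension of
   bit (va_bits-1). Assumes va_bits is 48 (4-level) or 57 (5-level paging). */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
    assert(va_bits == 48 || va_bits == 57);
    uint64_t top = addr >> (va_bits - 1);              /* bit (va_bits-1) and up */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);   /* same width, all bits set */
    return top == 0 || top == all_ones;
}
```

With va_bits = 48, 0x0000111111111111 from your first program passes and 0x0001111111111111 from your second fails; with va_bits = 57 both pass.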

Invalid accesses to canonical addresses generate a page fault (#PF), for which the CPU provides the faulting address to the kernel (in the CR2 register), and the kernel passes it along to userspace in the si_addr field of struct siginfo, as you see. But accesses to non-canonical addresses are always invalid, and the CPU raises a general protection exception (#GP) or, when the address is formed with RSP or RBP as the base register, a stack fault (#SS). The designers of the x86 architecture chose, in their infinite wisdom, not to provide the faulting address to software in case of a #GP or #SS exception, so the kernel doesn't get it and neither do you; that's why si_addr shows up as 0.
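You can observe the same difference from C without gdb. This is a small sketch, assuming x86-64 Linux without 5-level paging and assuming 0x111111111111 isn't mapped in the process: it installs an SA_SIGINFO handler for SIGSEGV, reads one byte at the given address, and siglongjmps out with whatever si_addr the kernel reported. probe_fault_addr and record_fault are hypothetical names:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf probe_env;
static void *volatile reported_addr;

/* SA_SIGINFO handler: record si_addr, then jump back out of the fault. */
static void record_fault(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    reported_addr = info->si_addr;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: read one byte at addr and return the si_addr the
   kernel reported for the resulting SIGSEGV, or (void *)-1 if the read
   did not fault. */
static void *probe_fault_addr(uintptr_t addr)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGSEGV, &sa, &old);
    assert(rc == 0);
    (void)rc;

    void *result = (void *)-1;
    /* savemask = 1, so siglongjmp also unblocks SIGSEGV again. */
    if (sigsetjmp(probe_env, 1) == 0)
        (void)*(volatile unsigned char *)addr;
    else
        result = reported_addr;

    sigaction(SIGSEGV, &old, NULL);
    return result;
}
```

Under those assumptions, probe_fault_addr(0x111111111111) should return that same address (it came from CR2 via the page fault), while probe_fault_addr(0x0001111111111111) should return NULL, matching what gdb showed you.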

If you really need the address, your only choice is to decode the instruction that caused the exception and inspect the register contents as needed to work out which address it was trying to access.
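Here is a heavily simplified sketch of that decoding step. It assumes the faulting instruction is one of a handful of one-byte opcodes (00/01 add, 88/89/8A/8B mov) with a plain ModRM memory operand: no prefixes, no REX, no SIB byte, no RIP-relative addressing. simple_effective_address is a hypothetical helper, and regs[] is indexed by hardware register number. In a real SA_SIGINFO handler on glibc x86-64 you would take the instruction bytes from the saved RIP and the register values from uc_mcontext.gregs, whose REG_* indices are not in hardware order and would need remapping; a real decoder (prefixes, REX, SIB, VEX, string instructions and so on) is far more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately tiny decoder. Handles only opcodes
   00/01 (add r/m, r) and 88/89/8A/8B (mov) followed by a ModRM byte,
   with no prefixes, no REX, no SIB and no RIP-relative addressing.
   regs[] is indexed by hardware register number:
   0=RAX 1=RCX 2=RDX 3=RBX 4=RSP 5=RBP 6=RSI 7=RDI.
   On success stores the effective address in *out and returns true. */
static bool simple_effective_address(const uint8_t *insn, size_t len,
                                     const uint64_t regs[8], uint64_t *out)
{
    assert(insn != NULL && regs != NULL && out != NULL);
    if (len < 2)
        return false;

    switch (insn[0]) {
    case 0x00: case 0x01: case 0x88: case 0x89: case 0x8A: case 0x8B:
        break;
    default:
        return false;              /* opcode not covered by this sketch */
    }

    uint8_t modrm = insn[1];
    unsigned mod = modrm >> 6;     /* addressing mode */
    unsigned rm = modrm & 7;       /* base register (when there is no SIB) */

    if (mod == 3)
        return false;              /* register operand: no memory access */
    if (rm == 4)
        return false;              /* SIB byte follows: not handled */
    if (mod == 0 && rm == 5)
        return false;              /* RIP-relative disp32: not handled */

    uint64_t ea = regs[rm];
    if (mod == 1) {                /* [reg + disp8], sign-extended */
        if (len < 3)
            return false;
        ea += (uint64_t)(int64_t)(int8_t)insn[2];
    } else if (mod == 2) {         /* [reg + disp32], little-endian, sign-extended */
        if (len < 6)
            return false;
        uint32_t raw = (uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
                       (uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24;
        ea += (uint64_t)(int64_t)(int32_t)raw;
    }
    *out = ea;
    return true;
}
```

For example, add [rax], al encodes as the two bytes 00 00, so with RAX = 0x0001111111111111 the sketch reports that same value as the address the instruction tried to touch.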


I presume this decision was made because the kernel really needs the address in case of a page fault. An access to a not-present page may be a memory violation that should kill the process, or it may simply be a page that has been swapped out of physical memory. In the latter case, the kernel uses the fault address to find the appropriate page on disk and load it back into physical memory. Then it updates the page tables and returns from the exception handler to restart the faulting instruction, and the program can continue.
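The recoverable page-fault case can be imitated in userspace, precisely because si_addr is available there. This is only an analogy, not what the kernel does internally: a region is mapped PROT_NONE, and a SIGSEGV handler uses si_addr to make just the touched page accessible and then returns, so the faulting instruction is restarted. The names are hypothetical, and it assumes Linux, where calling mprotect from a signal handler works in practice even though POSIX doesn't list it as async-signal-safe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *region;             /* hypothetical lazily "paged in" area */
static size_t region_len;
static size_t page_size;
static volatile sig_atomic_t faults_handled;

static void lazy_fault_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t start = (uintptr_t)region;

    if (addr < start || addr >= start + region_len) {
        /* A genuine violation (or si_addr == 0 from a #GP): fall back to
           the default action. Returning restarts the instruction, which
           faults again and now terminates the process. */
        signal(sig, SIG_DFL);
        return;
    }
    /* Analogous to the kernel loading a swapped-out page: use the fault
       address to make just that page accessible ... */
    uintptr_t page = addr & ~(uintptr_t)(page_size - 1);
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0) {
        signal(sig, SIG_DFL);
        return;
    }
    faults_handled++;
    /* ... then return, so the faulting instruction is restarted. */
}

static int setup_lazy_region(size_t pages)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    region_len = pages * page_size;

    void *p = mmap(NULL, region_len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    region = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = lazy_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Writing to two distinct pages of the region should trigger exactly two handled faults, and the program carries on as if the memory had been there all along. A #GP from a non-canonical address couldn't be handled this way, since the handler would only see si_addr == 0.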

However, a general protection fault is typically unrecoverable, and the process will have to be killed, or at least signaled so it can try to clean up. In this case there is nothing actionable to be done with the faulting address, and I guess the architecture designers didn't think its potential value for debugging was worth the effort of having the CPU save it. Anyway, many possible causes of #GP don't arise from a memory access at all (e.g. trying to read or write control registers from unprivileged mode), in which case there is no faulting address.

Nate Eldredge
  • Good point with valid vs. invalid page-fault. vs. a GPF never being valid in that sense. Only invalid page faults result in sending a signal to the process; it's easy to forget about valid page faults that get handled silently. – Peter Cordes Oct 12 '20 at 05:24