What is the difference between the ret instruction in x86 and x64?

Question

I was recently trying out a stack overflow exercise on x64. When performing this on x86, I would expect the following for a junk overwrite address (e.g. 'AAAA'):

1. The data I provide overflows the buffer, and overwrites the return address
2. Upon `ret`, the (overwritten) return address will be (effectively) popped into the EIP register
3. It is realised that the address is not valid, and a segmentation fault is raised

In x64, this seems different (beyond the interchange of EIP with RIP in the above steps). When providing a junk address of 'AAAAAAA', the processor seems to do some validity checking before popping the address. By observation, it seems required that the two most significant bytes of the address are null, before it is loaded. Otherwise, a segfault occurs. I believe this is due to the use of 48-bit addressing in x64, however I was under the impression that addresses starting with 0xFFFF were also valid, yet this also produces a segfault.

Is this an accurate description of the difference? Why is this check performed before the data is loaded into the RIP register, whilst the other validity check is performed afterwards? Are there any other differences between these instructions?

EDIT: To clarify my observations, I note that when an 8-byte return address is provided, the RIP still points to the address of the ret instruction, and the RSP still points to the overwritten return address on segfault. When a 6-byte return address is provided, the overwritten address has been popped into the RIP when the segfault is observed.

The check is done early because the `rip` register might not even have the bits in hardware so it can't be loaded. Yes, the top part of the address range should also be canonical, but you must use sign extension. So for example `0xffff800000000000` is canonical and will be loaded into `rip` and only fault afterwards :) `0xffff414141414141` is not canonical. — Jester, Jun 13 '20 at 19:02
Addresses starting with `0xffff` are only valid in kernel mode afaik. — fuz, Jun 13 '20 at 19:02
@fuz: They're not automatically invalid (non-canonical), they're only protected by the normal page-table mechanism. (e.g. kernels set the U/S bit in PTEs so that most / all of those virtual pages are supervisor-only). Linux for example maps the `ffffffffff600000-ffffffffff601000` range into user-space processes as the `[vsyscall]` page. (`cat /proc/self/maps`). — Peter Cordes, Jun 13 '20 at 19:04
@Vortix: What exactly are you claiming happens? That RSP isn't updated or something? How are you distinguishing between code-fetch from an invalid page leading to a segfault vs. attempting to load a non-canonical address into RIP? The Operation section of the manual for [`ret`](https://www.felixcloutier.com/x86/ret) is complicated by clutter from `retf` (far), but IA-32E-MODE-RETURN-TO-SAME-PRIVILEGE-LEVEL: includes `IF the return instruction pointer is not within canonical address space THEN #GP(0); FI;`. That's `#GP` instead of `#PF`, but either a #GP or an invalid page fault gets the kernel to deliver SIGSEGV. — Peter Cordes, Jun 13 '20 at 21:44
Thanks for the information, everyone! @Jester indeed, it loads as you said! — VortixDev, Jun 13 '20 at 22:22
@PeterCordes thanks for the reference, very interesting! I observe that the RSP points to the overwritten return address, and the RIP points to the address of the `ret` instruction when the segfault occurs, whereas when a 6-byte address is provided, the overwritten address has been popped into the RIP when the segfault is observed. — VortixDev, Jun 13 '20 at 22:22
You should [edit] your question to say that more clearly. Interesting that RSP doesn't get updated before the fault. So it's not code-fetch from a non-canonical address that faults, it's the `ret` instruction's attempt to set RIP to a non-canonical address. That makes the whole RET instruction fault, meaning that none of its effects are visible. — Peter Cordes, Jun 13 '20 at 22:27

Peter Cordes · Accepted Answer · 2022-01-18T17:41:39.390

Interesting that RSP doesn't get updated before the fault. So it's not code-fetch from a non-canonical address that faults, it's the `ret` instruction's attempt to set RIP to a non-canonical address.

That makes the whole `ret` instruction fault, meaning that none of its effects are visible. (Intel's manual doesn't define any partial-progress behaviour for `ret`, i.e. state that is updated even when the instruction faults.)

Unfortunately the Operation section for `ret` in Intel's manual is a rat's nest of conditionals because it uses one block to document near and far returns, and every combination of mode and operand-size. Plain `ret` in 64-bit mode is "IA-32e mode", operand-size=64, and "near" (not changing CS to a different code segment, just changing RIP).

In that case, a normal x86-64 `ret` is basically `pop rip`.
A normal 32-bit mode `ret` is basically `pop eip`.
Nothing more, nothing less: `RIP = *RSP++`.
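
To make the observed behaviour concrete, here is a minimal Python sketch of a near `ret` in 64-bit mode. It is a toy model, not an emulator: the names `is_canonical`, `near_ret` and `GeneralProtectionFault` are hypothetical, the register file is a plain dict, and it assumes 48-bit virtual addresses (4-level paging); CPUs with 5-level paging use 57 bits instead. It only illustrates the two points above: canonical means bits 63..47 are all copies of bit 47, and a faulting `ret` leaves RIP and RSP untouched.

```python
VA_BITS = 48  # assumption: 4-level paging, 48-bit virtual addresses


class GeneralProtectionFault(Exception):
    """Hypothetical stand-in for #GP(0); the OS would turn it into SIGSEGV."""


def is_canonical(addr, va_bits=VA_BITS):
    """Return True if bits 63..(va_bits-1) of addr are all equal."""
    top = addr >> (va_bits - 1)                # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones


def near_ret(regs, stack):
    """Toy model of near `ret` in 64-bit mode: RIP = *RSP++.

    regs  : dict with integer "rip" and "rsp" entries
    stack : dict mapping an address to the 8-byte value stored there
    The canonical check happens before any state is written, so on a
    fault regs still holds the address of the `ret` and the old RSP.
    """
    target = stack[regs["rsp"]]
    if not is_canonical(target):
        raise GeneralProtectionFault(hex(target))
    regs["rip"] = target
    regs["rsp"] += 8


if __name__ == "__main__":
    for value in (0x4141414141414141, 0x0000414141414141, 0xFFFF800000000000):
        regs = {"rip": 0x401136, "rsp": 0x7FFD0000}   # made-up addresses
        try:
            near_ret(regs, {0x7FFD0000: value})
            outcome = "loaded (any fault comes later, at code fetch)"
        except GeneralProtectionFault:
            outcome = "#GP, RIP and RSP unchanged"
        print(hex(value), outcome, {k: hex(v) for k, v in regs.items()})
```

In this model an 8-byte `'AAAAAAAA'` overwrite never reaches RIP, while a 6-byte overwrite (top two bytes zero) is loaded and would fault only when the CPU tries to fetch code from the unmapped page, which matches what the question's EDIT describes.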
