
While reading and learning from open-source OSes, I stumbled across an extremely convoluted way of calling a "method" in assembly. It uses the `ret` instruction to call a library method, like this:

```nasm
push rbp                ; rsp[1] = rbp
mov rbp, .continue      ; save return label to rbp
xchg rbp, QWORD [rsp]   ; restore rbp and set rsp[1] to return label

push rbp                ; rsp[0] = rbp
mov rbp, 0x0000700000000000 + LIB_PTR_TABLE.funcOffset ; rbp = pointer to func pointer
mov rbp, QWORD [rbp]    ; rbp = func pointer
xchg rbp, QWORD [rsp]   ; restore rbp and set rsp[0] to func pointer

; "call" library by "returning" to the address we just planted
ret

.continue:
```

I added the comments to understand it myself, and I seem to be right (or close enough), because all my experiments succeeded. But then I tried this, which also works perfectly:

```nasm
mov rax, 0x0000700000000000 + LIB_PTR_TABLE.funcOffset  ; rax = ptr to func ptr
mov rax, QWORD [rax]    ; rax = func ptr
call rax                ; actually call the library function in a normal fashion
```

Looking at the number of instructions and what the CPU actually has to do in each case, one would assume that if either variant is faster, it's the `call` variant. But since the author chose the `ret` variant, and coming up with it requires a fair bit of knowledge in the first place, what advantages does the first variant have (if any)?
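To try both sequences side by side, here is a minimal sketch that wraps each one in a small function callable from C. It assumes x86-64 Linux (System V ABI), GCC or Clang emitting their default AT&T syntax (hence the `.intel_syntax`/`.att_syntax` switch), and no enforced CET shadow stack, since the `ret` trick deliberately breaks call/ret pairing. The names `call_via_ret`, `call_via_call` and `answer` are hypothetical; `rdi` stands in for the `0x0000700000000000 + LIB_PTR_TABLE.funcOffset` slot address, `mov rbp, .continue` becomes a RIP-relative `lea` so the code links as position-independent, and the extra `sub rsp, 8` only keeps the stack 16-byte aligned at the target's entry.

```c
/* Sketch (x86-64 Linux, GCC/Clang): the question's two dispatch sequences,
   each wrapped in a function that takes the address of a function-pointer slot. */
typedef long (*fn_t)(void);

long call_via_ret(fn_t const *slot);   /* push/xchg/ret variant */
long call_via_call(fn_t const *slot);  /* mov/mov/call rax variant */

__asm__(
    ".pushsection .text\n"
    ".intel_syntax noprefix\n"
    ".globl call_via_ret\n"
    "call_via_ret:\n"                        /* rdi = address of the slot */
    "    sub  rsp, 8\n"                      /* keep the target's stack 16-byte aligned */
    "    push rbp\n"                         /* reserve a slot for the return label */
    "    lea  rbp, [rip + .Lret_continue]\n" /* PIC form of 'mov rbp, .continue' */
    "    xchg rbp, qword ptr [rsp]\n"        /* restore rbp, slot = return label */
    "    push rbp\n"                         /* reserve a slot for the target */
    "    mov  rbp, rdi\n"                    /* rbp = pointer to function pointer */
    "    mov  rbp, qword ptr [rbp]\n"        /* rbp = function pointer */
    "    xchg rbp, qword ptr [rsp]\n"        /* restore rbp, slot = target */
    "    ret\n"                              /* 'return' into the target */
    ".Lret_continue:\n"                      /* the target's own ret lands here */
    "    add  rsp, 8\n"                      /* undo the alignment padding */
    "    ret\n"
    ".globl call_via_call\n"
    "call_via_call:\n"                       /* rdi = address of the slot */
    "    sub  rsp, 8\n"                      /* keep the target's stack 16-byte aligned */
    "    mov  rax, rdi\n"                    /* rax = pointer to function pointer */
    "    mov  rax, qword ptr [rax]\n"        /* rax = function pointer */
    "    call rax\n"                         /* ordinary indirect call */
    "    add  rsp, 8\n"
    "    ret\n"
    ".att_syntax prefix\n"
    ".popsection\n"
);

/* Hypothetical stand-in for a library function reached through the table. */
long answer(void) { return 42; }
```

Calling either wrapper with `fn_t slot = answer;` and `&slot` returns 42. The `ret` version reaches the target with every general-purpose register except `rsp` unchanged, while the `call` version spends `rax` on the target address.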

Peter Cordes
RAVN Mateus
  • What comes to mind is that this doesn't trample any registers: you can call a function without disturbing any of them. – Thomas Jager Oct 20 '21 at 17:01
  • True, but pushing rax and then popping rax will do the trick as well. If the library changes the value in ebx and doesn't clean it up, then (I believe) ebx will be trampled no matter which variant is used. – RAVN Mateus Oct 20 '21 at 17:07
  • @RAVNMateus: The pop of rax would then need to happen in the target routine, which is probably not convenient, especially since the return address now sits on top of it. – 500 - Internal Server Error Oct 20 '21 at 17:09
  • The downside of this is that it defeats the CPU's ability to optimize `call` and `ret` when they come in matched pairs, and so it is likely to be relatively slow. Sometimes this is done deliberately when this "optimization" results in information leaks due to speculative execution, as in a [retpoline](https://stackoverflow.com/questions/48089426/what-is-a-retpoline-and-how-does-it-work). It's not clear if that might be part of the goal here. – Nate Eldredge Oct 20 '21 at 17:12
  • Unfortunately, I also can't tell whether that is the goal here. I'll look it up and clarify once I know for sure. – RAVN Mateus Oct 20 '21 at 17:41
  • `xchg` with memory has an implicit `lock` prefix, so this is very slow even on top of the retpoline effect. :/ If you just want a retpoline, there are much more efficient ways to write one. Can you link where you found this code? – Peter Cordes Oct 20 '21 at 22:21
  • The XCHG register,[memory] instruction is dangerous. This instruction always has an implicit LOCK prefix which forces synchronization with other processors or cores. This instruction is therefore very time consuming, and should always be avoided unless the lock is intended. The XCHG instruction with register operands may be useful when optimizing for size. – vengy Oct 21 '21 at 02:00
  • On GitHub, in a hobby OS called Cyjon, under [kernel/macro/library.asm](https://github.com/Blackend/Cyjon/blob/master/kernel/macro/library.asm). – RAVN Mateus Oct 24 '21 at 20:25
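For comparison with the retpoline mentioned in the comments, here is a minimal sketch of the classic retpoline pattern (essentially the shape of Linux's `__x86_indirect_thunk_rax`), applied to the same kind of indirect call. It contains no `xchg`, so no implicit `lock`. It assumes x86-64 Linux, GCC or Clang with their default AT&T output, and no enforced CET shadow stack; `call_via_retpoline` and `answer` are hypothetical names, and `rdi` holds the address of a function-pointer slot. Unlike the question's sequence, this one uses `rax` as scratch.

```c
/* Sketch (x86-64 Linux, GCC/Clang): an indirect call routed through a
   classic retpoline thunk instead of a plain 'call rax'. */
typedef long (*fn_t)(void);

long call_via_retpoline(fn_t const *slot);

__asm__(
    ".pushsection .text\n"
    ".intel_syntax noprefix\n"
    ".globl call_via_retpoline\n"
    "call_via_retpoline:\n"                  /* rdi = address of the slot */
    "    sub  rsp, 8\n"                      /* keep the target's stack 16-byte aligned */
    "    mov  rax, qword ptr [rdi]\n"        /* rax = function pointer */
    "    call .Lthunk_rax\n"                 /* acts like 'call rax' */
    "    add  rsp, 8\n"                      /* the target returns here */
    "    ret\n"
    ".Lthunk_rax:\n"                         /* retpoline replacement for 'jmp rax' */
    "    call .Lset_target\n"                /* return predictor now points at .Lcapture */
    ".Lcapture:\n"                           /* only ever reached speculatively */
    "    pause\n"
    "    lfence\n"
    "    jmp  .Lcapture\n"
    ".Lset_target:\n"
    "    mov  qword ptr [rsp], rax\n"        /* overwrite return address with the target */
    "    ret\n"                              /* architecturally jumps to rax */
    ".att_syntax prefix\n"
    ".popsection\n"
);

/* Hypothetical call target. */
long answer(void) { return 42; }
```

The inner `call .Lset_target` leaves a return-predictor entry pointing at the `pause`/`lfence` loop, so a mispredicted `ret` can only speculate into that loop; architecturally, the `mov` overwrites the return address, so the `ret` goes to the target.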

1 Answer


As CPUs get faster, the chance of a CPU stalling (and being unable to do anything) due to things like cache misses and branch mispredictions increases. To help avoid these stalls, most modern 80x86 CPUs have a lot of logic for predicting the target address of control-flow changes, including branch direction predictors, branch target predictors, return stack buffers, etc.

The problem is that a malicious attacker (using speculative execution and timing measurements) can extract confidential information from the data the CPU collects to improve performance, including from branch direction predictors, branch target predictors, return stack buffers, etc.

When this was discovered, people (mostly kernel developers) scrambled to find ways to mitigate the security problem; specifically, ways to avoid, spoil or pollute the data the CPU collects.

More specifically, for the code you've shown: if it used `call rax`, it would add data to the CPU's return stack buffer that a malicious attacker could probe to learn something about the original value in `rax` (and if `rax` is supposed to be confidential, that constitutes a confidentiality leak).

One alternative is to push a return address and then use an indirect jump. In this case it would just leave (confidential) data in the CPU's branch target buffer that could be probed by an attacker, which doesn't really help.
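The push-then-jump alternative might look like this minimal sketch. It assumes x86-64 Linux with GCC or Clang (default AT&T output) and no enforced CET shadow stack; `call_via_push_jmp` and `answer` are hypothetical names, and `rdi` holds the address of a function-pointer slot. The indirect `jmp` is predicted through the branch target buffer rather than the return stack buffer, which is the data the paragraph above says would still be left behind.

```c
/* Sketch (x86-64 Linux, GCC/Clang): hand-made return address plus an
   indirect jump, instead of 'call rax'. */
typedef long (*fn_t)(void);

long call_via_push_jmp(fn_t const *slot);

__asm__(
    ".pushsection .text\n"
    ".intel_syntax noprefix\n"
    ".globl call_via_push_jmp\n"
    "call_via_push_jmp:\n"                   /* rdi = address of the slot */
    "    sub  rsp, 8\n"                      /* keep the target's stack 16-byte aligned */
    "    lea  rax, [rip + .Ljmp_continue]\n" /* rax = where the target should return */
    "    push rax\n"                         /* hand-made return address */
    "    jmp  qword ptr [rdi]\n"             /* indirect jump through the slot */
    ".Ljmp_continue:\n"                      /* the target's ret lands here */
    "    add  rsp, 8\n"                      /* undo the alignment padding */
    "    ret\n"
    ".att_syntax prefix\n"
    ".popsection\n"
);

/* Hypothetical call target. */
long answer(void) { return 42; }
```

Because no `call` pushed `.Ljmp_continue`, the target's own `ret` has no matching return-stack-buffer entry and is likely to mispredict.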

Using ret instead prevents the security problem by not storing anything on the return stack buffer (or in the branch target buffer). As a side-effect, it will also "de-sync" the CPU's return stack buffer; obfuscating previous calls/future returns a little.
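The `ret`-based dispatch doesn't depend on `xchg`, whose implicit `lock` prefix the comments point out is slow. Here is a minimal sketch of the same idea using plain `mov` stores; it still restores `rbp`, so every general-purpose register other than `rsp` reaches the target unchanged. It assumes x86-64 Linux with GCC or Clang (default AT&T output) and no enforced CET shadow stack; `call_via_ret_nolock` and `answer` are hypothetical names, and `rdi` holds the address of a function-pointer slot.

```c
/* Sketch (x86-64 Linux, GCC/Clang): 'call' by 'ret' without any locked
   instruction; both stack slots are filled with ordinary stores. */
typedef long (*fn_t)(void);

long call_via_ret_nolock(fn_t const *slot);

__asm__(
    ".pushsection .text\n"
    ".intel_syntax noprefix\n"
    ".globl call_via_ret_nolock\n"
    "call_via_ret_nolock:\n"                 /* rdi = address of the slot */
    "    sub  rsp, 24\n"                     /* 8 bytes alignment padding + two 8-byte slots */
    "    push rbp\n"                         /* save rbp; slots are now [rsp+8] and [rsp+16] */
    "    lea  rbp, [rip + .Lnl_continue]\n"
    "    mov  qword ptr [rsp + 16], rbp\n"   /* return address for the target */
    "    mov  rbp, qword ptr [rdi]\n"        /* rbp = function pointer */
    "    mov  qword ptr [rsp + 8], rbp\n"    /* address the ret will jump to */
    "    pop  rbp\n"                         /* restore rbp; rsp now points at the target slot */
    "    ret\n"                              /* 'return' into the target */
    ".Lnl_continue:\n"                       /* the target's ret lands here */
    "    add  rsp, 8\n"                      /* undo the alignment padding */
    "    ret\n"
    ".att_syntax prefix\n"
    ".popsection\n"
);

/* Hypothetical call target. */
long answer(void) { return 42; }
```

It returns 42 when given the address of a slot holding `answer`. The control flow, and therefore the predictor side effects the answer describes, is the same as in the question's version; only the locked read-modify-write is gone.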

Sadly, all of this causes a performance problem: it brings us back to "as CPUs get faster, the chance of a CPU stalling increases", and it adds the cost of fetching code from the wrong address on top of the cost of the stall.

Brendan
  • Really good answer; this is an interesting topic I had never heard about. But I can't help finding it funny that I found this code in an OS that isn't even capable of writing to a storage device. – RAVN Mateus Oct 20 '21 at 19:38
  • What info is this supposed to be hiding from the callee? Even without microarchitectural side channels, they can see the return address architecturally. Also, you're already handing control to the call target, so if you don't trust it, you're screwed. (Unless it's JITed code and your JIT engine wouldn't let sandboxed code JIT into something that reads the return address.) – Peter Cordes Oct 21 '21 at 02:08
  • Normally the concern is that some unprivileged code may have primed branch prediction to speculatively go to the wrong place (Spectre), for which a normal retpoline is sufficient without as much damage to branch prediction. Not that the correct target code can extract info about its caller from branch prediction. Do you have any links about that being something to defend against? It's very non-obvious how a return-predictor entry could even be read, even by executing a push/ret that went to some known location. – Peter Cordes Oct 21 '21 at 02:10