x86 rep instructions, lock prefix, atomics and real-time

Question

Consider the following case:

Thread A (evil code I do not control):

# Repeat some string operation for an arbitrarily long time
lock rep stosq ...

Thread B (my code, should have lock-free behaviour):

# some simple atomic operation involving a `lock` prefix
# on the beginning of the data processed by the repeating instruction
lock cmpxchg ...

Can I assume that the lock held by `rep stosq` applies to each individual element, and not to the instruction's execution as a whole? Otherwise, doesn't that mean that any code which should have real-time semantics (no loops, no syscalls, total functions, every operation terminates in a finite time, etc.) can still be broken just by having some "evil" code in another thread doing such a thing, which would block the `cmpxchg` on the other thread for an arbitrarily long time?

The threat model I'm worried about is a denial-of-service attack against other "users" (including kernel IRQ handlers) on a real-time OS, where the "service" guarantees include very low latency interrupt handling.

If not lock rep stos, is there anything else I should worry about?

You cannot lock `rep stosq` or even `stosq` for that matter. — fuz, Sep 21 '22 at 23:24
Not only can you not use `lock` with `rep` or `stosq`, you should be getting a `#UD` exception for trying. — sj95126, Sep 21 '22 at 23:49
Only `wbinvd` can block everything for a really long time. (It writes back and evicts cache from all cores). But it's privileged (probably for this reason). See [Interrupting instruction in the middle of execution](https://stackoverflow.com/q/53687178) re: interrupt latency and interruptible instructions. The only way to do a huge many-cache-line atomic store is with transactional memory extensions, (RTM or HLE parts of TSX), and it aborts the transaction instead of blocking things like interrupts or requests from other cores. — Peter Cordes, Sep 22 '22 at 02:04
I searched but didn't find any duplicate Q&As about attempts to use `lock` on instructions it doesn't apply to. Except for *[Atomic unaligned read](https://stackoverflow.com/q/18918941)* about `lock mov`, answered only in comments. Other than that, the fact that it will `#UD` is mostly mentioned on SO in comments, or as an aside in a larger answer. It is of course in Intel's manual: https://www.felixcloutier.com/x86/lock — Peter Cordes, Sep 22 '22 at 02:39
@PeterCordes: You could create a reference question specifically about why misuse of lock is causing #UD, etc., but I'm always doubting if that's the right approach or if it's just injecting more noise. — sj95126, Sep 22 '22 at 04:12
@sj95126: IDK if that would help many future readers. It's right there in the docs for anyone that realizes that it's due to having a `lock` prefix. In this case, it's a hypothetical where the OP made a wrong assumption about how `lock` would work, without actually trying it. A canonical would have maybe worked for this and one other question over the years of SO. Possibly could formulate one where we could usefully say something about how other prefixes tend to be ignored by CPUs, so they can be later documented that way if a new encoding like `pause` = `rep nop` uses a mandatory prefix. — Peter Cordes, Sep 22 '22 at 05:37
If another thread of your process is running "evil code", the game is already over and you have lost. Slowing down other threads is the *very least* of the "evil" things that another thread has the power to do. It has read/write access to *all of your memory*. It can steal your secrets, hack your data, redirect you to execute arbitrary code, etc, etc. — Nate Eldredge, Sep 24 '22 at 17:16
@Nate, I think he means another hardware thread in the same CPU running a different process, not another thread of the same process. — prl, Sep 25 '22 at 02:42
@prl: But if they are different processes, then they wouldn't normally be able to access the same memory read/write (specifically, not the same cache lines). — Nate Eldredge, Sep 25 '22 at 03:24
@Nate, yes, I think OP is imagining that LOCK causes a bus lock, rather than a cache line lock, which is yet another reason his concern is unfounded. — prl, Sep 25 '22 at 03:49
@NateEldredge: The threat model is a denial-of-service attack against other "users" (including kernel IRQ handlers) on a real-time OS, where the "service" guarantee included very low latency interrupt handling. `wbinvd` would disrupt that, which is why it's privileged. Filling the store buffer with retired cache-miss stores isn't great either on modern CPUs with large store buffers, especially if you create contention for them to even commit by using other cores to spam writes and atomic RMWs on them. — Peter Cordes, Sep 27 '22 at 04:17
IDK if there's a limit on how many SB entries they allow to "graduate" such that they can't be discarded when an interrupt arrives. But anyway, clearly Intel knows that ultra-low-latency real-time stuff isn't a big selling point for x86; it's fast *enough* due to high clock speeds unless you try to create the worst case like that. — Peter Cordes, Sep 27 '22 at 04:20
<< The threat model is a denial-of-service attack against other "users" >> thanks, this is exactly my concern and I'm trying to think of all the attack vectors possible — Jean-Michaël Celerier, Sep 27 '22 at 14:57

Brendan · Answer 1 · 2022-09-28T03:21:52.427

`lock rep stosq` (and `lock rep movsd`, `lock rep cmpsd`, ...) are not legal instructions.

If they were legal, they'd be more like `rep (lock stosq)`, with the lock held only for a single `stosq` at a time.
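
To illustrate what per-element locking would mean for the other thread, here is a minimal single-threaded C11 sketch. It is only a model, not real hardware behaviour: the names `cas_probe`, `try_cas`, `fill_per_element` and `run_demo` are hypothetical, and the "other thread" is simulated by calling the compare-and-swap between two consecutive element stores. On x86, compilers typically emit `xchg` for `atomic_exchange` and `lock cmpxchg` for `atomic_compare_exchange_strong`, but that is an assumption about code generation, not something the sketch checks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe standing in for "my code" on the other thread. */
struct cas_probe {
    _Atomic uint64_t *target;  /* word the CAS operates on */
    uint64_t expected;         /* value the CAS hopes to find */
    uint64_t desired;          /* value it installs on success */
    int successes;             /* number of CAS attempts that succeeded */
};

/* One bounded compare-and-swap attempt (no retry loop). */
static void try_cas(struct cas_probe *p)
{
    uint64_t expected = p->expected;  /* local copy: overwritten on failure */
    if (atomic_compare_exchange_strong(p->target, &expected, p->desired))
        p->successes++;
}

/* Models the hypothetical "rep (lock stosq)": each element gets its own
 * atomic store, and the probe runs between consecutive stores to simulate
 * another core touching the data in between. */
static void fill_per_element(_Atomic uint64_t *buf, size_t n,
                             uint64_t value, struct cas_probe *probe)
{
    assert(buf != NULL && probe != NULL);
    for (size_t i = 0; i < n; i++) {
        atomic_exchange(&buf[i], value);  /* atomic for this element only */
        try_cas(probe);
    }
}

/* Runs the scenario on a 4-element buffer, copies the final contents to
 * `out`, and returns the number of successful CAS attempts. */
static int run_demo(uint64_t out[4])
{
    _Atomic uint64_t buf[4];
    for (size_t i = 0; i < 4; i++)
        atomic_init(&buf[i], 0);

    struct cas_probe probe = { &buf[0], 7, 42, 0 };
    fill_per_element(buf, 4, 7, &probe);

    for (size_t i = 0; i < 4; i++)
        out[i] = atomic_load(&buf[i]);
    return probe.successes;
}
```

Calling `run_demo` leaves the buffer as `{42, 7, 7, 7}` with one successful CAS: the probe gets in right after the first element is written, while the fill is still in progress, which is the behaviour the question is hoping for.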

If not lock rep stos, is there anything else I should worry about?

You might worry about very old CPUs. Specifically, the original Pentium CPUs had a flaw called the "F00F bug" (see https://en.wikipedia.org/wiki/Pentium_F00F_bug ) and old Cyrix CPUs had a flaw called the "Coma bug" (see https://en.wikipedia.org/wiki/Cyrix_coma_bug ). For both of these (if the OS doesn't provide a viable work-around), unprivileged software is able to trick the CPU into locking up forever.
