4

Hardware provides atomic instructions like test-and-set, compare-and-swap, load-linked-store-conditional. Are these privileged instructions? That is, can only the OS execute them (and thus requiring a system call)?

I thought they are not privileged and can be called in user space. But http://faculty.salina.k-state.edu/tim/ossg/IPC_sync/ts.html seems to suggest otherwise. But then, futex(7), under certain conditions, can achieve locking without a system call, which means it must execute the instruction (like test-and-set) without privilege.

Contradiction? If so, which is right?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
flow2k
  • 3,999
  • 40
  • 55
  • I think you need to state the CPU architecture. As far as I know Intel does not have a `Test_and_Set (TS)` instruction. Intel CPU's have the atomics, they just lack that particular instruction. Maybe it is a field in a machine specific register/control register and reading/writing it is protected? Ping @PeterCordes. – jww Sep 02 '18 at 01:03
  • Right, I believe x86 has the compare-swap `cmpxchg` instruction. How can I ping him? – flow2k Sep 02 '18 at 01:16
  • 1
    You can get my attention by posting questions in the `[assembly]` tag :P – Peter Cordes Sep 02 '18 at 01:24
  • 1
    @jww: x86 has had `lock bts` since 386. [`bts` = Bit test and set](http://felixcloutier.com/x86/BTS.html). – Peter Cordes Sep 02 '18 at 01:32
  • Thanks @Peter. The author specifically states the instruction is `ts`. Are you aware of an arch that comes from? Or are they using a hypothetical instruction set? – jww Sep 02 '18 at 01:38
  • @jww: I don't think they're talking about a specific architecture; they're talking about the primitive operation (https://en.wikipedia.org/wiki/Test-and-set), regardless of how it's implemented in asm. In C++11, it's the only one that's required to be lock-free: [C++: How can lock-free data structures be implemented in C++ if std::atomic\_flag is the only lock-free atomic type?](https://stackoverflow.com/q/50825174). (I don't know of a `ts` instruction on any ISA. Most are LL/SC, and don't have any single-instruction atomic RMWs.) It's often called TAS, just like cmpxchg is normally CAS. – Peter Cordes Sep 02 '18 at 01:50
  • I wondered if they were talking about a hypothetical ISA, but the few higher up / earlier pages give no sign of that. It's pure C. But sure, you could have a hypothetical crappy ISA with a privileged TS instruction (and no other atomic primitives) which makes it impossible to write good lockless multi-threaded user-space code, and hard to multi-thread efficiently. – Peter Cordes Sep 02 '18 at 01:51

1 Answers1

10

That page is wrong. It seems to be claiming that lock-free atomic operations are privileged on ISAs in general, but that is not the case. I've never heard of one where atomic test-and-set or any other lock-free operation required kernel mode.

If that was the case, it would require C++11 lock-free atomic read-modify-write operations to compile to system calls, but they don't on x86, ARM, AArch64, MIPS, PowerPC, or any other normal CPU. (try it on https://godbolt.org/).

It would also make "light-weight" locking (which tries to take the lock without a system call) impossible. (http://preshing.com/20111124/always-use-a-lightweight-mutex/)

Normal ISAs allow user-space to do atomic RMW operations, on memory shared between threads or even between separate processes. I'm not aware of a mechanism for disabling atomic RMW for user-space on x86. Even if there is such a thing on any ISA, it's not the normal mode of operation.

Read-only or write-only accesses are usually naturally atomic on aligned locations on all ISAs, up to a certain width (Why is integer assignment on a naturally aligned variable atomic on x86?), but atomic RMW does need hardware support.


On x86, TAS is lock bts, which is unprivileged. (Documentation for the lock prefix). x86 has a wide selection of other atomic operations, like lock add [mem], reg/immediate, lock cmpxchg [mem], reg, and even lock xadd [mem], reg which implements fetch_add when the return value is needed. (Can num++ be atomic for 'int num'?)

Most RISCs have LL/SC, including ARM, MIPS, and PowerPC, as well as all the older no longer common RISC ISAs.


futex(2) is a system call. If you call it, everything it does is in kernel mode.

It's the fallback mechanism used by light-weight locking in case there is contention, which provides OS-assisted sleep/wake. So it's not futex itself that does anything in user-space, but rather lock implementations built around futex can avoid making system calls in the uncontended or low-contention case.

(e.g. spin in user-space a few times in case the lock becomes available.)

This is what the futex(7) man page is describing. But I find it a bit weird to call it a "futex operation" if you don't actually make the system call. I guess it's operating on memory that kernel code might look at on behalf of other threads that are waiting, so the necessary semantics for modifying memory locations in user-space code depends on futex.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • Thanks @Peter for this detailed answer! If `futex(7)` does not call `futex(2)` (i.e. no contention), it must use the TAS or other atomic instruction to implement the lock, right? I haven't checked the code for `futex(7)`... I mentioned it in the question since `futex(7)` is a library function, it runs in user space, and so any TAS executed must be in user space. – flow2k Sep 02 '18 at 04:40
  • 2
    @flow2k: Yes, atomic RMW in userspace. But `futex(7)` is documenting the general concept of how to build a lock or other operation around `futex(2)`. There is no library function called `futex()` other than the system call wrapper; if there was it would be called `futex(3)`. Anyway yes, the whole point of this answer is that `futex(2)` is how you sleep if you find the lock already taken in an implementation like this x86 asm spinlock (which works in userspace): [Locks around memory manipulation via inline assembly](https://stackoverflow.com/a/37246263) – Peter Cordes Sep 02 '18 at 05:12
  • 1
    I have never seen a processor where bit test and set operations are privileged instructions. No wonder CS students are so confused. – user3344003 Sep 02 '18 at 15:01