
The example implementation Wikipedia provides for a spinlock using the x86 XCHG instruction is:

; Intel syntax

locked:                      ; The lock variable. 1 = locked, 0 = unlocked.
     dd      0

spin_lock:
     mov     eax, 1          ; Set the EAX register to 1.

     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.
                             ; This will always store 1 to the lock, leaving
                             ;  the previous value in the EAX register.

     test    eax, eax        ; Test EAX with itself. Among other things, this will
                             ;  set the processor's Zero Flag if EAX is 0.
                             ; If EAX is 0, then the lock was unlocked and
                             ;  we just locked it.
                             ; Otherwise, EAX is 1 and we didn't acquire the lock.

     jnz     spin_lock       ; Jump back to the MOV instruction if the Zero Flag is
                             ;  not set; the lock was previously locked, and so
                             ; we need to spin until it becomes unlocked.

     ret                     ; The lock has been acquired, return to the calling
                             ;  function.

spin_unlock:
     mov     eax, 0          ; Set the EAX register to 0.

     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.

     ret                     ; The lock has been released.

from https://en.wikipedia.org/wiki/Spinlock#Example_implementation

What I don't understand is why the unlock would need to be atomic. What's wrong with

spin_unlock:
     mov     [locked], 0  
asked by mowwwalker, edited by Peter Cordes
  • I agree that `mov` should work, especially given that only the least significant bit is used in the variable. – Jester Apr 19 '16 at 23:30
  • I suppose using XCHG for the unlock gives `spin_unlock` a return value, 1 for success and 0 for an error because the lock wasn't held. – Ross Ridge Apr 19 '16 at 23:41
  • The problem is not actually atomicity - ordinary aligned 32-bit stores are always atomic on x86 - but *order*. `lock`ed atomics (including the implicitly `lock`ed `xchg`) have total order on x86, while ordinary stores only have release consistency. Of course, release semantics are enough for a spinlock, provided the acquire is done with a `lock`ed atomic. – EOF Apr 20 '16 at 00:30
  • You want other threads that might be spinning on the same lock to see the update as soon as possible. MOV isn't good enough for that. See http://stackoverflow.com/questions/19652824/why-can-memorybarrier-be-implemented-as-a-call-to-xchg – Hans Passant Apr 20 '16 at 13:38
  • @HansPassant I don't see anything in that question or the answers to it supporting this dubious claim. – EOF Apr 20 '16 at 16:43
  • Is spinning on `xchg` ideal? With counting locks it's *much* better to spin on just a load, and only try taking the lock if you see it become unlocked. Spinning on `xchg` will potentially delay the unlocker's `xchg` from happening. If you don't write to the lock at all while it's locked, the core that owns the lock will still own the cache line when it tries to unlock, right? – Peter Cordes Apr 21 '16 at 00:10
  • @PeterCordes will XCHGing 1 by 1 (= no change at all in memory) change the cache line ownership? – Tommylee2k Apr 21 '16 at 13:19
  • @Tommylee2k: hmm, good question. I think so, [since the docs say a `lock`ed load is always followed by a locked store](http://stackoverflow.com/questions/36624881/why-is-integer-assignment-on-a-naturally-aligned-variable-atomic/36685056#36685056) as far as externally-visible behaviour. But I think that optimization would be possible, as long as the memory-barrier effect still happened (e.g. `MFENCE` does it without a locked bus cycle). Might be worth testing with an experiment if you have the time. Can two threads run at un-contended speed running `xchg [mem],eax` when `[mem]=eax`? – Peter Cordes Apr 21 '16 at 13:27

2 Answers


The unlock does need to have release semantics to properly protect the critical section, but it doesn't need sequential consistency. Atomicity isn't really the issue (see below).

So yes, on x86 a simple store is safe, and glibc's pthread_spin_unlock does exactly that:

    movl    $1, (%rdi)
    xorl    %eax, %eax
    retq

See also a simple but maybe usable x86 spinlock implementation I wrote in this answer, using a read-only spin loop with a pause instruction.
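For comparison, here is a short C11 sketch of that read-only spin idea (my own illustration, not the code from the linked answer; `spinlock_t` and the function names are made up for this example):

```c
#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock_t;   /* 1 = locked, 0 = unlocked */

static void spin_lock(spinlock_t *l) {
    for (;;) {
        /* Only attempt the atomic RMW (a lock xchg on x86) when the
         * lock looks free; acquire ordering fences the critical section. */
        if (atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0)
            return;                                 /* we took the lock */
        /* Read-only spin until the lock looks unlocked, so contended
         * waiters don't bounce the cache line with repeated xchg attempts. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0)
            ;   /* a real implementation would run a pause instruction here */
    }
}

static void spin_unlock(spinlock_t *l) {
    /* A plain release store is enough; on x86 it compiles to an
     * ordinary mov, just like glibc's pthread_spin_unlock. */
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The load-only inner loop is the "test and test-and-set" pattern: waiters keep the line in shared state and only go back to the `xchg` when they see it become free.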


Possibly this code was adapted from a bit-field version.

Unlocking with btr to zero one flag in a bitfield isn't safe, because it's a non-atomic read-modify-write of the containing byte (or of the containing naturally-aligned 4-byte dword or 2-byte word).

So maybe whoever wrote it didn't realize that simple stores to aligned addresses are atomic on x86, like on most ISAs. But what x86 has that weakly-ordered ISAs don't is that every store has release semantics. An xchg to release the lock makes every unlock a full memory barrier, which goes beyond normal locking semantics. (Although on x86, taking a lock will be a full barrier anyway, because there's no way to do an atomic RMW or atomic compare-and-swap without an xchg or other locked instruction, and those are full barriers like mfence.)

The unlocking store doesn't technically need to be atomic, since we only ever store zero or 1, so only the lower byte matters. e.g. I think it would still work if the lock was unaligned and split across a cache-line boundary. Tearing can happen but doesn't matter, and what's really happening is that the low byte of the lock is modified atomically, with operations that always put zeros into the upper 3 bytes.


If you wanted to return the old value to catch double-unlocking bugs, a better implementation would separately load and store:

spin_unlock:
     ;; pre-condition: [locked] is non-zero

     mov     eax,  [locked]        ; old value, for debugging
     mov     dword [locked], 0     ; On x86, this is an atomic store with "release" semantics.

     ;test    eax,eax
     ;jz    double_unlocking_detected    ; or leave this to the caller
     ret
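The same checked unlock can be sketched in C11 atomics (again my own illustration; `spin_unlock_checked` is a hypothetical name, and the lock word uses the question's 1 = locked convention):

```c
#include <stdatomic.h>

/* Returns the old lock value: 1 if we released a held lock,
 * 0 if the lock wasn't held (a double-unlock bug in the caller). */
static int spin_unlock_checked(atomic_int *locked) {
    int old = atomic_load_explicit(locked, memory_order_relaxed);  /* old value, for debugging */
    atomic_store_explicit(locked, 0, memory_order_release);        /* the actual unlock */
    return old;
}
```

Since the caller is supposed to hold the lock, no other thread can legally modify it between the load and the store, so the pair doesn't need to be one atomic RMW.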
answered by Peter Cordes

The point of

spin_unlock:
     mov     eax, 0          ; Set the EAX register to 0.

     xchg    eax, [locked]   ; Atomically swap the EAX register with
                             ;  the lock variable.

     ret                     ; The lock has been released.

is not only to unlock, but also to load EAX with a return value (1 if a held lock was released, 0 otherwise).

If the lock was not obtained before calling spin_unlock ([locked] holds the value 0 in this case), spin_unlock should return 0.
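In C11 terms this xchg-based unlock is an atomic exchange; a sketch for illustration (hypothetical name, with seq_cst ordering to mirror the full-barrier effect of x86 `xchg` with a memory operand):

```c
#include <stdatomic.h>

/* Atomically swap 0 into the lock and return the previous value:
 * 1 if a held lock was released, 0 if it wasn't held.
 * seq_cst mirrors the full-barrier behaviour of x86 xchg with memory. */
static int spin_unlock_xchg(atomic_int *locked) {
    return atomic_exchange_explicit(locked, 0, memory_order_seq_cst);
}
```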

answered by Tommylee2k
  • If the lock was not held by the thread/process doing the unlock, the behavior is undefined *anyway*. – EOF Apr 20 '16 at 12:59
  • in this case, unlock() returns 0, which isn't the worst of options – Tommylee2k Apr 21 '16 at 07:50
  • But if the process doing the unlocking didn't hold the lock, another may have. And now the lock is unlocked while the other process believes it holds it, and *another* process can acquire the lock. The only advantage is that you can now `panic()`, but you probably have data corruption already. – EOF Apr 21 '16 at 11:24
  • depends on if you define incorrect usage of locking to be the business of the locking mechanism itself. I prefer getting a return value, and deciding myself what to do with it :-) ... – Tommylee2k Apr 21 '16 at 13:14
  • If you want a return value to detect locking errors, then you should just `mov eax, [locked]` / `mov [locked], 0`. You're supposed to be holding the lock already when you call `unlock`; this will still catch double-unlock bugs. I think the point here is the `MFENCE` effect of using `xchg`, to give stronger-than-release semantics to the store. As people said in comments on the question, this shouldn't be needed since x86 is strongly ordered. An SFENCE could provide release semantics for NT stores, but usually that's done in the code that did NT stores, not in the locking. – Peter Cordes May 16 '16 at 02:49