
This is my first question here. I'm implementing a spinlock using the cmpxchg instruction, but the lock never changes after cmpxchg executes, so there must be something wrong with my assembly code.

This is my thread function code.

void *t_function(void *data){
 volatile int* t_cnt;

 t_cnt = (int *)data;

 /* Lock */
 while(CompareAndSet(&lock, 0, 1));

 /* Critical Section */
 for(int i = 0; i < TEST_NUM; i++){
  (*t_cnt)++;
 }

 /* Unlock */
 lock = 0;
 return NULL;
}

This is my CompareAndSet function with inline assembly code.

unsigned int CompareAndSet(unsigned int *target, unsigned int expected, unsigned int update){
 unsigned int rv = *target;

 printf("lock : %d, expected : %d, update : %d\n", *target, expected, update);

 __asm__ __volatile__(
 "cmpxchg %0, %2\n\t"
 :"+r" (*target), "+a" (expected)
 :"r" (update)
 );

 printf("lock_after : %d\n", *target);
 return rv;
}

When I compile and run this, I get the output below. It seems that lock never changes.

...
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [19] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [20] created.
lock : 0, expected : 0, update : 1
lock_after : 0
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [21] created.
Thread [22] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [23] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [24] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [25] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [26] created.
Thread [27] created.
Thread [28] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [29] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [30] created.
lock : 0, expected : 0, update : 1
lock_after : 0
Thread [31] created.
lock : 0, expected : 0, update : 1
lock_after : 0
lock : 0, expected : 0, update : 1
lock_after : 0
lock : 0, expected : 0, update : 1
lock_after : 0
cnt : 9478079

#define THREAD_NUM 32
#define TEST_NUM 10000000

Does anyone have an idea what is wrong with my assembly code?

From the 'AMD64 Architecture Programmer’s Manual Volume 3: General Purpose and System Instructions', I learned how to use the cmpxchg instruction. It says:

Mnemonic :
CMPXCHG reg/mem32, reg32

Opcode : OF B1 /r

Description : Compare EAX register with a 32-bit register or memory location. If equal, copy the second operand to the first operand. Otherwise, copy the first operand to EAX.
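In C-like pseudocode, that description corresponds to the model below (illustrative only: the real instruction does the compare and the store as a single step, and is only atomic across cores with a lock prefix; the function name is mine, not from the manual):

```c
/* Hypothetical C model of CMPXCHG reg/mem32, reg32 (FLAGS ignored).
 * Plain C like this is NOT atomic; it only illustrates the semantics
 * quoted from the manual. Returns 1 on a successful exchange (ZF=1). */
unsigned cmpxchg_model(unsigned *dest, unsigned *eax, unsigned src) {
    if (*eax == *dest) {
        *dest = src;    /* equal: copy the second operand to the first */
        return 1;
    } else {
        *eax = *dest;   /* not equal: copy the first operand to EAX */
        return 0;
    }
}
```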

Peter Cordes
    Shouldn't that be `lock cmpxchg`? You have a non-atomic Compare and Set now, that's odd. By the way you can make Compare and Set simpler. – harold Apr 05 '15 at 12:35
  • Either that's not your code, or it's not the code's output. – Weather Vane Apr 05 '15 at 12:41
  • 1
    You just do that for an educational purpose, I suppose? Otherwise you should use C11's `atomic_flag_test_and_set` or, if that is not available, compiler builtins such as gcc's `__sync_lock_test_and_set`. – Jens Gustedt Apr 05 '15 at 12:43
  • at&t syntax has operands reversed. – Jester Apr 05 '15 at 12:45
  • I was noticing: `printf("lock_afetr : %d\n", *target);` does not output `lock_after : 0 ` (spelling and leading or trailing space padding). – Weather Vane Apr 05 '15 at 12:53
  • @harold Yes. I thought the lock prefix should be there and I tried it, but I got 'Error: expecting lockable instruction after 'lock''. How can I use the lock prefix? Do you have any idea? – user4751448 Apr 05 '15 at 13:09
  • @WeatherVane I typed all the codes. Sorry for some typos. – user4751448 Apr 05 '15 at 13:10
  • @JensGustedt Comparing the performance of various ways for mutual exclusion is my assignment for my Distributed System class. Can I use it to measure the time of Compare and Exchange? – user4751448 Apr 05 '15 at 13:14
  • @Jester This is Intel syntax. Thanks. – user4751448 Apr 05 '15 at 13:14
  • 2
    You should use a memory constraint for the output argument. It doesn't make much sense otherwise, and `lock` is also only allowed in that case obviously. – Jester Apr 05 '15 at 13:19
  • @Jester I'm sorry there are so many things that don't make sense, because this is the first time I've used inline assembly. Can you explain in detail? – user4751448 Apr 05 '15 at 13:25
  • Implementing a spinlock only makes sense if it is in memory. `lock` prefix also only works with memory, there is no point in trying to lock operations on registers because registers are private to your thread. What you have done was load the spinlock into a register, do the CAS on the register then write it back into memory. This is not thread-safe. – Jester Apr 05 '15 at 13:28
  • @Jester oh.. I got it. Then how can I measure the performance of the hardware CompareAndExchange that AMD64 provides? Comparing the performance of various ways of synchronization is my assignment. – user4751448 Apr 05 '15 at 13:33
  • As I said, use a memory constraint `"+m" (*target)` should do the trick. – Jester Apr 05 '15 at 14:08
  • @Jester I already tried it. '__asm__ __volatile__( "lock cmpxchg %0, %2\n\t" :"+m" (*target), "+a" (expected) :"r" (update) );' but I got Error: operand size mismatch for `cmpxchg' – user4751448 Apr 05 '15 at 14:11
  • Are you sure you are not using at&t syntax? That error happens if you are... – Jester Apr 05 '15 at 17:57
  • `lock cmpxchgl` for 32-bit operands. Note the `l` at the end. – EOF Apr 06 '15 at 01:15
  • @EOF that would be at&t syntax and he says he is using intel. Also, you don't need the `l` suffix since the operand size can be deduced from the register arguments. – Jester Apr 06 '15 at 11:28

1 Answer

  1. GCC inline asm uses at&t syntax by default. Keep in mind that at&t syntax has reversed operand order (src, dst). We'll move on with at&t syntax.

  2. As @Jester mentioned, a spinlock is meaningless unless it lives in memory. So you should change the constraint to "+m" (*target).

  3. Now, the cmpxchg instruction by itself is not atomic with respect to other cores. To make the compare-and-exchange atomic, you must add the lock prefix, like this:

     lock cmpxchg %2, %0
    

    (Note the operand order is swapped because of at&t syntax.)

  4. You were missing a "memory" clobber to stop compile-time reordering with other statements.

  5. Your return value was an earlier non-atomic load, so it could match expected in cases where the CAS failed, or vice versa. You need to return the updated value of expected, and/or a bool from old_expected == new_expected or from FLAGS.

Summing up

int cmpxchg(int* val, int expected, int newval) {
    __asm__ volatile(
        "lock cmpxchg %2, %0\n\t"
        : "+m" (*val), "+a" (expected)
        : "r" (newval)
        : "memory"
    );
    return expected;
}
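For example, a single-threaded caller can sanity-check both the success and failure paths (a usage sketch: the wrapper is repeated so the snippet is self-contained, and spin_lock/spin_unlock are illustrative names, not part of the original code; x86/x86-64 only):

```c
/* The CAS wrapper from above, x86/x86-64 only. */
int cmpxchg(int* val, int expected, int newval) {
    __asm__ volatile(
        "lock cmpxchg %2, %0\n\t"
        : "+m" (*val), "+a" (expected)   /* EAX holds expected; gets current value on failure */
        : "r" (newval)
        : "memory"                       /* stop compile-time reordering */
    );
    return expected;   /* old value (== expected) on success, current value on failure */
}

int lock = 0;   /* 0 = free, 1 = held */

/* Hypothetical helpers mirroring the question's spinlock usage. */
void spin_lock(int *l)   { while (cmpxchg(l, 0, 1) != 0) ; }
void spin_unlock(int *l) { __asm__ volatile("" ::: "memory"); *l = 0; }
```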

Big thanks to @PeterCordes for essential corrections. See also his answer on c inline assembly getting "operand size mismatch" when using cmpxchg for another version of this wrapper that compiles with or without -masm=intel, using dialect-alternatives. And a version that uses the ZF output of lock cmpxchg.


But generally you shouldn't use inline assembly for this at all (https://gcc.gnu.org/wiki/DontUseInlineAsm), except as a learning exercise in inline asm, or unless you're writing a freestanding program / kernel and want to follow the Linux kernel's style. Use the GNU C __atomic builtins instead:

  __atomic_compare_exchange_n(val, &expected, desired,
      false /* not weak */, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)

That has similar semantics to x86 locked instructions, except that it's not guaranteed to be a full barrier for other operations (like a fence), only to have seq_cst semantics itself. There can be a real difference on AArch64, for example. But it's still more than fine for a spinlock; taking the lock only needs an __ATOMIC_ACQUIRE CAS, and unlocking only a RELEASE store. Also, since you already call it in a retry loop, you'd use weak = true to get behaviour like atomic_compare_exchange_weak.
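A spinlock along those lines might look like the sketch below (acquire CAS in the retry loop, release store to unlock; the spinlock_acquire/spinlock_release names are mine, not from the answer):

```c
#include <stdbool.h>

int lock = 0;   /* 0 = free, 1 = held */

void spinlock_acquire(int *l) {
    int expected = 0;
    /* weak CAS is fine here: a spurious failure just retries the loop */
    while (!__atomic_compare_exchange_n(l, &expected, 1, true,
                                        __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
        expected = 0;   /* a failed CAS writes the observed value into expected */
    }
}

void spinlock_release(int *l) {
    __atomic_store_n(l, 0, __ATOMIC_RELEASE);
}
```

Unlike the inline-asm version, this is portable: GCC and Clang compile it to `lock cmpxchg` on x86 and to the appropriate LL/SC or CAS sequence elsewhere.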

Peter Cordes
Levon Minasian
  • Don't use `mov` in the template, just use `"+a"(exp)` to force `%1` to pick EAX. (https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html) That would also make your `return exp` useful, since you'd be capturing the different value seen in memory in case of a failed comparison. As is, your code always leaves `%1` unmodified, so always returns the incoming `exp` unchanged, not letting the caller tell if CAS succeeded or failed. Oh, the code in the question already did that, and you broke that part of it. – Peter Cordes Mar 15 '23 at 06:36
  • You are correct that `%0` needs to be the destination (2nd operand for AT&T syntax), and the constraint needs to be `"+m"` not `"+r"` to do a CAS on memory, not just a register operation, though. Also, `new` doesn't change; that operand can be a pure input (`"r"(new)`). Also, `new` is a C++ keyword; probably a good idea to avoid it for something potentially reusable. – Peter Cordes Mar 15 '23 at 06:39
  • Of course it would be easier and almost always better to use `__atomic_compare_exchange_n(val, &exp, desired, false /* not weak */, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)` instead of using inline asm at all. https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html . That will compile to a `lock cmpxchg`, and will block compile-time reordering wrt. surrounding operations, unlike this where the OP forgot a `"memory"` clobber. – Peter Cordes Mar 15 '23 at 06:45
  • @PeterCordes Corrected, thanks. I didn't use the "+a" constraint because, from the provided link, its effect is machine-dependent. Also, OP's code saves the old value *before* `cmpxchg`, so the "compare-and-exchange" is not done atomically, but it should be, and in general that will lead to a wrong return value. – Levon Minasian Mar 15 '23 at 07:50
  • `lock cmpxchg` is an x86 instruction, so this whole function needs to be inside a `#if defined(__i386__) || defined(__amd64__)`; the effect of an `"a"` constraint *is* well-defined on x86 / x86-64. You're making it less efficient to fix a problem that you introduced by changing `"+a"` to `"+r"`. – Peter Cordes Mar 15 '23 at 07:52
  • added memory clobber – Levon Minasian Mar 15 '23 at 07:53
  • @PeterCordes you're right, `cmpxchg` is an x86 instruction, changed to "+a" constraint. – Levon Minasian Mar 15 '23 at 07:56
  • I meant that part of the OP's `asm` statement was correct. The fact that they return `rv` instead of `expected` is really bad, since it came from a separate (non-atomic) load. In some cases it might mis-match the incoming `expected` but the CAS would still succeed, if another thread could change it the right way. (e.g. if another thread unlocked the spinlock) So yes that needed fixing, too. – Peter Cordes Mar 15 '23 at 07:56
  • But again, inline assembly is the wrong way to do this, except for learning about how to use inline asm. `__atomic_compare_exchange_n` exists as a GNU C builtin function usable on plain `int*`. That gives you the updated `expected` *and* a `bool` return value. – Peter Cordes Mar 15 '23 at 07:59
  • Depends on what OP wanted to accomplish. I believe it was part of learning inline asm. It's much fun anyways :) – Levon Minasian Mar 15 '23 at 08:02
  • I think I'm done editing now, if you want to have a look at the final state of what I said in your answer and maybe edit anything you want to say differently. – Peter Cordes Mar 15 '23 at 08:17