1

I've been trying to implement a gcc inline function (AT&T assembly) that will perform an atomic CAS operation, but I can't get it to work - the return value is always getting messed up.

I've tried 2 different approaches, each seems to have its own misbehaviours:

1.

static inline int
cas(volatile void * addr, int expected, int newval) 
{
  int result = 1;
  asm volatile("lock; cmpxchgl %3, (%2)\n\t"
               "pushfl\n\t"
              "popl %%ebx\n\t"
              "andl $0x40, %%ebx\n\t"
              "cmpl $0x0, %%ebx\n\t"
              "jnz res%=\n\t"
              "movl $0, %0\n\t"
              "res%=:\n\t"
              : "=m"(result)
              : "a"(expected), "b"(addr), "r"(newval)
              : "memory");

  return result;
}

2.

static inline int cas(volatile void * addr, int expected, int newval) {
  int ret = 1;
  asm volatile("lock; cmpxchgl %3, (%2)\n\t"
          "jz cas_success%=\n\t"
          "movl $0, %0\n\t"
          "cas_success%=:\n\t"
  : "=m"(ret)
  : "a"(expected), "b"(addr), "r"(newval)
  : "memory");
  return ret;
}

But neither work, could anyone point me at the problem with one of the implementations?

Thanks

J. Doe
  • 35
  • 5
  • 4
    Why not use the C11 atomic types, or gcc intrinsics? No need to drop to assembly for an atomic CAS. – Shawn May 11 '20 at 13:40
  • Instead of loading the flags register into `ebx`, why don't you simple use a `jz` or `jnz` to jump directly on the value of the zero flag? – fuz May 11 '20 at 13:51
  • @Shawn I'm doing an assignment and that's a part of the requirements – J. Doe May 11 '20 at 13:53
  • 2
    So many problems and inefficiencies it's hard to know where to start. `pushf` is obviously insane; ZF is already in FLAGS ready for you to `jz` or `setz` with it instead of extracting it to an integer reg and getting it back into ZF with `and`/`cmp`. `"=m"(ret)` is a pure output, the `ret=1` initializer can thus be optimized away. Also, no reason to use a memory operand, `"+r"` would work. – Peter Cordes May 11 '20 at 13:54
  • @fuz see the second solution I tried (simply uses jz) – J. Doe May 11 '20 at 13:54
  • @J.Doe How do you know it doesn't work? How do you call this code and what output do you expect and get instead? – fuz May 11 '20 at 13:56
  • @PeterCordes If I don't initialize ret to 1 how can I be sure I get a non zero value on return? I can't say I understand your other comments. – J. Doe May 11 '20 at 13:58
  • 1
    @fuz It sometimes returns a big negative number on success, and sometimes even returns 0 – J. Doe May 11 '20 at 14:00
  • 2
    @J.Doe An `=` constraint says “the assembly will always overwrite whatever value that operand has.” Using an `=` constraint and then not assigning any value to that operand causes the operand to take an indeterminate value. Use a `+` constraint to tell the compiler that the operand's previous value is important. – fuz May 11 '20 at 14:00
  • 2
    @J.Doe: That's the opposite of what I meant. I said your constraint ignores the fact that you initialized `ret`, so you should fix your constraint to be `"+r"(ret)` (a read/write operand). Notice the difference between `+` and `=` in the GCC manual.: https://gcc.gnu.org/onlinedocs/gcc/Modifiers.html. Or you could just use an `"=r"` 8-bit `_Bool` output with `setz %0` to store ZF into a register as a 0 or 1 integer. See also https://stackoverflow.com/tags/inline-assembly/info for docs and tutorials. – Peter Cordes May 11 '20 at 14:04
  • @fuz This seems to work well now, but another problem emerged, I'll have to test some more and I'll return if the code still isn't working. – J. Doe May 11 '20 at 14:05
  • 3
    Or you could use GCC6 flag-output constraints to tell GCC directly that one of the outputs *is* ZF, avoiding the overhead of a `setz` instruction inside the asm. My answer on [c inline assembly getting "operand size mismatch" when using cmpxchg](https://stackoverflow.com/a/49839296) shows **a fully working C wrapper for `lock; cmpxchg` that has no other asm instructions inside the template**, just the necessary constraints. – Peter Cordes May 11 '20 at 14:08
  • @PeterCordes Sounds like a good dupe. – fuz May 11 '20 at 14:08
  • 1
    @fuz: not really; this question is asking *why* this attempt doesn't work. [Why does my "=r"(var) output not pick the same register as "a"(var) input?](https://stackoverflow.com/q/58238986) is closer to that, but the bug there is that they're expecting `"=r"` to pick the same register as an input operand. I assume there's a dup somewhere for using `"=r"` instead of `"+r"`, which might be all that's wrong with this for correctness. This is nasty for efficiency (branching instead of `setcc` and not letting the compiler pick an addressing mode) but that might be all for correctness. – Peter Cordes May 11 '20 at 14:14
  • do you have it working in real assembly and trying to get it working in inline or you dont have it working at all? – old_timer May 11 '20 at 15:32
  • @PeterCordes I know it's been a long time, just wondering if you wanted to summarize what you said into an answer, so that I could mark it as solved. The message before the one I said "this seems to work now" is what did the trick. – J. Doe Jun 02 '20 at 16:10
  • You could answer your own question if you now understand what was wrong with your original version. I've already written a couple answers about `lock cmpxchg` inline asm, including [c inline assembly getting "operand size mismatch" when using cmpxchg](https://stackoverflow.com/a/49839296), and don't really feel the need to write another one myself. – Peter Cordes Jun 02 '20 at 21:27

0 Answers0