What's the correct way to release this mutex?

Question

I'm using the following x86-64 assembly for my functions to acquire and release a mutex:

.data

align   16
mtx:
    dd  0


.code

acquire_mutex PROC

lbl_retry:

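    lock bts dword ptr [mtx], 1     ; atomically set bit 1; CF = previous value of that bit
    jnc     lbl_acquired            ; CF clear: the bit was 0, so we now hold the mutex

    pause                           ; spin-wait hint while another core holds the mutex
    jmp lbl_retry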

lbl_acquired:
    ret
acquire_mutex ENDP


release_mutex PROC

    mov dword ptr [mtx], 0

    ret
release_mutex ENDP

My question: am I releasing the mutex correctly, or do I need a `lock` prefix on the release, like this?

release_mutex PROC

    lock and dword ptr [mtx], 0

    ret
release_mutex ENDP

Every ordinary (not non-temporal) x86 store to memory has release semantics. On the other hand, why are you using `lock bts` rather than `xchg` (which is `lock`ed implicitly)? If you intend to have multiple atomic bit flags in the same word, you will also need to *clear* the flags independently, which will take an atomic RMW operation. — EOF, Jul 11 '18 at 20:02
@EOF: Thanks. And no, it's just one mutex. I don't know why I chose `BTS`. Is there any significant difference between the `lock bts` and `xchg` instructions? — MikeF, Jul 11 '18 at 20:22
`mov dword ptr [mtx], 0` does not need a `lock` because it's already atomic and properly ordered. However, `and dword ptr [mtx], 0` involves a read and a write, and so it's not atomic and requires `lock` (unless `mtx` is used for a single mutex). Also, you need to make sure that no thread calls `release_mutex` without actually being the thread that holds the lock. — Hadi Brais, Jul 11 '18 at 20:23
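To illustrate the point about several flags sharing one word: a plain store of 0 would wipe every flag at once, so clearing a single bit needs an atomic read-modify-write. Below is a minimal C++11 sketch, not the asker's code. The `flags` word and both helper names are hypothetical. On x86-64, compilers typically emit `lock`-prefixed instructions for these calls, but the exact code generation is up to the compiler.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical word holding several independent one-bit locks.
std::atomic<std::uint32_t> flags{0};

// Try to take the lock stored in bit `bit`.
// Returns true if the bit was previously clear, i.e. we now own it.
bool try_acquire_bit(unsigned bit) {
    const std::uint32_t mask = 1u << bit;
    // Atomic RMW: sets the bit and returns the previous value of the word.
    return (flags.fetch_or(mask, std::memory_order_acquire) & mask) == 0;
}

// Release only the lock stored in bit `bit`, leaving the other bits alone.
// This must be an atomic RMW; a plain store would clobber the other flags.
void release_bit(unsigned bit) {
    flags.fetch_and(~(1u << bit), std::memory_order_release);
}
```

With a single mutex per word, as in the question, none of this is needed for the release: nothing else lives in the word, so a plain store of 0 is enough.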
@HadiBrais yes, my actual code uses processor index (returned from `KeGetCurrentProcessorNumberEx()` on Windows) to ensure that the same core doesn't try to acquire the mutex more than once. — MikeF, Jul 11 '18 at 20:26
@EOF: Guys, also one follow-up question. I keep seeing references to these `non-temporal` memory stores. Can you explain in plain English what that is? — MikeF, Jul 11 '18 at 20:30
That's *another* issue; you need to make sure you don't get a deadlock. But who owns the mutex, a core or a thread? If you use `KeGetCurrentProcessorNumberEx`, then it means that a core owns the mutex and any thread that runs on the core is basically in the critical section. — Hadi Brais, Jul 11 '18 at 20:35
@HadiBrais: Yes, it's a quirk specific to my implementation. This mutex is used at a high IRQL (on Windows), so task switching is disabled, i.e. no threads other than mine run on that core. — MikeF, Jul 11 '18 at 20:38
Regarding NT stores, you can start with this [question](https://stackoverflow.com/questions/37070/what-is-the-meaning-of-non-temporal-memory-accesses-in-x86), and then follow the Linked questions that you see on the right side of the web page. — Hadi Brais, Jul 11 '18 at 20:38
@HadiBrais: Thanks. Good info. One last question concerning my releasing of mutex. (It's Microsoft and Visual Studio specific now.) If I were to do it in C or C++, do I need any special instructions for compiler, or will it always compile `mtx = 0;` into the `mov [mem], 0` instruction? — MikeF, Jul 11 '18 at 23:57
@MikeF: If you want to roll your own in C++, see my C11 stdatomic counting semaphore. You could easily simplify it to a mutex if you wanted, and/or port it to C++11 `std::atomic`. (But note that a counting semaphore *does* need `lock add` instead of a simple `lock.store(0, std::memory_order_release);`) — Peter Cordes, Jul 12 '18 at 03:12
If you use a processor index to select the mutex, then what's the point of having a mutex at all? You need all threads touching the same shared data structure to use the *same* mutex to lock each other out. Also, your thread could be rescheduled to another core between `KeGetCurrentProcessorNumberEx` returning and when you use the return value as an index. — Peter Cordes, Jul 12 '18 at 03:15
@PeterCordes: Sure, do you have a link? Although I'm not sure I can use much of C++ constructs at the low level I'm coding this thing in. (I can't even use constructors on structs.) As for your second question, I'm calling `KeGetCurrentProcessorNumberEx` after IRQL is raised above dispatch level. As for why I need a mutex -- threads are obviously not a concern then, but other processor cores are. — MikeF, Jul 12 '18 at 03:29
Oops, I forgot to paste the URL into my last comment. [C & low-level semaphore implementation](https://stackoverflow.com/a/36097001). C and inline or stand-alone asm are just different ways of getting the machine code you want into an object file. — Peter Cordes, Jul 12 '18 at 03:31
If you only need to synchronize between an interrupt handler and other code on the *same* core, you don't need a `lock` prefix at all. Regular `xadd [mem], 1` or `cmpxchg` is atomic with respect to interrupts. You're effectively in a uniprocessor situation. See a couple of the answers like this one on [Can num++ be atomic for 'int num'?](https://stackoverflow.com/a/39396781). You won't be able to get any of the mainstream x86 C++ compilers to emit non-`lock`ed `cmpxchg` from `std::atomic`, because none of them have a uniprocessor (or single-thread vs. signal handlers) code-gen mode. — Peter Cordes, Jul 12 '18 at 03:37
@PeterCordes: Sure, thanks. Although you had a more complex task with a semaphore. Mutex is much more simple. Again, I don't have any issues with the mutex itself. I just got this thought about releasing it with just an "unlocked" mov instruction. And that is why I posted this question. Now though after having read the comments above, I'm trying to find an intrinsic for a `mov` in case I need to implement it with C and intrinsics. — MikeF, Jul 12 '18 at 03:38
For `volatile int x` (if you want to try to trick the compiler into only doing within-one-thread sync to avoid lock prefixes), just `x=0;`. Or for `std::atomic x;` just `x.store(0, std::memory_order_release);`. (x=0 would compile to `xchg` or `mov + mfence`, which is slower and stronger (seq_cst) than you need.) — Peter Cordes, Jul 12 '18 at 03:40
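Pulling the comments above together, here is a minimal C++11 sketch of the same single mutex, assuming `std::atomic` is usable in your environment. The `SpinLock` struct and its method names are hypothetical. The acquire side uses an atomic exchange (the `xchg` suggestion), and the release side is a release store, which mainstream x86-64 compilers turn into a plain `mov`. The `pause` hint from the question is left out to keep the sketch portable; on x86 you would add `_mm_pause()` to the inner loop.

```cpp
#include <atomic>

// Hypothetical test-and-set spinlock; 0 = free, 1 = held.
struct SpinLock {
    std::atomic<int> locked{0};

    // Single attempt: returns true if we took the lock.
    bool try_acquire() {
        // Atomic swap; a previous value of 0 means the lock was free.
        return locked.exchange(1, std::memory_order_acquire) == 0;
    }

    // Spin until the lock is taken.
    void acquire() {
        while (!try_acquire()) {
            // Spin on a read-only load so waiting cores do not keep
            // bouncing the cache line with locked writes.
            while (locked.load(std::memory_order_relaxed) != 0) {
            }
        }
    }

    // Release store: no atomic RMW is needed, because only the
    // current owner ever writes 0.
    void release() {
        locked.store(0, std::memory_order_release);
    }
};
```

Note that writing `locked = 0;` instead of the explicit release store would request `seq_cst` ordering, which is stronger and slower than the release needs, as the last comment points out.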