3

Consider several threads executing concurrently the following code:

long gf = 0;// global variable or class member

//...

if (InterlockedCompareExchange(&gf, 1, 0)==0) // lock cmpxchg
{
    // some exclusive code - must not execute in concurrent
    gf = 0; // this is ok ? or need
    //InterlockedExchange(&gf, 0); // [lock] xchg 
}

Treat the code above as C-like pseudo-code, which will be translated more-or-less directly into assembly without the usual concessions to compiler optimizations such as re-odering and store elimination.

So after some thread exclusively acquires flag gf- to exit from the critical region is it enough to write a zero (as in gf = 0) or does this need to be interlocked - InterlockedExchange(&gf, 0)?

If both are OK, which is better from a performance view, assuming that with high probability that several cores concurrently call InterlockedCompareExchange(&gf, 1, 0)?

Several threads periodically execute this code (from several places, when some events fire) and it is important that the next thread again enters the critical region as soon as possible after it freed.

BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
  • I'd use a more detailed title and add a couple tags here, you'll get some more help. – Ryanman May 07 '18 at 19:47
  • @Ryanman - this question (how i view) related only to x86-x64 memory order. not related to compiler, language, os. so not view which more tag can be used here or more details – ross martin May 07 '18 at 19:58
  • 1
    It is definitely related to la language and compiler as Peter's answer shows. As pseodo-code without a re-oredring compiler it works, as actual C or C++ with a specific memory model it doesn't. – BeeOnRope May 09 '18 at 19:08

2 Answers2

2

Related: Spinlock with XCHG explains why you don't need xchg to release a lock in x86 asm, just a store instruction.

But in C++, you need something stronger than a plain gf = 0; on a plain long gf variable. The C / C++ memory model (for normal variables) is very weakly ordered, even when compiling for strongly-ordered x86, because that's essential for optimizations.

You need a release-store to correctly release a lock, without allowing operations in the critical section to leak out of the critical section by reordering at compile time or runtime with the gf=0 store. http://preshing.com/20120913/acquire-and-release-semantics/.

Since you're using long gf, not volatile long gf, and you aren't using a compiler memory barrier, nothing in your code would prevent compile-time reordering. (x86 asm stores have release semantics, so it's only compile-time reordering we need to worry about.) http://preshing.com/20120625/memory-ordering-at-compile-time/


We get everything we need as cheaply as possible using std::atomic<long> gf; and gf.store(0, std::memory_order_release); atomic<long> is lock-free on every platform that supports InterlockedExchange, AFAIK, so you should be ok to mix and match. (Or just use gf.exchange() to take the lock. If rolling your own locks, keep in mind that you should loop on a read-only operation + _mm_pause() while waiting for the lock, don't hammer away with xchg or lock cmpxchg and potentially delay the unlock. See Locks around memory manipulation via inline assembly.

This is one of the cases where the warning in Why is integer assignment on a naturally aligned variable atomic on x86? that you need atomic<> to make sure the compiler actually does the store where / when you need it applies.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
-1

gf = 0 is sufficient. There’s no need to use a locked operation since no other thread can be changing its value.

By the way, I’d use bts instead of cmpxchg to acquire the lock. I’m not sure if it makes any difference in performance, but it’s simpler.

prl
  • 11,716
  • 2
  • 13
  • 31
  • actually i use `InterlockedCompareExchangePointer` to save some pointer value in pointer (register) size variable. at once use this as critical region acquire and data store, which processed in region and release(set to 0) on the exit. not a single bit (0/1). but that does not change the essence – ross martin May 08 '18 at 19:07
  • `lock bts` might be slower, although maybe not with an immediate operand. (non-locked `bts` sucks with a memory operand, especial with `mem,reg` rather than `mem,imm`, because of the crazy-CISC bitstring indexing). `xchg mem,reg` is probably best, and it's cheap to test the reg result, like [in my example asm implementation](//stackoverflow.com/a/37246263) But either way you want to check that the lock is available read-only before using a locked op on that cache line, and then it very often succeed; don't spin on just `lock cmpxchg`, `lock bts`, or `xchg` – Peter Cordes May 09 '18 at 05:48
  • `mov [gf], 0` is sufficient in asm, but `gf = 0` in C / C++ isn't because of compile-time reordering. http://preshing.com/20120625/memory-ordering-at-compile-time/ – Peter Cordes May 09 '18 at 06:04
  • *`mov [gf], 0` is sufficient in asm* - this is main for me here. i know about possible C/C++ compiler reordering and this is not a problem for me here - before and after `gf = 0` exist external api calls - any compiler not reorder this operations, because this break program logic – ross martin May 09 '18 at 09:43