On x86-64 I use a simple spinlock for critical sections:

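    mov al, 1                     ; al = 1, the "locked" value
LoopWait:
    xchg byte ptr [mlock], al     ; atomically swap al with the lock byte (xchg with memory is implicitly locked)
    test al, al                   ; old value 0 means the lock was free
    jz Free
    pause                         ; spin-wait hint
    jmp LoopWait
Free: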

And to exit the critical section:

mov byte ptr[mlock], 0

Do I need a fence instruction before unlocking so that the store operations within the critical section are flushed?

Cœur
Megatron
  • x86 has total store order. All CPUs see the writes of any particular CPU in the order that the CPU executed them. (Caveat: some instructions violate TSO: non-temporal stores, fast `movs[b/w/l/q]`, WC memory.) – EOF Oct 15 '15 at 13:07
  • You still don't know the order in which the CPU executes them – CoffeDeveloper Oct 15 '15 at 13:45
  • @DarioOO: Every CPU by itself executes stores in program order. The general rule is that the semantics of a single-threaded program must not be impacted by memory reordering. – EOF Oct 15 '15 at 15:14
  • Yes, in fact. That means that if I write A before B (at least in theory) but A isn't needed for the next 50 CPU cycles, it is very likely the CPU will try to compute B first, through pipelining, prefetching, etc. Execution order is store order, but execution order is not what you expect. A good compiler will already shuffle things into a "random" order to make them more pipelineable – CoffeDeveloper Oct 15 '15 at 15:25

1 Answer

I'm deliberately not checking the correctness of your code:

A spinlock already acts as a memory barrier for the data it protects: acquiring and releasing it orders the reads and writes of the critical section (otherwise it could not work). Strictly speaking it is not a full memory fence, since a lock/unlock pair only has to guarantee acquire and release ordering. So if the spinlock is correct and working, you will never need an additional memory fence (which would just be a useless penalty).

This is a conceptual issue: you should know the details of your architecture when implementing such things, especially the memory-ordering guarantees (the "memory contract") of individual assembly instructions.
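As a rough illustration of those per-instruction contracts, here is a minimal C++ sketch of the same kind of lock. It is not the asker's code, and `SpinLock` and `run_counter_demo` are made-up names. On x86-64, compilers typically turn the `exchange` into an `xchg` with a memory operand (which is implicitly locked) and the release store into a plain `mov`, so no separate fence instruction shows up on the unlock path:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock mirroring the xchg-based loop from the question.
class SpinLock {
    std::atomic<bool> locked{false};

public:
    void lock() {
        // Atomic swap with acquire ordering: accesses inside the critical
        // section cannot be reordered before a successful exchange.
        while (locked.exchange(true, std::memory_order_acquire)) {
            // Spin on a plain load until the lock looks free, then retry.
            while (locked.load(std::memory_order_relaxed)) {
                // A pause hint (for example _mm_pause) could go here on x86.
            }
        }
    }

    void unlock() {
        // Release store: writes made inside the critical section cannot be
        // reordered after it. No additional fence is requested.
        locked.store(false, std::memory_order_release);
    }
};

// Several threads increment a plain (non-atomic) counter under the lock.
long run_counter_demo(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the release store in `unlock()` did not order the earlier writes, increments could be lost; with it, `run_counter_demo(4, 5000)` should return exactly `threads * iterations`.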

Memory fences have other purposes (like making sure objects in C++ are fully initialized before they are used in asynchronous code).
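To make that last case a bit more concrete, here is a small sketch of publishing an initialized object to another thread with explicit fences. `Message` and `publish_and_read` are hypothetical names chosen for illustration, and this is only one of several ways to write it (a release store paired with an acquire load on the flag would work too):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload that must be fully initialized before it is read.
struct Message {
    int id;
    int value;
};

int publish_and_read() {
    Message msg{0, 0};
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread consumer([&] {
        // Wait for the flag, then fence so the reads below cannot be
        // reordered before the flag was observed as true.
        while (!ready.load(std::memory_order_relaxed)) {
        }
        std::atomic_thread_fence(std::memory_order_acquire);
        assert(msg.id == 1);  // initialization is visible here
        seen = msg.value;
    });

    // Initialize the object, then fence so these writes cannot be
    // reordered after the flag store that publishes it.
    msg.id = 1;
    msg.value = 42;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);

    consumer.join();
    return seen;
}
```

The release fence before the flag store and the acquire fence after the flag load together guarantee that the consumer sees the completed `msg`.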

CoffeDeveloper
  • And that's exactly the problem here. If the critical section writes data and then releases the lock, these store operations might be out of order, and a second thread might get the lock although the data has not been written yet. The architecture I'm working on is an x64 CPU like an AMD K10 or Intel Sandy Bridge. – Megatron Oct 15 '15 at 15:47
  • You may be interested in this: http://stackoverflow.com/questions/11959374/fastest-inline-assembly-spinlock The CPU instructions you should use for implementing a spinlock already provide memory ordering. – CoffeDeveloper Oct 15 '15 at 15:58
  • @Megatron: Seriously? Does anyone have the capacity to read a couple of lines of documentation? Fine, I'll make it easy: Every ordinary store operation on x86 has release semantics. Does *that* answer your question? – EOF Oct 15 '15 at 16:13