A frame is shared with the kernel.
User-space code:
read frame         // read the frame's content
_mm_mfence();      // make sure the frame is not "released" before everything has been read
frame.status = 0;  // "release" the frame
Kernel code:
poll for frame.status  // read the frame's status
_mm_lfence();
The kernel can poll it asynchronously, in another thread, so there is no syscall between the user-space code and the kernel.
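For comparison, here is a minimal sketch of the same hand-off written with C++ std::atomic instead of raw x86 fences. It assumes the kernel side can be modeled as a second user-space thread (in the real setup it is kernel code polling shared memory, which this does not capture), and Frame, consume, reclaim and run_demo are hypothetical names. The release store on status keeps the reads of the frame ahead of it, and the acquire load on the polling side pairs with that store.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical stand-in for the shared frame. In the real setup the frame
// lives in memory shared with the kernel; here the "kernel" is a second thread.
struct Frame {
    int payload[4];
    std::atomic<int> status{1};  // 1 = held by user space, 0 = released
};

// User-space side: read the content, then release the frame.
int consume(Frame& frame) {
    int sum = 0;
    for (int v : frame.payload) sum += v;              // read frame content
    // Release store: the reads above cannot be reordered after it. Unlike a
    // plain store, it is an atomic store that another thread may read.
    frame.status.store(0, std::memory_order_release);  // "release" the frame
    return sum;
}

// "Kernel" side: poll until the frame is released, then reuse it.
void reclaim(Frame& frame) {
    while (frame.status.load(std::memory_order_acquire) != 0) {
        std::this_thread::yield();
    }
    // The acquire load synchronizes with the release store, so consume()'s
    // reads of payload happen-before these writes.
    for (int& v : frame.payload) v = 0;
}

// Runs both sides once and returns the sum that user space read.
int run_demo() {
    Frame frame{{1, 2, 3, 4}};
    std::thread kernel(reclaim, std::ref(frame));
    int sum = consume(frame);
    kernel.join();
    assert(frame.payload[0] == 0);  // the "kernel" side reused the frame
    return sum;
}
```

Calling run_demo() always returns 10: the "kernel" thread cannot overwrite payload until it observes status == 0, and that store comes after all the reads.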
Is this correctly synchronized?
I doubt it, because of the following situation:
The compiler has a weak memory model, and we have to assume it can make any transformation you can imagine, as long as the optimized program stays consistent within a single thread.
So, as I see it, we need a second barrier, because it is possible for the compiler to optimize out the store frame.status = 0.
Yes, that would be a very wild optimization, but if the compiler could prove that no one in that context (within the thread) reads that field, it could optimize the store out.
I believe that is theoretically possible, isn't it?
So, to prevent that, we can add a second barrier:
User-space code:
read frame         // read the frame's content
_mm_mfence();      // make sure the frame is not "released" before everything has been read
frame.status = 0;  // "release" the frame
_mm_mfence();      // second barrier
OK, now the compiler has to refrain from that optimization.
What do you think?
EDIT
The question is raised by the issue that _mm_mfence does not prevent the compiler from optimizing stores out.
@PeterCordes, just to make sure I understand: _mm_mfence does not prevent stores from being optimized out (it is just an x86 memory barrier, not a compiler barrier). However, atomic_thread_fence(any_order) prevents reorderings (which ones depends on any_order, obviously), but does it also prevent stores from being optimized out?
For example, does the release fence in this code prevent the stores through x from being optimized out?
// x is an int pointer
*x = 5;
*(x + 4) = 6;
std::atomic_thread_fence(std::memory_order_release);
It seems that it must; otherwise every store through x would have to be volatile.
However, I have seen a lot of lock-free code, and none of it makes fields volatile.
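For reference, the usual fence-based pattern in C++ looks roughly like the sketch below (Message, producer, consumer and run_fence_demo are hypothetical names). The data fields are plain ints, not volatile; only the flag is atomic. Because the release fence is followed by an atomic store that the other thread reads, and the acquire fence follows the load that observes it, the release fence synchronizes with the acquire fence, and the plain stores are guaranteed to be visible to the reader. A fence with no atomic operation to pair with gives no such guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical message: plain (non-volatile, non-atomic) fields plus an
// atomic flag. Only the flag needs to be atomic.
struct Message {
    int a = 0;
    int b = 0;
    std::atomic<bool> ready{false};
};

void producer(Message& m) {
    m.a = 5;                                              // plain stores
    m.b = 6;
    std::atomic_thread_fence(std::memory_order_release);  // orders the stores above...
    m.ready.store(true, std::memory_order_relaxed);       // ...before this flag store
}

int consumer(Message& m) {
    while (!m.ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
    return m.a + m.b;                                     // must see 5 and 6
}

// Runs producer and consumer on two threads and returns what the consumer read.
int run_fence_demo() {
    Message m;
    std::thread t(producer, std::ref(m));
    int result = consumer(m);
    t.join();
    assert(m.a == 5 && m.b == 6);  // join() also makes the stores visible here
    return result;
}
```

In this sketch, the stores to a and b are observable by another thread through a synchronizes-with edge, so, as far as the C++ memory model is concerned, the compiler may not treat them as dead; that is why such code does not need volatile.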