I have kernel code and userspace code that synchronize on atomic variables. The kernel and userspace code may be running on the same logical core or different logical cores. Let's assume the architecture is x86_64.
Here is an initial implementation to get our feet wet:
Kernel (C)                    Userspace (C++)
---------------------------   -----------------------------------
Store A (smp_store_release)   Store B (std::memory_order_release)
Load B (smp_load_acquire)     Load A (std::memory_order_acquire)
I require that from the perspective of each thread, its own load happens after its own store. So for example, from userspace's perspective, the load to A must happen after the store to B.
Similarly, I require that each thread observes the other thread performing its load after its store. So for example, from the kernel's perspective, userspace must be observed to store to B before it loads A.
The code above is insufficient to meet these two requirements, because a release store followed by an acquire load of a different variable can still be reordered (x86 permits StoreLoad reordering). So for the sake of this question, I rewrite it as follows:
Kernel (C)                    Userspace (C++)
---------------------------   ---------------------------------------------------
Store A (smp_store_release)   Store B (std::memory_order_release)
cpuid                         std::atomic_thread_fence(std::memory_order_seq_cst)
Load B (smp_load_acquire)     Load A (std::memory_order_acquire)
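To make the intended property concrete, here is a minimal sketch of the pattern above as a store-buffering litmus test. It is only an illustration under assumptions: the kernel side is modeled by an ordinary userspace thread, a sequentially consistent fence stands in for cpuid (whether cpuid is actually equivalent is exactly what I am asking), and the names Outcome, run_once and count_both_missed are made up for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type: what each side loaded from the other's variable.
struct Outcome {
    int kernel_saw_B;
    int user_saw_A;
};

// Runs the store / fence / load pattern once on fresh variables.
Outcome run_once() {
    std::atomic<int> A{0};
    std::atomic<int> B{0};
    int r_kernel = -1;
    int r_user = -1;

    // Assumption: a plain thread models the kernel side, and the seq_cst
    // fence stands in for cpuid.
    std::thread kernel_side([&] {
        A.store(1, std::memory_order_release);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r_kernel = B.load(std::memory_order_acquire);
    });
    std::thread user_side([&] {
        B.store(1, std::memory_order_release);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r_user = A.load(std::memory_order_acquire);
    });

    kernel_side.join();
    user_side.join();
    return Outcome{r_kernel, r_user};
}

// Counts runs in which both sides missed the other's store, i.e. the
// outcome that the fences are supposed to rule out.
int count_both_missed(int iterations) {
    assert(iterations >= 0);
    int missed = 0;
    for (int i = 0; i < iterations; ++i) {
        Outcome o = run_once();
        if (o.kernel_saw_B == 0 && o.user_saw_A == 0) {
            ++missed;
        }
    }
    return missed;
}
```

With both fences in place, the C++ memory model guarantees that at least one side sees the other's store, so count_both_missed should return 0. Without the fences, the "both missed" outcome is allowed, although a test like this may not reproduce it reliably.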
According to the Intel manual, cpuid is a serializing operation.
Here are my questions:
- If I issue cpuid with the asm compiler-level memory barrier, does this have the same behavior as a sequentially consistent fence?
- Now let's say I issue cpuid without the asm compiler-level memory barrier. Furthermore, let's say that the store to A is standard kernel code while the load to B is done by a BPF program. Does cpuid have the same behavior as a sequentially consistent fence in this case? My impression is that it does, because (1) cpuid provides hardware serialization and (2) compiler reordering is impossible since the kernel is compiled separately from the BPF program.
- The C++ standard requires that synchronization occur between threads on the same address. It seems that issuing an mfence (or another type of fence) is sufficient to achieve hardware serialization, and mfence does not have a memory address as an argument. Thus, does the standard impose this requirement solely to prevent compiler reordering?
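For concreteness, this is roughly what I mean in the first question by cpuid with the asm compiler-level memory barrier. It is a sketch only, assuming GCC or Clang extended inline asm on x86_64; CpuidRegs and cpuid_with_compiler_barrier are hypothetical names, not kernel APIs. It shows the compiler-level part only and makes no claim about whether the result is equivalent to a sequentially consistent fence.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the four registers that cpuid writes.
struct CpuidRegs {
    std::uint32_t eax;
    std::uint32_t ebx;
    std::uint32_t ecx;
    std::uint32_t edx;
};

// Executes cpuid for the given leaf and subleaf. The "memory" clobber tells
// the compiler not to move or cache memory accesses across this statement;
// dropping the clobber gives the variant from the second question.
// Assumption: GCC/Clang extended asm, x86_64 target.
inline CpuidRegs cpuid_with_compiler_barrier(std::uint32_t leaf,
                                             std::uint32_t subleaf = 0) {
    CpuidRegs r{leaf, 0, subleaf, 0};
    asm volatile("cpuid"
                 : "+a"(r.eax), "=b"(r.ebx), "+c"(r.ecx), "=d"(r.edx)
                 :
                 : "memory");
    return r;
}
```

In the kernel column above, a call such as cpuid_with_compiler_barrier(0) would sit between the store to A and the load of B.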