I would like to know whether there is a better solution to the following problem. I have written the code below to pass a few variables from a writer thread to a reader thread. The threads are pinned to different CPUs that share the same L2 cache (hyperthreading is disabled).
writer_thread.h
struct a_few_vars {
    uint32_t x1;
    uint32_t x2;
    uint64_t x3;
    uint64_t x4;
} __attribute__((aligned(64)));
volatile uint32_t head;
struct a_few_vars xxx[UINT16_MAX] __attribute__((aligned(64)));
reader_thread.h
uint32_t tail;
struct a_few_vars *p_xxx;
The writer thread increments the head variable, and the reader thread checks whether head and tail are equal. If they are not equal, it reads the new data as follows:
while (true) {
    if (tail != head) {
        .. process xxx[head] ..
        .. update tail ..
    }
}
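To make the scheme concrete, a minimal single-writer/single-reader sketch of it might look like the following. It rests on several assumptions that are not in my code above: it uses C11 <stdatomic.h> with release/acquire ordering instead of volatile, it keeps tail shared so the writer can detect a full ring, it has the reader consume the slot at tail, and it uses a power-of-two size so that wrapping is a simple mask. The names RING_SIZE, struct ring, ring_push and ring_pop are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size; a power of two so an index wraps with a mask. */
#define RING_SIZE 1024u

struct a_few_vars {
    uint32_t x1;
    uint32_t x2;
    uint64_t x3;
    uint64_t x4;
} __attribute__((aligned(64)));

/* Hypothetical container: head and tail sit on separate 64-byte lines
 * so the writer's counter and the reader's counter do not share a line. */
struct ring {
    _Alignas(64) _Atomic uint32_t head;   /* stored only by the writer */
    _Alignas(64) _Atomic uint32_t tail;   /* stored only by the reader */
    struct a_few_vars slots[RING_SIZE];
};

/* Writer side: copy *v into the next free slot, then publish it.
 * Returns false if the ring is full. */
bool ring_push(struct ring *r, const struct a_few_vars *v)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;
    r->slots[head & (RING_SIZE - 1)] = *v;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Reader side: copy the oldest unread slot into *out.
 * Returns false if there is nothing new (tail == head). */
bool ring_pop(struct ring *r, struct a_few_vars *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire: pairs with the writer's release store to head. */
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail == head)
        return false;
    *out = r->slots[tail & (RING_SIZE - 1)];
    /* Release: the slot is handed back only after it has been copied. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

In this sketch the reader thread would simply spin on ring_pop() in its while (true) loop, and the writer thread would call ring_push() whenever it has a new set of values.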
Performance is by far the most important concern. I'm using Intel Xeon processors, and the reader thread fetches the head value and the xxx[head] data from memory each time. I aligned the array to 64 bytes so that it can be used lock-free.
In my case, is there any way to get the variables into the reader CPU's cache as soon as possible? Can I trigger a prefetch for the reader CPU from the writer CPU? I can use special Intel instructions via __asm__ if any exist. In short, what is the fastest way to pass the variables in the struct between threads pinned to different CPUs?
Thanks in advance