I am running a few threads using pthreads on a real-time Linux (RedHawk) in C++. All the threads run on a fixed-frequency loop, and one of the threads polls the CPU clock and alerts the other two threads that the next loop has started (by the end of a loop we can safely assume that the other threads have finished their task and are waiting for the next loop). My goal is to reduce latency where possible, and I can let each thread take 100% of the CPU it is on (and guarantee it is the only thing running on that CPU, thanks to the RedHawk enhancements).
My idea is to have the timing thread poll the CPU tick count until it reaches > X, then increment a 32- or 64-bit counter without taking a mutex. The other two threads will poll this counter and wait for it to increase, also without taking a mutex. As I see it, no mutex is needed: the first thread is the only writer, so its increment cannot conflict with another write, and the other two threads can read the counter without fear because a 32- or 64-bit number is written to memory without ever being in a partial state (I think).
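To make the idea concrete, here is a minimal sketch of roughly what I have in mind. It is not my real code: all names (`timing_thread`, `worker_thread`, `run_demo`, `DemoResult`) are hypothetical, `std::chrono::steady_clock` stands in for the CPU tick count, and nothing here is RedHawk-specific (no CPU pinning or shielding). It also uses `std::atomic` rather than a plain integer, which is an assumption on my part: as far as I understand, polling a plain variable from another thread is a data race in C++ and the compiler may hoist the read out of the loop, whereas an atomic load/store still involves no mutex.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Timing thread: the only writer. Busy-polls the clock and bumps the
// counter once per period to announce that the next loop has started.
void timing_thread(std::atomic<std::uint64_t>& tick,
                   std::uint64_t loops,
                   std::chrono::nanoseconds period) {
    auto next = std::chrono::steady_clock::now() + period;
    for (std::uint64_t i = 0; i < loops; ++i) {
        while (std::chrono::steady_clock::now() < next) {
            // spin: stand-in for polling the CPU tick count until > X
        }
        tick.fetch_add(1, std::memory_order_release);
        next += period;
    }
}

// Worker thread: a reader only. Spins until the counter changes, then
// runs one loop of work. If it is ever slow it may observe a jump of
// more than one tick, so loops_run counts wake-ups, not ticks.
void worker_thread(const std::atomic<std::uint64_t>& tick,
                   std::uint64_t loops,
                   std::uint64_t& loops_run) {
    std::uint64_t seen = 0;
    while (seen < loops) {
        std::uint64_t now;
        while ((now = tick.load(std::memory_order_acquire)) == seen) {
            // spin: no mutex, no system call
        }
        seen = now;
        ++loops_run;  // placeholder for the real per-loop task
    }
}

struct DemoResult {
    std::uint64_t final_tick;
    std::uint64_t worker_a_loops;
    std::uint64_t worker_b_loops;
};

// Starts two workers and one timing thread, waits for all of them,
// and reports the final counter value and each worker's wake-up count.
DemoResult run_demo(std::uint64_t loops, std::chrono::nanoseconds period) {
    std::atomic<std::uint64_t> tick{0};
    std::uint64_t a = 0, b = 0;
    std::thread w1(worker_thread, std::cref(tick), loops, std::ref(a));
    std::thread w2(worker_thread, std::cref(tick), loops, std::ref(b));
    std::thread t(timing_thread, std::ref(tick), loops, period);
    t.join();
    w1.join();
    w2.join();
    return {tick.load(), a, b};
}
```

Calling something like `run_demo(50, std::chrono::microseconds(100))` would run 50 loops at a 100 µs period. The release/acquire ordering is there so that anything the timing thread writes before the increment is visible to a worker that sees the new counter value; whether a weaker ordering would be enough for my case is part of what I am unsure about.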
I realize that all my threads will be polling and therefore running at 100% all the time. I could avoid that by using pthreads signaling, but I believe the latency there is more than I want. I also know that taking a mutex costs a few tens of nanoseconds, so I could probably use one without noticing the latency, but I don't see why it is needed when one thread increments a counter and the other two only poll it.