There are questions on Stack Overflow with similar-sounding titles, but there is a critical difference between them and the question I'm asking (and not asking); both are spelled out at the end of this post.
Say I have a time-consuming calculation. I want the fastest possible calculation and don't mind having several CPU cores spinning. I create a worker thread and hand half the calculation task to the worker.
The main thread sets an input variable for the worker to read, THEN sets the worker's output variable to a certain sentinel value.
The worker thread spins waiting for its output to be set to the sentinel, then reads the input, does the calculation, and overwrites the sentinel with the actual output.
Meanwhile, the main thread does its half of the calculation, then spins waiting for the output to change from the sentinel value it wrote, to any other value.
No race condition is possible, because only main sets the output variable to the sentinel, and only does so when it is non-sentinel. Only the worker sets the output variable to non-sentinel, and only does so when it is sentinel.
To end the worker thread, the main thread sets a bool flag telling the worker to exit, then sets the output variable to the sentinel. The worker sees the sentinel, checks the exit flag, and exits instead of doing the calculation.
I have made the input, output, and exit flag variables all atomic<>.
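To make the protocol concrete, here is a minimal sketch of the handshake as described above. All names (`g_input`, `g_output`, `g_exit`, `kSentinel`, `half_calculation`, `run_once`, `stop_worker`) are placeholders, the calculation is a stand-in, and the sentinel is assumed to be a value the real calculation can never produce. The sketch uses release stores and acquire loads on the output variable instead of the default sequentially consistent ordering; that choice is an assumption about what suffices, not something already established.

```cpp
#include <atomic>
#include <thread>

// Hypothetical sentinel: assumed to be a value the calculation never returns.
constexpr double kSentinel = -1.0;

std::atomic<double> g_input{0.0};
std::atomic<double> g_output{0.0};  // starts non-sentinel, so the worker waits
std::atomic<bool>   g_exit{false};

// Placeholder for one half of the real calculation (never returns kSentinel).
double half_calculation(double x) { return x * x; }

void worker() {
    for (;;) {
        // Spin until main writes the sentinel. The acquire load pairs with
        // main's release store, so the input and exit flag written before
        // the sentinel are visible afterwards.
        while (g_output.load(std::memory_order_acquire) != kSentinel) { }
        if (g_exit.load(std::memory_order_relaxed)) return;
        double x = g_input.load(std::memory_order_relaxed);
        // Overwrite the sentinel with the actual result.
        g_output.store(half_calculation(x), std::memory_order_release);
    }
}

double run_once(double x) {
    g_input.store(x, std::memory_order_relaxed);           // input first,
    g_output.store(kSentinel, std::memory_order_release);  // THEN the sentinel
    double mine = half_calculation(x + 1.0);               // main's half
    double theirs;
    // Spin until the worker replaces the sentinel with a result.
    while ((theirs = g_output.load(std::memory_order_acquire)) == kSentinel) { }
    return mine + theirs;
}

void stop_worker() {
    g_exit.store(true, std::memory_order_relaxed);
    g_output.store(kSentinel, std::memory_order_release);
}
```

Usage would be: start `std::thread t(worker);`, call `run_once(x)` as many times as needed, then call `stop_worker()` followed by `t.join()`.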
The question is: is this the fastest reliable way to communicate variables like double and bool between threads? In particular, do I in fact need to make all of input, output, and the exit flag atomic? The software has run correctly over hundreds of millions of executions with nothing atomic, but I suspect all three really should be. Is there a faster way on a modern Intel CPU? I'm especially concerned about pernicious side effects on the cache lines: should I ensure that the variables being spun on sit on their own cache line?
(Also, the question is not: are spinning threads wasteful, is this a good general purpose solution for general purpose software, is this how a textbook would do it, are there more general-purpose ways to coordinate threads, etc.)
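For the cache-line part of the question, this is roughly the layout I have in mind: a sketch only, assuming 64-byte cache lines (typical for current Intel x86-64 parts, but an assumption here). `Channel`, `kCacheLine`, and `on_separate_lines` are hypothetical names, and whether all three variables need their own line, or only the one being spun on, is exactly what I'm unsure about.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Each variable is aligned to the start of its own cache line, so a write
// to one member does not invalidate the line holding another.
struct Channel {
    alignas(kCacheLine) std::atomic<double> input{0.0};
    alignas(kCacheLine) std::atomic<double> output{0.0};
    alignas(kCacheLine) std::atomic<bool>   exit_flag{false};
};

static_assert(sizeof(Channel) == 3 * kCacheLine,
              "each member should occupy its own cache line");
static_assert(alignof(Channel) == kCacheLine,
              "the struct itself should be cache-line aligned");

// Helper: true if the two addresses fall in different cache lines.
bool on_separate_lines(const void* a, const void* b) {
    auto line_a = reinterpret_cast<std::uintptr_t>(a) / kCacheLine;
    auto line_b = reinterpret_cast<std::uintptr_t>(b) / kCacheLine;
    return line_a != line_b;
}
```

C++17 also defines `std::hardware_destructive_interference_size` in `<new>` as a portable stand-in for the hard-coded 64, although not every compiler provides it.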