Modern multicore CPUs keep their caches coherent by snooping: each core broadcasts its memory accesses and watches the broadcasts from other cores, so that together they make sure writes from core A are seen by core B.
This is good in that if you have data that really does need to be shared between threads, it minimizes the amount of code you have to write to make sure it does get shared.
It's bad in that if you have data that should be local to just one thread, the snooping still happens, constantly dissipating energy to no purpose.
Does the snooping still happen if you declare the relevant variables thread_local? Unfortunately the answer is yes, according to the accepted answer to "Can other threads modify thread-local memory?"
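As I understand it, the reason is that thread_local storage is still ordinary memory: any thread that gets hold of a pointer to it can read or write it, so the hardware cannot assume it is private. Here is a minimal C++ sketch of that. The helper name write_through_foreign_pointer is my own and purely illustrative, not something taken from the linked answer.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Each thread gets its own instance, but the storage is ordinary memory.
thread_local int counter = 0;

// Hypothetical helper: a worker publishes the address of its own
// thread_local counter, the calling thread writes through that pointer,
// and the function returns the value the worker then observes.
int write_through_foreign_pointer() {
    std::atomic<int*> published{nullptr};
    std::atomic<bool> written{false};
    int observed = -1;

    std::thread worker([&] {
        counter = 1;  // touch the worker's own instance
        published.store(&counter, std::memory_order_release);
        // Wait until the other thread has written into our instance.
        while (!written.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = counter;  // sees the foreign write
    });

    // Wait for the worker to publish the address of its instance.
    int* p = nullptr;
    while ((p = published.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();
    }
    assert(p != &counter);  // the worker's instance is distinct from ours
    *p = 42;                // write into another thread's thread_local
    written.store(true, std::memory_order_release);

    worker.join();
    return observed;
}
```

Calling write_through_foreign_pointer() from main returns 42, while the calling thread's own counter stays 0. So thread_local only changes which address a name resolves to. It does not tell the CPU that the cache line is private to one core.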
Does any currently extant platform (combination of CPU and operating system) provide any way to turn off snooping for thread-local data? Doesn't have to be a portable way; if it requires issuing OS-specific API calls, or even dropping into assembly, I'm still interested.