My target platforms are Windows and Linux on x86-64 (Coffee Lake or newer, Zen 2 or newer) and macOS on the Apple M2. I'm wondering whether there is a penalty when multiple threads access the same data at the same time, and how large that penalty is if one thread changes one variable once. Do other cores stall immediately if they already have that cache line loaded? How many cycles does it take for the update to propagate? From my understanding, false sharing happens when you change a byte on a cache line; is the affected range strictly 128 bytes or less? And I don't have to worry about the TLB? Here's my situation:
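One common way to sidestep the "how far does false sharing reach" question is to give a frequently written value a whole block to itself. The sketch below assumes 128 bytes as a conservative distance for all three targets (Apple M-series cores report a 128-byte cache line, and x86 cores use 64-byte lines but some prefetch adjacent lines in pairs). The value is hard-coded, not queried from the hardware, and PaddedSlot, kAssumedFalseSharingBytes and g_per_thread_counters are hypothetical names. Where the standard library provides it, C++17's std::hardware_destructive_interference_size could replace the constant.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 128 bytes is used as a conservative "false sharing distance"
// for all three targets; it is hard-coded, not read from the hardware.
constexpr std::size_t kAssumedFalseSharingBytes = 128;

// Hypothetical wrapper: the alignment forces every slot to start on a
// 128-byte boundary and to occupy the whole 128 bytes, so a write to one
// slot cannot invalidate a line holding a neighbouring slot's data.
template <typename T>
struct alignas(kAssumedFalseSharingBytes) PaddedSlot {
    T value{};
};

// Hypothetical example: one counter per thread, each in its own block.
PaddedSlot<std::atomic<std::uint64_t>> g_per_thread_counters[4];

static_assert(sizeof(PaddedSlot<std::atomic<std::uint64_t>>) == kAssumedFalseSharingBytes,
              "padding should fill exactly one assumed block");
static_assert(alignof(PaddedSlot<std::uint32_t>) == kAssumedFalseSharingBytes,
              "each slot should start on a block boundary");
```

The price is memory: every slot takes 128 bytes no matter how small T is, which works against the bandwidth concern, so this suits a handful of hot per-thread values rather than thousands of objects.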
I have a few objects that are the source of truth for some data. I can't remember whether they're 64 bytes or larger. There's a 32-bit status flag in the first 64 bytes. Many threads may access it and the bytes next to it: sometimes 100 times, sometimes only one other thread, once. I'm not sure how many nanoseconds will pass between the write and the reads, but only one write will ever happen.
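The uncertainty about object size can be settled at compile time instead of from memory. In the sketch below only the leading 32-bit status flag comes from the description above; SourceOfTruth, other_small and payload are hypothetical stand-ins for the real fields, so the sizes it reports describe this made-up layout, not the real objects. The static_assert breaks the build if the flag ever moves out of the first 64 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one "source of truth" object; the real
// fields are unknown, so everything after `status` is filler.
struct SourceOfTruth {
    std::uint32_t status;        // the 32-bit status flag
    std::uint32_t other_small;   // hypothetical neighbouring field
    std::uint64_t payload[10];   // hypothetical remaining data
};

// Fails to compile if the flag does not fit entirely in the first 64 bytes.
static_assert(offsetof(SourceOfTruth, status) + sizeof(std::uint32_t) <= 64,
              "status flag must live in the first 64 bytes");

// Object size as the compiler sees it (88 bytes for this made-up layout).
constexpr std::size_t kObjectSize = sizeof(SourceOfTruth);

// Number of 64-byte lines the object spans *if* it starts on a 64-byte
// boundary; with only 8-byte alignment it may straddle one more line.
constexpr std::size_t kLines64 = (kObjectSize + 63) / 64;
```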
The C++ ThreadSanitizer complained that I change the flag in one thread and read it in another, with neither side using atomic operations. The other threads don't need to see the update, since I'm only setting a bit they don't care about. I was thinking I could use atomic loads and stores with std::memory_order_relaxed.
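The relaxed-atomics idea might look like the sketch below. It assumes the flag becomes a std::atomic<std::uint32_t> and that exactly one thread ever writes it; Object, kWriterOnlyBit, set_writer_bit, read_status_ignoring_writer_bit and run_demo are hypothetical names. Because there is a single writer, it uses a relaxed load followed by a relaxed store rather than fetch_or, which on x86-64 compiles to a locked read-modify-write instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Checked rather than assumed: the atomic should add no extra bytes.
static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t));
static_assert(std::atomic<std::uint32_t>::is_always_lock_free);

// Hypothetical bit that only the writer cares about.
constexpr std::uint32_t kWriterOnlyBit = 1u << 5;

struct Object {
    std::atomic<std::uint32_t> status{0};   // was a plain uint32_t
};

// Single writer: relaxed load + relaxed store, not a read-modify-write.
// Only valid if no other thread ever writes `status` concurrently.
void set_writer_bit(Object& obj) {
    std::uint32_t old = obj.status.load(std::memory_order_relaxed);
    obj.status.store(old | kWriterOnlyBit, std::memory_order_relaxed);
}

// Readers: a relaxed load. They may observe the value from before or after
// the write; both are fine because they mask out kWriterOnlyBit.
std::uint32_t read_status_ignoring_writer_bit(const Object& obj) {
    return obj.status.load(std::memory_order_relaxed) & ~kWriterOnlyBit;
}

// One writer racing several readers; returns how many reads saw anything
// other than the (hypothetical) initial flags 0x3.
std::uint32_t run_demo(int reader_count) {
    Object obj;
    obj.status.store(0x3u, std::memory_order_relaxed);
    std::atomic<std::uint32_t> mismatches{0};
    std::vector<std::thread> readers;
    for (int i = 0; i < reader_count; ++i) {
        readers.emplace_back([&] {
            for (int j = 0; j < 100; ++j)
                if (read_status_ignoring_writer_bit(obj) != 0x3u)
                    mismatches.fetch_add(1, std::memory_order_relaxed);
        });
    }
    std::thread writer([&] { set_writer_bit(obj); });
    writer.join();
    for (auto& t : readers) t.join();
    return mismatches.load();
}
```

On x86-64 and AArch64, relaxed 32-bit atomic loads and stores compile to ordinary load and store instructions, so this should silence ThreadSanitizer without adding memory traffic compared with the plain field.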
Another option is to store a pointer and update the data through it. Suppose 1K objects are each written once and every other thread happens to read them within 10 ns of the write: would that be a problem, and how many cycles would it stall? This assumes there's no penalty when many cores read the same data. I already have a memory bandwidth problem (I'm writing a lot of data), so I'm wary of using more memory than I need to.
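The pointer option can be sketched as a split between read-mostly "hot" object data and a separate table of status words, so the single write lands on a cache line that readers of the hot data never load. HotObject, ColdStatus and ObjectTable are hypothetical names, and the payload size is made up so that each hot object fills exactly 64 bytes on a 64-bit target.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical "cold" entry: the rarely written status word, kept apart
// from the data that many threads read.
struct ColdStatus {
    std::atomic<std::uint32_t> bits{0};
};

// Hypothetical "hot" object: read by many threads, not written after setup.
struct HotObject {
    std::uint64_t data[7];   // read-mostly payload (made-up size)
    ColdStatus* status;      // 8 extra bytes per object on 64-bit targets
};

// Owns both arrays and points each hot object at its own status word.
struct ObjectTable {
    explicit ObjectTable(std::size_t n)
        : hot(new HotObject[n]()),      // value-initialized: data zeroed
          cold(new ColdStatus[n]),      // bits start at 0
          count(n) {
        for (std::size_t i = 0; i < n; ++i) hot[i].status = &cold[i];
    }
    std::unique_ptr<HotObject[]> hot;
    std::unique_ptr<ColdStatus[]> cold;
    std::size_t count;
};
```

Compared with keeping the flag inline, the pointer costs 8 bytes per object (8 KB for 1K objects). If the objects already live in an array, indexing the status table with the same index removes the pointer entirely. In this layout neighbouring status words share a cache line with each other, which only matters if several of them are written at the same time.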