I am working in a bare-metal environment and thus evaluating performance at a low level. How should I expect two threads on the same core to perform when writing to different sections of the same cache line?
I am somewhat new to multicore/multithreaded architectures. I understand that when different cores write to the same data, locks or atomic operations are required to avoid race conditions. At the same time, sharing a cache line between cores also invites performance problems such as false sharing.
However, do I need to worry about similar things when the two threads run on the same core? I'm unsure, since they share the same cache and the core has multiple load/store units. For example, say thread1 writes to section1 of a cache line at the same time that thread2 writes to section2 of that line. Does each thread modify only its own section of the line, or does each read the full line, modify its section, and write the full line back into the cache? If it's the latter, do I need to worry about race conditions or performance penalties?