x86 allows unaligned accesses that span two cache lines (i.e. two 64-byte chunks), but the result is not guaranteed to be atomic. For example, an 8-byte read from address 0x1003c requires the CPU to fetch two lines (0x10000 and 0x10040), take the relevant 4-byte chunk from each, and stitch them together. However, these two lines may be stored in different places - one could be cached while the other sits in main memory. In extreme cases (page splits), one could in theory even be swapped out. As a result, you might get two chunks of data from different times (a better term is observation points), where a store from some other core could have changed one of them in between.
On the other hand, once you add the lock prefix (or use std::atomic, which emits it for you where needed), x86 does guarantee that the result comes from a single observation point, consistent with the observations of all other threads. To achieve this, the CPU may enforce a complete block of all cores (e.g. a bus lock) until both lines are secured in the requesting core. If it didn't, you'd risk a livelock where a core repeatedly acquires one line, only to lose it to another core before it has acquired the second.
p.s. - user3286380 raised a good point: an expression like atomicInteger = atomicInteger + 1 is not atomic, even if atomicInteger is a std::atomic. std::atomic guarantees an atomic read and an atomic write (each at its own observation point), but the load-add-store sequence as a whole is not atomic. Only the dedicated read-modify-write operations (++atomicInteger, fetch_add, compare_exchange_strong/weak and friends) are atomic end to end.