What you are seeing is a memory fence. That instruction guarantees that all preceding load and store instructions become globally visible before any following load or store instruction is executed.
A fence acts as a barrier, with the effect of flushing CPU buffers (note: buffers, not cache, that's a different thing) because data that was waiting to be written needs to be made globally available right away before continuing, in order to ensure that successive instructions will fetch the correct data.
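To make the ordering guarantee concrete, here is a minimal sketch in portable C11 of the usual "write data, then raise a flag" pattern. The names payload, ready, publish and consume are made up for illustration and come from no particular code base; the fence itself is the standard atomic_thread_fence from <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, named for illustration only. */
static int payload;        /* ordinary data written by the producer */
static atomic_bool ready;  /* flag polled by the consumer, starts false */

/* Producer: write the data, fence, then raise the flag. */
void publish(int value)
{
    payload = value;
    /* Full fence: the payload store is ordered before the flag store. */
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: returns true and fills *out once the flag has been seen. */
bool consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    /* Full fence: the flag load is ordered before the payload load. */
    atomic_thread_fence(memory_order_seq_cst);
    *out = payload;
    return true;
}
```

Without the two fences, a consumer running on another core could in principle see the flag set and still read a stale payload. On x86 a compiler will typically turn a sequentially consistent fence into mfence or into a locked read-modify-write instruction, though the exact choice depends on the compiler and its version.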
This function was introduced to get around a hardware problem in an old family of Intel CPUs, namely the Pentium Pro (1995-98), which caused memory access operations under specific circumstances to be executed in the wrong order.
Nowadays the canonical way of applying a fence on x86 is through the mfence, lfence or sfence instructions (depending on the type of fence needed), but those were only added later (sfence with SSE, lfence and mfence with SSE2). On the Pentium Pro, no such instructions were available.
The lock instruction is really just an instruction prefix, so this:
lock
addl $0,0(%esp)
is actually a "locked add".
The lock prefix is used for opcodes that perform a read-modify-write operation to make them atomic. When applying lock add $0, 0(%esp), in order for the instruction to be atomic and therefore for the result to be immediately globally visible, a load+store fence is implicitly applied. The top of the stack is always readable and writable, and adding 0 is a no-op, so the instruction changes nothing and there is no need to find some other valid address for it to operate on. This workaround therefore permits the correct serialization of memory accesses, and it's the fastest type of instruction to accomplish the goal on the Intel Pentium Pro.
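For reference, this is roughly how the idiom can be wrapped from C with GCC-style inline assembly. It is only a sketch: the helper names full_fence and store_fence_load are hypothetical, the 64-bit variant using %rsp is my adaptation of the 32-bit original, and on non-x86 targets it simply falls back to the standard C11 fence.

```c
#include <stdatomic.h>

/* Hypothetical helper name; not taken from any particular code base. */
static inline void full_fence(void)
{
#if defined(__GNUC__) && defined(__i386__)
    /* 32-bit x86: locked no-op add on the top of the stack. */
    __asm__ __volatile__("lock; addl $0,0(%%esp)" ::: "memory", "cc");
#elif defined(__GNUC__) && defined(__x86_64__)
    /* 64-bit x86: same idea, using the 64-bit stack pointer. */
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#else
    /* Anything else: portable C11 full fence. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Tiny check that the fence leaves the surrounding data untouched. */
int store_fence_load(int value)
{
    volatile int slot = value;  /* store that must not be optimized away */
    full_fence();
    return slot;                /* load issued after the fence */
}
```

The "memory" clobber matters as much as the instruction itself: it stops the compiler from moving loads and stores across the fence, while the lock prefix takes care of the CPU side.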
See also these other posts: