Recently I've been browsing through Go's source code, in particular runtime/internal/atomic/asm_386.s. Below is the implementation of the runtime/internal/atomic.Store64 function.

// void runtime∕internal∕atomic·Store64(uint64 volatile* addr, uint64 v);
TEXT runtime∕internal∕atomic·Store64(SB), NOSPLIT, $0-12
    MOVL    ptr+0(FP), AX
    TESTL   $7, AX
    JZ  2(PC)
    MOVL    0, AX // crash with nil ptr deref
    // MOVQ and EMMS were introduced on the Pentium MMX.
    MOVQ    val+4(FP), M0
    MOVQ    M0, (AX)
    EMMS
    // This is essentially a no-op, but it provides required memory fencing.
    // It can be replaced with MFENCE, but MFENCE was introduced only on the Pentium4 (SSE2).
    XORL    AX, AX
    LOCK
    XADDL   AX, (SP)
    RET

The question is about the second comment in the function. Could someone explain how XORL AX, AX works as a memory fence?

I guess it has something to do with the LOCK that goes right after, but how does it work?

Thank you.

tna0y
  • 3
    That's dumb, normal people use `lock addl $0, (SP)` as a (sometimes faster) equivalent for MFENCE. The other inefficiency is using `testl` instead of `testb` to check the low bits of EAX: there is no `test r32, imm8` encoding so it wastes several bytes of code size. – Peter Cordes Oct 07 '19 at 14:03
  • 1
    Also, I would have used `fild` / `fistp` instead of needing MMX EMMS, if I needed to avoid SSE2. Or SSE1 movlps load/store into xmm0, although that has a false dependency vs. an SSE2 `movq` or `movsd` load into xmm0. And I would have put the crash-on-misalignment load after the `ret` so the branch is not-taken in the normal case. – Peter Cordes Oct 07 '19 at 14:04

1 Answer

It isn’t the XOR that acts as a memory fence. It is the locked XADD instruction that does that: on x86, any LOCK-prefixed read-modify-write instruction acts as a full memory barrier. The XOR instruction clears EAX so that the XADD adds zero and doesn’t actually change the contents of memory.

The memory-ordering behavior of locked instructions is described in the answer to this question: How many memory barriers instructions does an x86 CPU have?

user3666197
prl