I am experimenting with stdatomic.h, the C11 memory model(s), and the latest stable GCC, compiling for a modern x86-64 Intel CPU.
The following C11 code:
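#include <stdatomic.h>

struct spinlock_t { atomic_long taken; };  /* assumed: 64-bit flag, matching the xchg %rax */
struct spinlock_t __sl;                    /* the global lock that appears as <__sl> below */

void lock(struct spinlock_t *sl) {  /* note: sl is unused; the body operates on __sl */
    while (atomic_exchange_explicit(&__sl.taken, 1, memory_order_acquire))
        while (atomic_load_explicit(&__sl.taken, memory_order_relaxed))
            __asm volatile("pause" ::: "memory");
}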
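To make the snippet easier to experiment with, here is a minimal, self-contained sketch built around the same test-and-test-and-set loop. It assumes a struct spinlock_t with a 64-bit atomic_long field named taken (consistent with the 64-bit xchg in the listing) and operates on sl->taken instead of the global __sl. It also adds a hypothetical unlock() that uses a release store, plus a small pthreads counter test. The names cpu_relax, unlock, run_counter_demo, THREADS and ITERS are illustrative only and do not come from the original code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical lock type: a 64-bit flag (0 = free, 1 = taken). */
struct spinlock_t {
    atomic_long taken;
};

/* Spin-wait hint; "pause" exists only on x86, so this is a no-op elsewhere. */
static inline void cpu_relax(void) {
#if defined(__x86_64__) || defined(__i386__)
    __asm volatile("pause" ::: "memory");
#endif
}

/* The question's test-and-test-and-set loop, operating on *sl. */
static void lock(struct spinlock_t *sl) {
    while (atomic_exchange_explicit(&sl->taken, 1, memory_order_acquire))
        while (atomic_load_explicit(&sl->taken, memory_order_relaxed))
            cpu_relax();
}

/* Assumed counterpart: a release store that pairs with the acquire exchange. */
static void unlock(struct spinlock_t *sl) {
    assert(atomic_load_explicit(&sl->taken, memory_order_relaxed) == 1);
    atomic_store_explicit(&sl->taken, 0, memory_order_release);
}

enum { THREADS = 4, ITERS = 100000 };

static struct spinlock_t demo_lock; /* static zero-initialization: unlocked */
static long counter;                /* plain (non-atomic) data guarded by demo_lock */

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        lock(&demo_lock);
        counter++; /* only touched while holding demo_lock */
        unlock(&demo_lock);
    }
    return NULL;
}

/* Runs THREADS workers concurrently and returns the final counter value. */
long run_counter_demo(void) {
    pthread_t t[THREADS];
    counter = 0;
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Calling run_counter_demo() from main() should return THREADS * ITERS, i.e. 400000. The acquire exchange in lock() and the release store in unlock() are what keep the plain counter++ free of data races.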
Compiled with the -O3 flag, it produces the following disassembly:
0000000000001370 <lock>:
1370: f3 0f 1e fa endbr64
1374: ba 01 00 00 00 mov $0x1,%edx
1379: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
1380: 48 89 d0 mov %rdx,%rax
1383: 48 87 05 8e 2c 00 00 xchg %rax,0x2c8e(%rip) # 4018 <__sl>
138a: 48 85 c0 test %rax,%rax
138d: 74 11 je 13a0 <lock+0x30>
138f: 90 nop
1390: 48 8b 05 81 2c 00 00 mov 0x2c81(%rip),%rax # 4018 <__sl>
1397: 48 85 c0 test %rax,%rax
139a: 74 e4 je 1380 <lock+0x10>
139c: f3 90 pause
139e: eb f0 jmp 1390 <lock+0x20>
13a0: c3 ret
The pause there is exactly what I expected. However, I am having a hard time explaining to myself why no fence instructions (mfence, lfence, sfence) were emitted. Where are the memory barriers? How does this assembly guarantee sequential consistency after all?