Do I need the volatile qualifier for variables that are only accessed while a lock is held? In this code, could removing the volatile qualifier from n possibly change the behavior when concurrent_foo is executed concurrently?
#ifndef __GNUC__
#error __sync_lock builtins are only available with GCC
#endif
volatile int n = 0;
static volatile int lock = 0;
void concurrent_foo () {
while (__sync_lock_test_and_set (&lock, 1));
// Non-atomic operation, protected by spinlock above.
int x = n % 2 + 1;
n = n + x;
__sync_lock_release (&lock);
}
I understand that the volatile qualifier instructs the compiler not to optimize away memory accesses to a variable. I also understand that the __sync_lock builtins issue a (full?) memory barrier, which memory accesses must not cross. However, in this example it would be safe to fetch n once, cache it in a register, compute the new value, and then write it back to n.
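To make the optimization I have in mind concrete, here is a hand-written equivalent that reads n into a local exactly once and writes it back once (the name concurrent_foo_cached is only for illustration; the globals and builtins are the same as above):
void concurrent_foo_cached () {
    while (__sync_lock_test_and_set (&lock, 1));
    int tmp = n;             // single read of n while the lock is held
    tmp = tmp + tmp % 2 + 1; // same computation as x = n % 2 + 1; n = n + x
    n = tmp;                 // single write back of n
    __sync_lock_release (&lock);
}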
Compiling with GCC for i686 with -O3 and inspecting the generated assembly reveals that two memory fetches of n are made, unnecessarily:
concurrent_foo:
movl $1, %edx
.L2:
movl %edx, %eax
xchgl lock, %eax
testl %eax, %eax
jne .L2
movl n, %eax
movl n, %edx
movl %eax, %ecx
shrl $31, %ecx
addl %ecx, %eax
andl $1, %eax
subl %ecx, %eax
leal 1(%edx,%eax), %eax
movl %eax, n
movl $0, lock
ret
Without the volatile qualifier I get subtly different code, where n is fetched just once:
concurrent_foo:
movl $1, %edx
.L2:
movl %edx, %eax
xchgl lock, %eax
testl %eax, %eax
jne .L2
movl n, %edx
movl %edx, %ecx
shrl $31, %ecx
leal (%edx,%ecx), %eax
andl $1, %eax
subl %ecx, %eax
leal 1(%edx,%eax), %eax
movl %eax, n
movl $0, lock
ret
In both circumstances, the memory accesses to n occur while the lock is held, and thus should be "correct". However, I am unsure whether I am really guaranteed that. The volatile qualifier is preventing a performance optimization that I would like, and that optimization would not affect the outcome of the operation (at no point would n be even).
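For reference, this is the variant I would like to rely on, assuming (and this is exactly what I am not sure of) that the barriers implied by the __sync_lock builtins are by themselves enough to keep the accesses to n inside the critical section:
int n = 0;  // no volatile
static volatile int lock = 0;
void concurrent_foo () {
    while (__sync_lock_test_and_set (&lock, 1)); // acquire the spinlock
    int x = n % 2 + 1;
    n = n + x;
    __sync_lock_release (&lock);                 // release the spinlock
}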