3

This is related to this question.

rmmh's claim on that question was that on certain architectures, no special magic is needed to implement atomic get and set. Specifically, in this case, g_atomic_int_get(&foo) from glib gets expanded simply to (*(&foo)). I understand this to mean that foo will never be read in an internally inconsistent (half-written) state. However, am I also guaranteed that foo won't be cached by a given CPU or core?

Specifically, if one thread is setting foo and another is reading it (using the glib g_atomic_* functions), can I assume that the reader will see the updates made by the writer? Or is it possible for the writer to simply update the value in a register? For reference, my target platform is gcc (4.1.2) on a multi-core, multi-CPU x86_64 machine.
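
To make the scenario concrete, here is a stripped-down sketch of the pattern I mean (illustrative only; the thread setup and compile line are made up for the example, not my real code):

/* illustrative sketch of the reader/writer scenario described above */
#include <glib.h>
#include <pthread.h>
#include <stdio.h>

static gint foo = 0;                      /* shared flag, accessed via g_atomic_* */

static void *writer(void *arg)
{
    g_atomic_int_set(&foo, 42);           /* writer publishes a value */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);

    /* reader: is this loop guaranteed to eventually see the writer's update? */
    while (g_atomic_int_get(&foo) == 0)
        ;

    printf("reader saw foo = %d\n", g_atomic_int_get(&foo));
    pthread_join(t, NULL);
    return 0;
}

(Built with something like gcc sketch.c $(pkg-config --cflags --libs glib-2.0) -lpthread.)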

laslowh

2 Answers

3

What most architectures (x86_64 included) ensure is atomicity and coherence of suitably sized and aligned reads and writes (so every processor sees a subsequence of the same master sequence of values for a given memory address (*)). An int is most probably suitably sized, and compilers generally ensure that it is also correctly aligned.

But compilers rarely ensure that they don't optimize out some reads or writes unless those accesses are marked in one way or another. I tried to compile:

int f(int* ptr)
{
    int i, r = 0;
    *ptr = 5;                   /* dead store: nothing reads it before the next write */
    for (i = 0; i < 100; ++i) {
        r += i*i;
    }
    *ptr = r;
    return *ptr;
}

with gcc 4.1.2, and gcc had no problem optimizing out the first write to *ptr, which is something you probably don't want for an atomic write.
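
For comparison, here is a variant of the same function (my sketch, not part of the original test) in which the stores go through a volatile-qualified pointer; volatile accesses count as observable behaviour, so gcc must keep both writes:

/* illustrative variant: volatile-qualified accesses may not be optimized out */
int f_volatile(int* ptr)
{
    int i, r = 0;
    *(volatile int*)ptr = 5;       /* kept: volatile store */
    for (i = 0; i < 100; ++i) {
        r += i*i;
    }
    *(volatile int*)ptr = r;       /* kept: volatile store */
    return *(volatile int*)ptr;
}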

(*) Coherence is not to be confused with consistency: the relationship between reads and writes at different addresses is often relaxed with respect to the intuitive, but costly to implement, sequential consistency. That's why memory barriers are needed.

AProgrammer
  • How would you use a memory barrier to ensure that the first write to *ptr is not optimized out? – laslowh Apr 20 '11 at 16:05
  • If you put a barrier before and after the write, that should prevent the optimization. But formally a barrier isn't needed; just writing `*(volatile int*)ptr = 5` will probably (if not formally) work. (BTW, the stuff you gave perhaps has as a precondition that the variable is declared volatile.) But I note that g++ 4.5.1 apparently uses intrinsics to provide the C++0x atomic header; it is the kind of thing that is better left to the compiler. – AProgrammer Apr 20 '11 at 16:25
  • Volatile will only ensure that the compiler doesn't use a register to hold the variable. It will not act as a compiler barrier. – johnnycrash Apr 25 '11 at 23:53
  • @johnnycrash, volatile does not imply memory barriers, but it does imply that other volatile accesses to the same object are not reordered, and it also ensures that the compiler doesn't do stupid things like accessing the object with several instructions when one is enough. That is just what is needed here for atomicity. – AProgrammer Apr 26 '11 at 05:56
1

Volatile will only ensure that the compiler doesn't use a register to hold the variable. Volatile will not prevent the compiler from reordering code, although it might act as a hint not to reorder.

Depending on the architecture, certain instructions are atomic. Writing to an integer and reading from an integer are often atomic. If gcc uses atomic instructions for reading/writing to/from an integer memory location, there will be no "intermediate garbage" read by one thread if another thread is in the middle of a write.

But you might run into problems because of compiler reordering and instruction reordering.

With optimizations enabled, gcc reorders code. Gcc usually doesn't reorder code when global variables or function calls are involved, since gcc can't guarantee the same outcome. Volatile might act as a hint to gcc with respect to reordering, but I don't know. If you do run into reordering problems, this will act as a general-purpose compiler barrier for gcc:

__asm__ __volatile__ ("" ::: "memory");
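
For example (an illustrative sketch, not part of the original answer), placing that barrier between a data store and the store of the flag that publishes it stops the compiler from moving either store across it. Note that it is a compiler barrier only; the CPU may still reorder:

/* illustrative: compiler barrier between a data store and its "ready" flag */
int shared_data;
int shared_ready;

void publish(int value)
{
    shared_data = value;
    __asm__ __volatile__ ("" ::: "memory");  /* compiler may not move stores across this */
    shared_ready = 1;
}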

Even if the compiler doesn't reorder code, the CPU constantly reorders instructions during execution. Here is a very good article on the subject. A "memory barrier" is used to prevent the CPU from reordering instructions across the barrier. Here is one possible way to issue a memory barrier using gcc:

__sync_synchronize();

You can also issue asm instructions to get different kinds of barriers.
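
For instance (illustrative, assuming x86_64 as in the question), a full hardware fence can be issued directly:

__asm__ __volatile__ ("mfence" ::: "memory");  /* x86_64 full memory fence plus compiler barrier */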

That said, we read and write global integers from multiple threads without using atomic operations or mutexes and have no problems. This is most likely because A) we run on Intel, and Intel does not reorder instructions aggressively, and B) enough code executes before we do something "bad" with an early read of a flag. Also in our favor is the fact that a lot of system calls contain barriers, and the gcc atomic operations are barriers. We use a lot of atomic operations.
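
To illustrate that last point (a sketch of mine, not code from our system): the gcc __sync builtins are documented as full barriers, so an operation like the one below both updates the counter atomically and acts as a fence:

/* illustrative: __sync builtins are atomic and imply a full memory barrier */
static int counter;

void bump(void)
{
    __sync_fetch_and_add(&counter, 1);
}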

Here is a good discussion on Stack Overflow of a similar question.

johnnycrash