Do I need a memory barrier?

Question

In the below C99 example, is the buffer_full flag guaranteed to be set (even with -O2 optimizations enabled) after the buffer is read or written to? Or, do I need a memory barrier to ensure correct ordering?

I expect this to be run on a system where aligned 32-bit reads and writes are atomic.

Assume only one instance of each thread is being run and no other threads are accessing buffer or buffer_full.

#include <stddef.h>  // size_t

char buffer[100];
int buffer_full;

// write interesting data to the buffer. does not read.
void fill_buffer(char* buffer, size_t buffsz);
// read the interesting data in the buffer. does not write.
void use_buffer(const char* buffer, size_t buffsz);

void writer_thread()
{
    if (!buffer_full) {
        fill_buffer(buffer, sizeof(buffer));
        // is a memory barrier needed here?
        buffer_full = 1;
    }
}

void reader_thread()
{
    if (buffer_full) {
        use_buffer(buffer, sizeof(buffer));
        // is a memory barrier needed here?
        buffer_full = 0;
    }
}

Your question is a little bit hard to understand, but I suppose you mean, *is access to `buffer_full` atomic*? In principle, it is. – Iharob Al Asimi Aug 29 '16 at 15:06
I think it is also platform dependent. Access is atomic if `int` is the native word size and as long as the variable does not have a non-aligned address. – LPs Aug 29 '16 at 15:09
Sort of. I want to ensure an empty buffer is never read and a full buffer is never written. I'm concerned mostly with ordering. Could the write to buffer_full ever be reordered ahead of where it appears in the code, either by the compiler or by the hardware? – PaulH Aug 29 '16 at 15:49

score 4 · Accepted Answer · edited May 23 '17 at 11:51

I interpret you to be asking whether a compiler can reorder the assignments to buffer_full with the calls to fill_buffer() and use_buffer(). Such an optimization (like any optimization) is permitted only if it does not alter the externally-observable behavior of the program.

In this case, because buffer_full has external linkage, it is unlikely that the compiler can be confident about whether the optimization is permitted. It might be able to do so if the definitions of the fill_buffer() and use_buffer() functions, and of every function they themselves call, etc. are in the same translation unit with the writer_thread() and reader_thread() functions, but that depends somewhat on their implementations. If a conforming compiler is not confident that the optimization is allowed, then it must not perform it.

Because your naming implies that the two functions will run in different threads, however, without synchronization actions such as a memory barrier you cannot rely on the relative order in which one thread perceives modifications to shared, non-_Atomic, non-volatile data made by a different thread.

Moreover, if one thread writes a non-atomic variable and another thread accesses that same variable (read or write), then there is a data race unless a synchronization action or an atomic operation intervenes between the two in every possible overall order of operations. volatile variables don't really help here (see Why is volatile not considered useful in multithreaded C or C++ programming?). If you make buffer_full atomic, however, or if you implement your functions using atomic read and write operations on it, then that will serve to avoid data races involving not only that variable, but buffer as well (for your present code structure).
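
As a minimal sketch of the atomic-flag approach described above, assuming a C11 compiler with `<stdatomic.h>` (the question targets C99, which has no standard atomics), the flag could be declared `atomic_int` and accessed with acquire loads and release stores. The bodies of fill_buffer() and use_buffer(), the `last_seen` variable, and the writer_step()/reader_step() names with their bool return values are placeholders invented for this illustration, not part of the original code.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

char buffer[100];
atomic_int buffer_full;   /* static storage duration: starts at 0 (empty) */
char last_seen;           /* hypothetical: records what the reader saw */

/* Hypothetical stand-ins for the question's helper functions. */
void fill_buffer(char *buf, size_t buffsz) { memset(buf, 'x', buffsz); }
void use_buffer(const char *buf, size_t buffsz) { last_seen = buf[buffsz - 1]; }

/* One pass of the writer; returns true if the buffer was filled. */
bool writer_step(void)
{
    /* Acquire pairs with the reader's release store of 0, so the reader's
       accesses to buffer are complete before the writer overwrites it. */
    if (atomic_load_explicit(&buffer_full, memory_order_acquire))
        return false;
    fill_buffer(buffer, sizeof buffer);
    /* Release: the writes to buffer become visible before the flag does. */
    atomic_store_explicit(&buffer_full, 1, memory_order_release);
    return true;
}

/* One pass of the reader; returns true if the buffer was consumed. */
bool reader_step(void)
{
    /* Acquire pairs with the writer's release store of 1. */
    if (!atomic_load_explicit(&buffer_full, memory_order_acquire))
        return false;
    use_buffer(buffer, sizeof buffer);
    atomic_store_explicit(&buffer_full, 0, memory_order_release);
    return true;
}
```

Plain assignments to an `atomic_int` would also work and default to sequentially consistent ordering; the explicit acquire/release orderings are shown only to make the pairing between the two threads visible.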

answered Aug 29 '16 at 15:59

John Bollinger

Thanks! Would it make a difference if either the buffer or the flag were atomic? – PaulH Aug 29 '16 at 16:36
edit: realized I said "atomic" when I meant to say "volatile" in the last comment. – PaulH Aug 29 '16 at 17:34
I've added an assumption to the question that these are the only 2 threads accessing `buffer` and `buffer_full`. Given that, is the mutex still required? If so, can you help me understand why? – PaulH Aug 29 '16 at 17:49
@PaulH, making either or both `volatile` would ensure that writes of that variable by one thread are seen via reads by a different one, and would ensure no reordering of operations on the so-qualified variable. It seems unlikely that that would yield the semantics you actually want, however: is it really ok for the reader thread to fail to actually read, or for the writer thread to fail to actually write? Usually in such constructs you want the reader to *wait* until it can read, or the writer to *wait* until it can write. That's what you would get from a mutex. – John Bollinger Aug 29 '16 at 17:51
For this over-simplified example, yes. [That's fine](https://www.google.com/search?q=this+is+fine). I want to ensure I do not have a race condition here. – PaulH Aug 29 '16 at 17:55
Also, does this cover hardware out-of-order execution? – PaulH Aug 29 '16 at 18:00
@PaulH, as far as the standard is concerned there *absolutely* is a data race, even with volatile objects. Read and modification of `buffer_full` are "conflicting" operations. If those are performed by different threads, at least one non-atomically, then that constitutes a data race unless there is a synchronization action between the read and the write in every possible overall order of execution. This is a bit more specific than I wrote before -- I will update my answer. – John Bollinger Aug 29 '16 at 18:16
@PaulH, I have updated my answer. As for hardware out-of-order execution, if your program exhibits no undefined behavior (notably including any arising from having a data race) then a conforming compiler will produce code that exhibits conforming behavior for the target environment. CPU-level behaviors are *its* concern, not yours -- that's one reason to prefer C to assembly. – John Bollinger Aug 29 '16 at 19:03
