
For example, suppose we have two std::atomics and want to read the value of the first and then flag, via the second, that we no longer need the first value. We don't want these operations to be reordered (otherwise the first value could be overwritten before we read it), but there is no data dependency between the operations, so we definitely need a barrier to prevent reordering (and memory_order_consume doesn't fit).

A full fence is certainly overkill here. We also need neither release nor acquire semantics as such (even though they would provide such a barrier). All we need is to preserve the order of the read-and-then-write pair.

Is there some cheap fence that does what we need?

EDIT: examples of what I need.

std::atomic<X> atomicVal;
std::atomic<bool> atomicFlag{false};
...

auto value = atomicVal.load(std::memory_order_relaxed);
some_appropriate_fence();
atomicFlag.store(true, std::memory_order_relaxed);

Once atomicFlag is set, atomicVal may be overwritten with some newer value, so we need to read it before setting the flag.

Of course we could write

auto value = atomicVal.load(std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
atomicFlag.store(true, std::memory_order_relaxed);

but a full seq_cst fence is too expensive for the operation we need.

I'm interested in what minimal fence is enough to guarantee the order of these operations.

aaalex88
  • To be sure you get the best answer, can you post a code snippet where the two variables are read/written, with expected output and example output that would be undesirable? – Jeffrey Oct 17 '18 at 14:03
  • https://stackoverflow.com/questions/8819095/concurrency-atomic-and-volatile-in-c11-memory-model – Jeffrey Oct 17 '18 at 14:38
  • If your code will execute on x86, this is guaranteed by the Intel architecture: _"Writes are not reordered with older reads."_ – Iwillnotexist Idonotexist Oct 17 '18 at 15:36
  • Yes, but I target a weak memory model where any operations can be reordered. – aaalex88 Oct 17 '18 at 16:16

1 Answer


Following your update: https://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering

You would want the atomic flags and variables to be written (stored) with:

ptr.store(p, std::memory_order_release);

and you would want the read of flags and values to be done with:

p2 = ptr.load(std::memory_order_acquire);

This seems to be the exact reason for their existence.
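
Applied to the snippet from your question, it would look roughly like this (just a sketch: the acquire on the load keeps later operations from moving before it, and the release on the store keeps earlier operations from moving after it, so the read-then-write order is preserved):

auto value = atomicVal.load(std::memory_order_acquire); // later ops can't be reordered before an acquire load
atomicFlag.store(true, std::memory_order_release);      // earlier ops can't be reordered after a release store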

Edit 2: In fact, Release-Consume might be better, but I've never seen it used. The link above also states:

 Note that currently (2/2015) no known production compilers track dependency chains: consume operations are lifted to acquire operations.
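
For comparison, the Release-Consume pattern on that page only orders the operations that carry a data dependency from the consumed value; a condensed sketch of the cppreference example (function names here are mine) looks like this:

#include <atomic>
#include <cassert>
#include <string>
#include <thread>

std::atomic<std::string*> ptr{nullptr};

void produce()
{
    ptr.store(new std::string("Hello"), std::memory_order_release);
}

void consume()
{
    std::string* p;
    while (!(p = ptr.load(std::memory_order_consume)))
        ; // spin until a pointer is published
    assert(*p == "Hello"); // *p carries a dependency from the consume load,
                           // so it is guaranteed to see the constructed string
    delete p;
}

int main()
{
    std::thread t1(produce), t2(consume);
    t1.join();
    t2.join();
}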

Edit 3: Sample code doing something similar to what I understand you want.

#include <thread>
#include <iostream>
#include <atomic>
#include <cstdlib> // for std::rand

std::atomic<int> x;
std::atomic<int> y;

auto write_op = std::memory_order_release;
auto read_op = std::memory_order_acquire;

// auto write_op = std::memory_order_seq_cst;
// auto read_op = std::memory_order_seq_cst;

void consumer()
{
    while(true)
    {
        int rx,ry;
        do
        {
            ry = y.load(read_op); // flag read first to guarantee x validity
            rx = x.load(read_op);
        }
        while(ry == 0); // wait for y. y acts as the flag, here

        if (ry == -1)
        {
            break;
        }

        if (rx != ry) // check consistency
        {
            std::cout << "Boo " << rx << " " << ry << std::endl;
        }

        x.store(0, write_op);
        y.store(0, write_op);
    }
}

void producer()
{
    int count = 0;
    int steps = 0;
    while(steps < 50)
    {
        while(y.load(read_op) != 0) {} // wait for y to have been consumed

        int value = std::rand() % 10 + 1;

        x.store(value, write_op); // stores values
        y.store(value, write_op); // indicates readiness to other thread

        count++;
        if (count == 1000000)
        {
            std::cout << '.' << std::endl;
            count = 0;
            steps++;
        }
    }
    while(y.load(read_op) != 0) {} // wait for the last value to be consumed, so -1 isn't overwritten by the consumer's reset
    y.store(-1); // signal the consumer to stop
}

int main()
{
    x = 0;
    y = 0;

    std::thread thread1(producer);
    std::thread thread2(consumer);

    thread1.join();
    thread2.join();
}
Jeffrey
  • Yes, `seq_cst` surely suits, but I think it's overkill, and I'm interested in the cheapest possible fence. – aaalex88 Oct 17 '18 at 14:58
  • I doubt you can safely beat acquire/release, then. – Jeffrey Oct 17 '18 at 15:18
  • And when we do a read and then a write, can we consider a single release or acquire barrier enough for these two operations not to be reordered with each other? – aaalex88 Oct 17 '18 at 16:24
  • If atomic_flag is stored order_release in thread A, when thread B will order_acquire load it, it will be guaranteed to see atomic_val writes (that were done before). For this to work, though, you must have a guarantee that no other atomic_val writes occur after you wrote atomic_flag. Typically by doing the opposite process of storing back the flag to false after reading. – Jeffrey Oct 17 '18 at 19:01