12

The C++ memory model provides relaxed atomics, which place no ordering guarantees on memory operations. Aside from the mailbox example in C that I found here:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1525.htm

Based on the motivating example in this paper:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2153.pdf

I was curious about other use cases for this type of synchronization mechanism.

mikelong
  • 3,694
  • 2
  • 35
  • 40

2 Answers

18

A simple example that I see in my work frequently is a stats counter. If you want to count the number of times an event happens but don't need any sort of synchronization across threads aside from making the increment safe, using memory_order_relaxed makes sense.

static std::atomic<size_t> g_event_count_;

void HandleEvent() {
  // Increment the global count. This operation is safe and correct even
  // if there are other threads concurrently running HandleEvent or
  // PrintStats.
  g_event_count_.fetch_add(1, std::memory_order_relaxed);

  [...]
}

void PrintStats() {
  // Snapshot the "current" value of the counter. "Current" is in scare
  // quotes because the value may change while this function is running.
  // But unlike a plain old size_t, reading from std::atomic<size_t> is
  // safe.
  const size_t event_count =
      g_event_count_.load(std::memory_order_relaxed);

  // Use event_count in a report.
  [...]
}

In both cases, there is no need to use a stronger memory order. On some platforms, doing so could have negative performance impact.

jacobsa
  • 5,719
  • 1
  • 28
  • 60
  • Would it also be appropriate to use relaxed memory order in cases where something is lazily computed, and computing it more than once would be slightly inefficient but otherwise harmless? If a value will be read millions of times, even a tiny reduction in the cost of each read could more than make up for the cost of a few redundant compuations. – supercat Nov 01 '15 at 17:41
  • 4
    That seems fine to me, but you have to be very careful that you're not trying to synchronize using the value. For example, if you compute a struct and then try to publish a pointer to it through a `std::atomic` store with `std::memory_order_relaxed`, you're going to have a bad time, because you haven't ensured that other threads see the writes initializing the struct before the write setting the pointer. – jacobsa Nov 01 '15 at 23:25
  • OK, so you have writers which atomically increment a counter. But eventually you will want to read the counter somewhere. Like the PrintStats() in your example. So is this only applicable when you have count increments that don't necessarily have to propagate immediately? When you read the counter with std::memory_order_relaxed could it be possible that you read an outdated g_event_count, or not? – user643011 Oct 10 '17 at 18:49
  • I found an answer to my own question: "The only way to guarantee you have the "latest" value is to use a read-modify-write operation such as exchange(), compare_exchange_strong() or fetch_add()" https://stackoverflow.com/a/8833218/643011 – user643011 Oct 10 '17 at 20:05
  • If you read the value, you're guaranteed to see any updates before the most recent synchronizing operation. For example, if thread A updates the counter, then unlocks a mutex that thread B takes and then reads the counter, thread B will see thread A's write. (This is compatible with your link because the mutex access is like a read-modify-write op.) In the absence of such a synchronizing event there is no such thing as the "latest" value, because without such an event there's no way to prove that you got a stale value. The write and read are happening concurrently. – jacobsa Oct 10 '17 at 23:42
  • Since there is no synchronization between threads that call `HandleEvent()` and `PrintStats()`, declaring `static size_t g_event_count_` will have the same effect, no? – HCSF Oct 05 '19 at 12:24
  • 1
    No, that will cause undefined behavior due to a [data race](https://en.cppreference.com/w/cpp/language/memory_model#Threads_and_data_races). The likely outcome based on the code compilers will actually generate is lost increments, but in theory anything could happen. – jacobsa Oct 05 '19 at 22:03
  • @jacobsa I see your point -- the coder should reason from what the standard guarantees, not from the specific architecture (e.g. on x86-64 the generated code for values <= 64 bits may be the same with or without `std::memory_order_relaxed`, but that is architecture specific). – HCSF Oct 07 '19 at 03:41
  • 1
    @HCSF: if you're writing assembly code then you get to reason about what the architecture does, but not if the compiler is writing assembly for you. [There is no such thing as a benign data race](https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong). And in this case even forgetting the data race, you will still lose increments: the compiler may generate a naive "load, add 1, store", which is not atomic. – jacobsa Oct 07 '19 at 08:11
  • What if any other thread(s) call `g_event_count_.store(0, std::memory_order_relaxed);` to reset the counter? Would that break the correctness of the code? – plasmacel Dec 21 '21 at 18:21
  • It would of course reset the counter, but assuming that’s what you wanted to do it would still be correct. Each atomic has an underlying single total order of modifications, so the counter would now represent increments that came after the reset in that total order. There is no issue of data race because the operations remain atomic. – jacobsa Dec 22 '21 at 21:53
0

The event reader in this case could be connected to an X11 socket, where the frequency of events depends on user actions (resizing a window, typing, etc.). If the GUI thread's event dispatcher checks for events at regular intervals (e.g. due to timer events in the user application), we don't want to needlessly block the event reader thread by acquiring a lock on a shared event queue that we know is empty. We can simply check whether anything has been queued by using the 'dataReady' atomic. This is also known as the "double-checked locking" pattern.

#include <atomic>
#include <chrono>
#include <deque>
#include <iostream>
#include <mutex>
#include <thread>

namespace {
std::mutex mutex;
std::atomic_bool dataReady(false);
std::atomic_bool done(false);
std::deque<int> events; // shared event queue, protected by mutex
}

void eventReaderThread()
{
    static int eventId = 0;
    std::chrono::milliseconds ms(100);
    while (true) {
        std::this_thread::sleep_for(ms);
        mutex.lock();
        eventId++; // populate event queue, e.g. from pending messages on a socket
        events.push_back(eventId);
        dataReady.store(true, std::memory_order_release);
        mutex.unlock();
        if (eventId == 10) {
            done.store(true, std::memory_order_release);
            break;
        }
    }
}

void guiThread()
{
    while (!done.load(std::memory_order_acquire)) {
        if (dataReady.load(std::memory_order_acquire)) { // double-checked locking pattern
            mutex.lock();
            // Drain the queue: more than one event may have been pushed
            // since the last check.
            while (!events.empty()) {
                std::cout << events.front() << std::endl;
                events.pop_front();
            }
            // If guiThread() checks again before eventReaderThread() has added
            // new events, we will see the value stored here with
            // memory_order_relaxed. If eventReaderThread() has added new
            // events, we will see that too, via the normal release->acquire
            // pairing. The docs for relaxed ordering say it still guarantees
            // "atomicity and modification order consistency".
            dataReady.store(false, std::memory_order_relaxed);
            mutex.unlock();
        }
    }
}

int main()
{
    std::thread producerThread(eventReaderThread);
    std::thread consumerThread(guiThread);
    producerThread.join();
    consumerThread.join();
}
gatis paeglis
  • 541
  • 5
  • 7