
Usually, when std::atomic types are accessed concurrently by multiple threads, there is no guarantee that a thread will read the "up-to-date" value, and a thread may get a stale value from a cache or any older value. The only way to get the up-to-date value is through functions such as compare_exchange_XXX. (See the questions here and here.)

#include <atomic>
#include <mutex>

std::atomic<int> cancel_work{0};
std::mutex mutex;

//Thread 1 executes this function
void thread1_func() 
{
    cancel_work.store(1, <some memory order>);
}


// Thread 2 executes this function
void thread2_func() 
{
    // No guarantee tmp will be 1, even when thread1_func is executed first
    int tmp = cancel_work.load(<some memory order>);
}

My question, however, is: what happens when using a mutex and lock instead? Do we have any guarantee of the freshness of the shared data accessed?

For example, assume thread 1 and thread 2 run concurrently and thread 1 obtains the lock first (executes first). Is thread 2 then guaranteed to see the modified value and not an old one? Does it matter whether the shared data cancel_work is atomic or not in this case?

#include <atomic>
#include <mutex>
#include <thread>

int cancel_work = 0;  // any difference if replaced with std::atomic<int> in this case?
std::mutex mutex;

// Thread 1 executes this function
void thread1_func() 
{
    //Assuming Thread 1 enters lock FIRST
    std::lock_guard<std::mutex> lock(mutex);

    cancel_work = 1;
}


// Thread 2 executes this function
void thread2_func() 
{
    std::lock_guard<std::mutex> lock(mutex);

    int tmp = cancel_work; //Will tmp be 1 or 0?
}

int main()
{
   std::thread t1(thread1_func);
   std::thread t2(thread2_func);

   t1.join(); t2.join();

   return 0;
}
yggdrasil
  • Yes, the lock release/acquisition introduces a happens-before relationship, which together with the regular intra-thread sequenced-before ordering ensures that you observe the update *if* `thread2_func` runs after the other function. Here "after" is defined by the total ordering on the mutex state. (What you *don't* get is a guarantee that `thread1_func` will ever run.) – Kerrek SB Oct 11 '18 at 14:26
  • What do you mean by `//Thread 1 enters lock FIRST`? This is a question about ordering of operations, so words about order need to be really precise. Can you provide a [mcve]? Are we talking `std::thread(thread1_func).join(); std::thread(thread2_func).join();`? That guarantees `thread1_func` happens FIRST. Or `std::thread t1(thread1_func); std::thread t2([]{ std::this_thread::sleep_for(100years); thread2_func();}); t1.join(); t2.join();`, which makes no such guarantee, but someone could naively say "obviously `thread1_func` happens FIRST, I put a 100 year wait there!". – Yakk - Adam Nevraumont Oct 11 '18 at 15:15
  • @Yakk-AdamNevraumont I am talking about the latter. I updated the question for more clarity. – yggdrasil Oct 11 '18 at 15:24
  • You still haven't explained what you mean by `//Thread 1 enters lock FIRST`. There is no ordering between the entering of those two locks. Right now your program is equivalent to the null program, so under as-if the question is meaningless. Assuming you printed `tmp`, under as-if the compiler could eliminate all of your threading code and simply print either `0` or `1`. And only observable behavior is specified by the C++ standard. This isn't just academic; this is key to the problem. The hardware could run thread1 "first" but the compiler is free to treat it as-if thread2 ran "first". – Yakk - Adam Nevraumont Oct 11 '18 at 15:28
  • @KerrekSB why don't you just put your comment as an answer? Looks like answer to me. – StahlRat Oct 11 '18 at 15:32
  • @Yakk-AdamNevraumont Ignoring the fact that the code does nothing and all possible compiler optimizations, since the variable cancel_work is protected by a lock I believe only two things can happen here: either thread1 enters the critical section first and then thread2 does, or the contrary happens. By "thread1 enters the lock first", I mean I am assuming the first scenario and I want to know what will get stored into "tmp". – yggdrasil Oct 11 '18 at 15:36
  • @A.S. You cannot validly ignore "compiler optimization" in C++. The naive mapping of C++ instructions to assembly is not some holy "true program meaning", and variations away from it are not somehow less true because they are "just optimizations". There is a range of observable behavior that the C++ standard specifies your program has; how that behavior occurs on the machine your program runs on does not further constrain C++ programs. And the C++ threading model is *heavily* about the observable behavior. – Yakk - Adam Nevraumont Oct 11 '18 at 15:39
  • @StahlRat: I'd have to look up references and draw pictures and all that and didn't quite have the time... – Kerrek SB Oct 11 '18 at 16:38
  • Not sure about C++, but in Java, all of the complex "happens before" rules can be boiled down to one simple rule of thumb: Whatever thread A does before it releases a lock, will be visible to thread B after thread B acquires the same lock. Sadly though, I have no expectation that anything in C++ could be _that_ simple. – Solomon Slow Oct 11 '18 at 17:21
  • I believe @KerrekSB comment is the closest answer to this question. If he turns it into an answer I will accept it. – yggdrasil Oct 12 '18 at 05:28

2 Answers


Yes, using the mutex/lock guarantees that thread2_func() will observe the modified value.

However, according to the std::atomic specification:

The synchronization is established only between the threads releasing and acquiring the same atomic variable. Other threads can see different order of memory accesses than either or both of the synchronized threads.

So your code will work correctly using acquire/release logic, too.

#include <atomic>

std::atomic<int> cancel_work{0};

void thread1_func() 
{
    cancel_work.store(1, std::memory_order_release);
}

void thread2_func() 
{
    // tmp will be 1, when thread1_func is executed first
    int tmp = cancel_work.load(std::memory_order_acquire); 
}
serge
  • I see, thanks! But doesn't the second part of the answer contradict the accepted answer here? https://stackoverflow.com/a/14687847/1332171 – yggdrasil Oct 11 '18 at 15:07
  • @A.S. A problem is that you are using the word "before" and FIRST and the like. What do you mean by "before"? FIRST? – Yakk - Adam Nevraumont Oct 11 '18 at 15:12
  • There is no contradiction; I'd just like to point out that using certain memory orders does the job correctly. So the mutex/lock code is a little bit simpler to my taste. – serge Oct 11 '18 at 15:13
  • @serge "Some memory models" -- what about the C++ memory model? – Yakk - Adam Nevraumont Oct 11 '18 at 15:13
  • "Some memory orders", my mistype, sorry – serge Oct 11 '18 at 15:15
  • By "before" I mean these threads are being run "concurrently" and so may be interleaved but the line cancel_work.store in thread1_func happens to be executed before the line in thread2_func. – yggdrasil Oct 11 '18 at 15:15
  • The accepted answer I mentioned says stores are not guaranteed to be seen immediately by the other thread, while @serge's answer states thread2_func will see 1. What am I missing? – yggdrasil Oct 11 '18 at 15:18
  • After reading the dedicated chapter in the "C++ Concurrency in Action" book to get some clarifications, I can state quite confidently that although the answer about locks is true, the second part of the answer is wrong. As I suspected, even when using release/acquire memory order and thread1_func manages to execute before thread2_func, there is absolutely no guarantee thread2_func will see the value written by thread1_func. This is because release/acquire synchronize if and only if the acquire reads the value written by the store (release), which is not guaranteed but will happen "eventually" – yggdrasil Oct 12 '18 at 05:04
  • Specifically, since thread2_func is not looping the load but acquires the value only once, both 1 and 0 are possible. – yggdrasil Oct 12 '18 at 05:31
  • @A.S. are you sure? The C++ specification contains the [classic example of producer-consumer](https://en.cppreference.com/w/cpp/atomic/memory_order), see "Release-Acquire ordering" chapter – serge Oct 12 '18 at 07:52
  • @serge Yes, the example in the C++ specification uses a loop in the consumer thread. So when it "eventually" reads the value stored by the producer thread, the acquire/release synchronization kicks in and a happens-before relation is established. The consumer thread is not guaranteed to read the up-to-date value the producer already wrote, but the standard states it will happen in a "reasonable amount of time". (See also the answer here: https://stackoverflow.com/a/6681505/1332171) – yggdrasil Oct 12 '18 at 09:22
  • @A.S. ok, I see your point. Since a "reasonable amount of time" is not defined, we can consider the atomic operation "non-waiting" in the common case. – serge Oct 12 '18 at 13:45
  • @serge Yes, exactly. – yggdrasil Oct 12 '18 at 14:03

The C++ standard only constrains the observable behavior of the abstract machine, in well-formed programs without undefined behavior anywhere during the abstract machine's execution.

It provides no guarantees about the mapping between the physical hardware actions the program executes and that observable behavior.

In your case, on the abstract machine, there is no ordering between thread1's and thread2's execution. Even if the physical hardware were to schedule and run thread1 before thread2, that places zero constraints (in your simple example) on the output the program generates. The program's output is only constrained by the legal outputs the abstract machine could produce.

A C++ compiler can legally:

  1. Eliminate your program completely as equivalent to return 0;

  2. Prove that the read of cancel_work in thread2 is unsequenced relative to all modifications of cancel_work away from 0, and change it to a constant read of 0.

  3. Actually run thread1 first, then run thread2, but prove that it can treat the operations in thread2 as-if they occurred before thread1 ran, and so not bother forcing a cache line refresh in thread2, reading stale data from cancel_work.

What actually happens on the hardware does not impact what the program can legally do. And what the program can legally do in threading situations is restricted by the observable behavior of the abstract machine, and by the behavior of synchronization primitives and their use in different threads.

For an actual happens-before relationship to occur, you need something like:

std::thread(thread1_func).join();
std::thread(thread2_func).join();

and now we do know that everything in thread1_func happens before thread2_func.

We can still rewrite your program as return 0; and make similar changes. But we now have a guarantee that the thread1_func code happens before the thread2_func code does.

Note that we can eliminate (1) above via:

std::lock_guard<std::mutex> lock(mutex);

int tmp = cancel_work; //Will tmp be 1 or 0?
std::cout << tmp;

and cause tmp to actually be printed.

The program can then be converted to one that prints 1 or 0 and has no threading at all. It could keep the threading, but change thread2_func to print a constant 0. Etc.


So we rewrite your program to look like this:

#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::condition_variable cv;
bool writ = false;
int cancel_work = 0;  // any difference if replaced with std::atomic<int> in this case?
std::mutex mutex;

// Thread 1 executes this function
void thread1_func() 
{
    {
      std::lock_guard<std::mutex> lock(mutex);

      cancel_work = 1;
    }
    {
      std::lock_guard<std::mutex> lock(mutex);
      writ = true;
      cv.notify_all();
    }
}


// Thread 2 executes this function
void thread2_func() 
{
    std::unique_lock<std::mutex> lock(mutex);

    cv.wait(lock, []{ return writ; } );

    int tmp = cancel_work;
    std::cout << tmp; // will print 1
}

int main()
{
   std::thread t1(thread1_func);
   std::thread t2(thread2_func);

   t1.join(); t2.join();

   return 0;
}

and now thread2_func happens after thread1_func and all is good. The read is guaranteed to be 1.

Yakk - Adam Nevraumont