0

I have a background thread which loops on a state variable done. When I want to stop the thread I set the variable done to true. But apparently this variable is never set. I understand that the compiler might optimize it away so I have marked done volatile. But that seems not to have any effect. Note, I am not worried about race conditions so I have not made it atomic or used any synchronization constructs. How do I get the thread to not skip testing the variable at every iteration? Or is the problem something else entirely? done is initially false.

struct SomeObject
{
    volatile bool done_;
    void DoRun();
};

static void RunLoop(void* arg)
{
   if (!arg)
      return;

   SomeObject* thiz = static_cast<SomeObject*>(arg);
   while( !(thiz->done_) ) {
      thiz->DoRun();
   }
   return;
}
341008
  • 9,862
  • 11
  • 52
  • 84
  • 3
    Please don't use volatile for multithreaded contexts: http://stackoverflow.com/questions/4557979/when-to-use-volatile-with-multi-threading – Ryp May 20 '14 at 13:32
  • `I am not worried about race conditions` - Maybe you should. Who is setting the done variable to true? – ComicSansMS May 20 '14 at 13:34
  • @Ryp So, basically `volatile` is useless. So, how do I accomplish what I am trying to do? – 341008 May 20 '14 at 13:36
  • @ComicSansMS The main thread. I don't mind if the worker thread iterates a few more times before reading the updated value. This loop has to be free of locks and I want to avoid putting any just for this. – 341008 May 20 '14 at 13:37
  • 2
    std::atomic for c++11 or a compiler specific solution if you don't have the newest standard. – Voo May 20 '14 at 13:37
  • 2
    in C++11, prefer std::atomic, or if not available, use standard mutexes. – Ryp May 20 '14 at 13:38
  • @Ryp But mutexes are for synchronization. I just want `done_` not to be optimized. Are you saying there is not way for doing this except by using synchronization constructs? – 341008 May 20 '14 at 13:40
  • 2
    @341008 Your assumption that you will see the updated value eventually is not true. You have a data race, which means the compiler is perfectly okay with generating code where the variable update is never made visible to the worker thread. I know this seems strange at first, but that's just how multithreading works. – ComicSansMS May 20 '14 at 13:41
  • @341008 `std::atomic` can be and generally is used for lockless algorithms and is what you want. It is *not* synchronized in any higher-level manner. – Jeff May 20 '14 at 13:41
  • @ComicSansMS That is interesting! Can you please elaborate on that or provide a link to some resource I can look up? – 341008 May 20 '14 at 13:43
  • I am not using C++11, so, I will probably have to use platform specific solutions (InterlockedXXX/OSAtomicXXXX) – 341008 May 20 '14 at 13:44
  • 1
    You need memory barriers (look them up). `volatile` (in C++) doesn't give you them, `std::atomic` does. – Kris Vandermotten May 20 '14 at 13:45
  • 1
    in any case, volatile should be removed, and make sure your threads all work with the same instance of the struct. – Ryp May 20 '14 at 13:45
  • @341008 If you are truly interested in what is going on under the hood, take a look at [Herb Sutter's C++ & Beyond talk about atomics](http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2) in C++11. But be warned, this stuff is not for the faint of heart. – ComicSansMS May 20 '14 at 13:49
  • @ComicSansMS would you mind adding an answer? whilst Jeff's says how it should be done, it doesn't say _why_, and I suspect that too few people are aware of the data race condition you raised. I certainly wasn't! – Rook May 20 '14 at 15:01

2 Answers2

3

volatile doesn't have any multi-threaded meaning in C++. It is a holdover from C, used as a modifier for sig_atomic_t flags touched by signal handlers and for access to memory mapped devices. There is no language-mandated compulsion for a C++ function to re-access memory, which leads to the race condition (reader never bothering to check twice as an "optimization") that others note above.

Use std::atomic (from C++11 or newer).

It can be, and usually is lock-free:

struct SomeObject {
  std::atomic_bool done_;
  void DoRun();
  bool IsDone() { return done_.load(); }
  void KillMe() { done_.store(true); }
};

static void RunLoop(void *arg) {
  SomeObject &t = static_cast<SomeObject &>(*arg);

  cout << t.done_.is_lock_free(); // some archaic platforms may be false

  while (!t.IsDone()) {
    t.DoRun();
  }
}

The load() and store() methods force the compiler to, at the least, check the memory location at every iteration. For x86[_64], the cache line for the SomeObject instance's done_ member will be cached and checked locally with no lock or even atomic/locked memory reads as-is. If you were doing something more complicated than a one-way flag set, you'd need to consider using something like explicit memory fences, etc.

Pre-C++11 has no multi-threaded memory model, so you will have to rely on a third-party library with special compiler privileges like pthreads or use compiler-specific functionality to get the equivalent.

Jeff
  • 3,475
  • 4
  • 26
  • 35
  • Shouldn't `atomic done_;` be `std::atomic_bool done_;` ? – oz10 May 20 '14 at 15:11
  • Another comment, I think the load() and store() operations could actually be relaxed in this case. In most cases, you don't need the "stop" flag to be a full memory barrier (as is the default) as long as a change in the value is eventually visible. To this end I think I generally go with load(std::memory_order_relaxed) for checking the sentinel value like done. – oz10 May 20 '14 at 15:21
  • That's a fair point. My primary C++(11) platform is x86_64, where the cache line synchronization is "free", and a compiler fence is the only thing needed at all due to the very strictly ordered memory model. – Jeff May 20 '14 at 15:25
  • it is kind of an advanced topic, probably overkill for this example. – oz10 May 20 '14 at 15:27
  • See n3797 s1.10/24: mentions threading and volatile, but falls short of a guarantee. s7.1.6.2/7 shows that the compiler must perform the load, but says nothing about the store. Wherein the problem lies. – david.pfx May 20 '14 at 15:49
  • @david.pfx where are you looking? I don't see anything in (n3797) `C++11 §7.1.6.2` past `/5`: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3797.pdf – Jeff May 20 '14 at 15:56
  • @Jeff: Sorry, that's s7.1.6.1/7. `value of the object might be changed by means undetectable` implies that the compiler should reload the value, but not that it ever store the value. – david.pfx May 20 '14 at 23:14
1

This works like you expect in msvc 2010. If I remove the volatile it loops forever. If I leave the volatile it works. This is because microsoft treats volatile like you expect, which is different than iso.

This works too:

struct CDone {
    bool m_fDone;
};

int ThreadProc(volatile CDone *pDone) {
}

Here is what MSDN says:

http://msdn.microsoft.com/en-us/library/12a04hfd.aspx

Objects that are declared as volatile are not used in certain optimizations because their values can change at any time. The system always reads the current value of a volatile object when it is requested, even if a previous instruction asked for a value from the same object.

Also, the value of the object is written immediately on assignment.

ISO Compliant:

If you are familiar with the C# volatile keyword, or familiar with the behavior of volatile in earlier versions of Visual C++, be aware that the C++11 ISO Standard volatile keyword is different and is supported in Visual Studio when the /volatile:iso compiler option is specified. (For ARM, it's specified by default). The volatile keyword in C++11 ISO Standard code is to be used only for hardware access; do not use it for inter-thread communication. For inter-thread communication, use mechanisms such as std::atomic from theC++ Standard Template Library.

johnnycrash
  • 5,184
  • 5
  • 34
  • 58