
As a result of my answer to this question, I started reading about the keyword volatile and what the consensus is regarding it. I see there is a lot of information about it, some of it old and seemingly wrong by now, and a lot of it new saying it has almost no place in multi-threaded programming. Hence, I'd like to clarify a specific usage (I couldn't find an exact answer here on SO).

I also want to point out that I do understand the requirements for writing multi-threaded code in general and why volatile does not solve things. Still, I see code using volatile for thread control in code bases I work in. Further, this is the only case where I use the volatile keyword, as all other shared resources are properly synchronized.

Say we have a class like:

class SomeWorker
{
public:
    SomeWorker() : isRunning_(false) {}
    void start() { isRunning_ = true; /* spawns thread and calls run */ }
    void stop() { isRunning_ = false; }

private:
    void run()
    {
        while (isRunning_)
        {
            // do something
        }
    }
    volatile bool isRunning_;
};

For simplicity some things are left out, but the essential thing is that an object is created which does something in a newly spawned thread checking a (volatile) boolean to know if it should stop. This boolean value is set from another thread whenever it wants the worker to stop.

My understanding has been that the reason to use volatile in this specific case is simply to avoid any optimization that would cache the value in a register for the loop, which would otherwise result in an infinite loop. There is no need to properly synchronize things, because the worker thread will eventually see the new value?
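
To illustrate, the optimization I'm worried about would effectively turn the loop into something like this (my sketch of the transformation, not actual compiler output):

bool cached = isRunning_; // the value is read once into a register...
while (cached)            // ...and never re-read from memory: an infinite loop
{
    // do something
}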

I'd like to understand whether this is considered completely wrong, and whether the right approach is to use a synchronized variable. Does it differ between compilers/architectures/cores? Maybe it's just a sloppy approach worth avoiding?

I'd be happy if someone would clarify this. Thanks!

EDIT

I'd be interested to see (in code) how you choose to solve this.

murrekatt
  • not according to what I've read lately and not according to the discussion around the question I linked to. To me it seems things have changed at some point and this way of approaching things is no longer considered good. I'd like to get a confirmation and explanation of this. :) – murrekatt Aug 09 '11 at 11:28
  • Even ignoring volatile, the code above has a race condition. Calling `stop()` then `start()` in quick succession may result in more than one thread running at the same time. Whether that's a bug or not is a design question. – Marcelo Cantos Aug 09 '11 at 11:42
  • yes, but as you stated this can be a design question. if not, see my comment for this case at @eran's answer. – Karoly Horvath Aug 09 '11 at 11:47
  • It is fine in this very specific case. Don't go jumping to conclusions from it, *volatile* is not a substitute for an event, nor is it suitable for implementing locks. – Hans Passant Aug 09 '11 at 11:51
  • @Hans: are you saying the use of volatile bool is alright in this specific case? – murrekatt Aug 09 '11 at 11:54
  • Yes, that's what "it is fine in this very specific case" means. – Hans Passant Aug 09 '11 at 11:55
  • @Hans: too many comments in between that I was unsure what you answer to. Also I'm surprised that you say this while so many other say the opposite. Just look at the question I link to. – murrekatt Aug 09 '11 at 11:57
  • @murrekatt: I suppose the negative feedback is because it's just bad practice. You might get away with it in this case, but what if you want to add an `int` or a pointer? Suddenly you're in trouble. If you stick to atomics as a matter of course, you'll be in the right concurrent mindset from the start. – Kerrek SB Aug 09 '11 at 12:03
  • There's never a lack of FUD when it comes to *volatile*. Best mentioned in comments, not answers. The question you linked requires additional synchronization to ensure that the thread has exited. – Hans Passant Aug 09 '11 at 12:04
  • See my response where I explain [what `volatile` does](http://stackoverflow.com/questions/6866206/volatile-and-createthread/6866927#6866927), with examples for exactly this situation. – Cory Nelson Aug 09 '11 at 12:13
  • @Hans Passant It may work in this specific case, or it may not. It's not guaranteed with most compilers and most hardware. I've not seen any that explicitly guarantee it, but if you know what the compiler does with volatile, and you know what the hardware does with what the compiler does, you might be able to derive a guarantee. On the other hand, there are a lot of systems where it most specifically won't work. (Sparc with either Sun CC or g++, for example. And I'm not too sure about Intel/AMD with VC++ or g++.) – James Kanze Aug 09 '11 at 12:17
  • @JamesKanze: All real-world C++ implementations have inter-thread visibility for `volatile` in practice, because accesses are required to compile to a load or store in the asm (which is sufficient [because CPUs have coherent caches between the cores that std::thread starts threads across](https://stackoverflow.com/a/58535118)). You don't get any ordering, but this code doesn't depend on that (and wouldn't give any useful sync with SC atomics). The well-defined way to do this would be `std::atomic` with `memory_order_relaxed`. `volatile` is obsolete for this but does work fine. – Peter Cordes Apr 21 '23 at 11:37
  • @JamesKanze: I assume you don't remember your point about how Sun CC or g++ would compile this for SPARC, but I'm certain g++ at least would compile it to the same memory operations as `std::atomic` with `memory_order_relaxed`. (Just a load or store, no barriers). It could only break if a compiler disregarded `volatile` and hoisted a load out of the loop, not actually re-checking it every iteration. The reader will definitely see a value stored by another core after a few tens of nanoseconds, thanks to cache coherency. – Peter Cordes Apr 21 '23 at 11:42

6 Answers


You don't need a synchronized variable, but rather an atomic variable. Luckily, you can just use std::atomic<bool>.

The key issue is that if more than one thread accesses the same memory simultaneously, then unless the access is atomic, your entire program ceases to be in a well-defined state. Perhaps you're lucky with a bool, which may well be updated atomically in any case, but the only way to be absolutely certain that you're doing it right is to use atomic variables.

"Seeing codebases you work in" is probably not a very good measure when it comes to learning concurrent programming. Concurrent programming is fiendishly difficult and very few people understand it fully, and I'm willing to bet that the vast majority of homebrew code (i.e. not using dedicated concurrent libraries throughout) is incorrect in some way. The problem is that those errors may be extremely hard to observe or reproduce, so you might never know.

Edit: You aren't saying in your question how the bool is getting updated, so I am assuming the worst. If you wrap your entire update operation in a global lock, for instance, then of course there's no concurrent memory access.
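
For illustration, a minimal sketch of the class above using std::atomic<bool>; the thread_ member and the join() in stop() are my additions, since the question elides how the thread is spawned:

#include <atomic>
#include <thread>

class SomeWorker
{
public:
    SomeWorker() : isRunning_(false) {}

    void start()
    {
        isRunning_ = true;                 // atomic store (seq_cst by default)
        thread_ = std::thread(&SomeWorker::run, this);
    }

    void stop()
    {
        isRunning_ = false;                // atomic store, guaranteed visible to run()
        if (thread_.joinable())
            thread_.join();                // also guarantees the thread has exited
    }

private:
    void run()
    {
        while (isRunning_)                 // atomic load on every iteration
        {
            // do something
        }
    }

    std::atomic<bool> isRunning_;
    std::thread thread_;
};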

Kerrek SB
  • AFAIK you don't need `atomic` for this specific code/example. let me know if I'm wrong. if so, on which platform does this fail? – Karoly Horvath Aug 09 '11 at 11:31
  • I do not know if I misunderstood you, but in my opinion std::atomic synchronizes (by mutual exclusion) the value it holds. – Nobody moving away from SE Aug 09 '11 at 11:31
  • Thanks for your answer. I just mentioned that I see this in code bases I work in, but I also see this online in articles and forums as the suggested way to do things. This is what I find strange and would like to get an explanation for. – murrekatt Aug 09 '11 at 11:32
  • Concurrent programming isn't intrinsically very difficult. It's very difficult when using traditional exclusion and signalling primitives to coordinate access to shared memory. Using message queues to pass around immutable objects is much safer and easier to reason about than the traditional models. It's still not easy, but the difference is like night and day. – Marcelo Cantos Aug 09 '11 at 11:36
  • @Marcelo: your point makes sense if two threads try to write to the same memory address. here is just one updater and one reader and the reader probably doesn't care if it doesn't see the update instantly. – Karoly Horvath Aug 09 '11 at 11:49
  • @Nobody: atomics are a bit different -- their access is synchronized at the instruction level. There's no visible mutex at the language level. If you use mutex locks, then there is *no* concurrent access, because the mutex serializes the access. *But*, if you think about how to *implement* a mutex, you will see that you actually need atomics! – Kerrek SB Aug 09 '11 at 11:49
  • @yi_H: Yes, but I'm guessing that it does care not to miss a sequence of updates. Assuming you only want at most one thread to run at any given time, the problem isn't that simple. You can solve this using conventional primitives, but a queue is versatile enough to solve this and practically any other kind of concurrency problem. – Marcelo Cantos Aug 09 '11 at 11:57
  • @Marcelo: Fair enough, but you still have two big problems: a) *write* such a queue. Your library might do that for you. And b) this is fairly restrictive and you need to incorporate the message queue deep into your design, and make sure you're not cutting corners. I'm sure it can be done, but I imagine that you still have to be very alert, and nobody will tell you when you're doing something wrong. – Kerrek SB Aug 09 '11 at 12:00
  • @Kerrek: I know about atomics, but in my opinion the usual architectures only provide atomic operations on bitlevel. The std::atomic is a container for anything and if you look in the interface you will see `bool is_lock_free()` that in my opinion shows that internally this container wraps a mutex around the internal structure to make the operations atomic. – Nobody moving away from SE Aug 09 '11 at 12:00
  • @Kerrek: I say in the question that another thread will call stop when it wants (no other locking). – murrekatt Aug 09 '11 at 12:01
  • @Nobody: Most common architectures guarantee atomicity for aligned word-sized loads and stores; some, such as x86, provide additional atomic primitives for incrementing etc. std::atomic is designed to allow an implementation to use the hw-provided atomics for some types (e.g. `std::atomic<int>`) and a wrapper with a mutex for more complicated types. – janneb Aug 09 '11 at 12:20
  • @janneb: Thanks for pointing that out. So I might say that `std::atomic<int>` can be really atomic while `std::atomic` of a more complicated type, for example, will be mutex-wrapped. – Nobody moving away from SE Aug 09 '11 at 12:24

volatile can be used for such purposes. However, this is an extension to standard C++ by Microsoft:

Microsoft Specific

Objects declared as volatile are (...)

  • A write to a volatile object (volatile write) has Release semantics; (...)
  • A read of a volatile object (volatile read) has Acquire semantics; (...)

This allows volatile objects to be used for memory locks and releases in multithreaded applications.(emph. added)

That is, as far as I understand, when you use the Visual C++ compiler, a volatile bool is for most practical purposes an atomic<bool>.

It should be noted that newer VS versions add a /volatile switch that controls this behavior, so this only holds if /volatile:ms is active.
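
Expressed in portable C++11, the quoted guarantee corresponds roughly to the following (a sketch of the equivalent semantics, not MSVC's actual implementation):

#include <atomic>

std::atomic<bool> isRunning(false);

// What /volatile:ms promises for a volatile write: release semantics.
void stop()
{
    isRunning.store(false, std::memory_order_release);
}

// What /volatile:ms promises for a volatile read: acquire semantics.
bool stillRunning()
{
    return isRunning.load(std::memory_order_acquire);
}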

Martin Ba
  • Seems like this extension makes the C++ volatile keyword behave more similarly to Java's volatile. In Java, volatile does guarantee order of access, which might explain some programmers' confusion about its function in C++. – ddso Aug 09 '11 at 11:43
  • That MSDN article is very unfortunate, it is dead wrong. You can't implement a lock with *volatile*, not even with Microsoft's version. The description is pretty irrelevant too, odds you'll run your code on an Itanium are slim these days. – Hans Passant Aug 09 '11 at 11:46
  • @HansPassant: I have started a separate question to clear this up: http://stackoverflow.com/questions/7007403/are-volatile-reads-and-writes-atomic-on-windowsvisualc – Martin Ba Aug 10 '11 at 07:43
  • @Hans wrote "You can't implement a *lock* with volatile, not even with Microsoft's version." - this is true. But there's no *lock* in the use case of this question. – Martin Ba Aug 10 '11 at 08:10
  • It says: "_a reference to **a global or static object** that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary_" Doesn't sound like Java `volatile`. – curiousguy Oct 23 '11 at 20:42
  • `/volatile:ms` is a bit on the deprecated side, for example doesn't hold for ARM targets. Certainly not compliant. – Mikhail Jul 08 '14 at 03:30
  • You don't need MS's acq_rel semantics for `volatile` for a "stop now" / "keep running" flag; `atomic` with `std::memory_order_relaxed` would be sufficient, and in practice `volatile` gives you something similar to that on real implementations (like GCC/clang as well as MSVC `/volatile:iso`). See [When to use volatile with multi threading?](//stackoverflow.com/a/58535118) for an explanation of why: real C++ implementations run threads on CPUs that have coherent caches. `volatile` was the de-facto standard before C++11, and still works in practice. (Don't use it, though!) – Peter Cordes Apr 18 '23 at 03:55
  • *when you use the Visual C++ compiler, a `volatile bool` is for most practical purposes an `atomic<bool>`* - the differences are that `atomic<bool>` defaults to seq_cst, vs. MS `volatile bool` giving acq_rel (and `/volatile:iso` being `relaxed`). So `atomic<bool>` is slower if you don't explicitly use `flag.store(false, std::memory_order_release)`. Also `atomic<bool>` makes operations like `flag ^= 1` into an atomic RMW, if that operation is supported for bool. – Peter Cordes Apr 18 '23 at 03:58

Using volatile is enough only on single cores, where all threads use the same cache. On multi-cores, if stop() is called on one core and run() is executing on another, it might take some time for the CPU caches to synchronize, which means two cores might see two different views of isRunning_. This means run() will run for a while after it has been stopped.

If you use synchronization mechanisms, they will ensure all caches get the same values, at the price of stalling the program for a while. Whether performance or correctness is more important to you depends on your actual needs.
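
As an example of such a mechanism, here is a minimal sketch using a mutex and a condition variable, which both guarantees visibility and lets stop() wake the worker promptly; the 100 ms interval is an arbitrary illustrative choice:

#include <chrono>
#include <condition_variable>
#include <mutex>

class SomeWorker
{
public:
    void stop()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;          // written under the lock: no visibility issues
        }
        condition_.notify_one();      // wake the worker immediately
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopped_)
        {
            // do one unit of work (a real worker would drop the lock while working),
            // then sleep until stopped or until the interval elapses
            condition_.wait_for(lock, std::chrono::milliseconds(100));
        }
    }

    std::mutex mutex_;
    std::condition_variable condition_;
    bool stopped_ = false;
};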

Eran
  • well, in that loop you typically do a lot of things, otherwise it will just eat your CPU... so cache synchronization shouldn't be an issue. Also for an example like this you don't call stop() and run() simultaneously. – Karoly Horvath Aug 09 '11 at 11:35
  • This... It's surprising to many programmers that changes they make to a memory location in one thread may not be visible to another thread reading the same memory location, and that changes to several memory locations may be seen out of order. There are special memory barrier instructions to synchronise things. But it's _very_ much easier to use C++ atomic types, or synchronization functions from a library, or things like InterlockedIncrement (on Windows) rather than try to get it right yourself – jcoder Aug 09 '11 at 11:36
  • Thanks eran, yes, this is how I concluded it after reading a few answers to "volatile" questions here on SO. You mention performance; isn't it slower to use volatile, or is std::atomic and the like just as slow? – murrekatt Aug 09 '11 at 11:36
  • @yi_H, multithreading is so hard because of all those edge cases... If the `stop()` is called by a button event, you're right. Human actions aren't fast enough anyway. But for the general case, cache synchronization is at least an issue to have in mind. – Eran Aug 09 '11 at 11:40
  • @eran: I don't see the point. if `run()` does CPU intensive processing or blocking I/O in that loop then doing `stop(); .. start();` doesn't make sense. not even with atomic. Your thread might never see that transition. you need another variable to signal that the thread received the `stop()` so you can call the next `start()`. (yes I know, there are MT primitives to do this, but again, do you know a platform where this doesn't work?) – Karoly Horvath Aug 09 '11 at 11:43
  • @murrekatt, volatile might slightly hit performance, otherwise the compiler would never have bothered to use a register. But that's called _premature optimization_. In most cases, you'll never feel the hit. Real synchronizations will cause a more noticeable hit, but how much depends on the mechanism used, the architecture and the rest of your code. You'll just have to test it. – Eran Aug 09 '11 at 11:46
  • @yi_H, I wasn't referring to the stop and run simultaneous call scenario... In my answer, I refer to the case where `run` is executing the loop on one core for some time, and then `stop` is called on another core. In this case, it might take some time for the `run` thread to see the change. Solving more complicated cases like multiple starts and stops requires more complicated mechanisms than the given code, no question about that. – Eran Aug 09 '11 at 11:57
  • Cache synchronization isn't the issue (or at least not the only one). The read and write queues on the processor can be a more fundamental issue; for example, if the processor finds the value it wants in the read queue, it generally won't attempt to go to memory again. For code like the above to work, you need to ensure that the memory is synchronized, using some sort of barrier or fence machine instruction. Most compilers don't generate this for volatile, so volatile doesn't suffice here. – James Kanze Aug 09 '11 at 12:21
  • (note: waiting a little bit for cache synchronization is *fine*) – Karoly Horvath Aug 09 '11 at 12:47
  • @yi_H Sun CC or g++ on a Sparc. I'm not sure about Intel systems; I've seen contradictory statements of what the hardware guarantees. – James Kanze Aug 09 '11 at 14:05
  • @eran I have posted a related question to this proposed answer at http://stackoverflow.com/q/30958375/2369597 and I'd be really grateful if you could spare a few moments to post on it. Thank you. – Wad Jun 21 '15 at 12:46
  • @JamesKanze: If a variable is used solely for cancellation, which will be rare, and if cancellation didn't need to occur with any particular timeliness, could one avoid synchronization overhead in the non-cancellation case by having code that simply ensures that every thread will get hit with a context switch at least occasionally? – supercat Jul 16 '15 at 18:07

There are three major problems you are facing when multithreading:

1) Synchronization and thread safety. Variables that are shared between several threads must be protected from being written to by several threads at once, and prevented from being read during non-atomic writes. Synchronization of objects can only be done through a special semaphore/mutex object which is guaranteed to be atomic by itself. The volatile keyword does not help.

2) Instruction piping. A CPU can change the order in which some instructions are executed to make code run faster. In a multi-CPU environment where one thread is executed per CPU, the CPUs pipe instructions without knowing that another CPU in the system is doing the same. Protection against instruction piping comes from memory barriers. It is all explained well at Wikipedia. Memory barriers may be implemented either through dedicated memory barrier objects or through the semaphore/mutex object in the system. A compiler could possibly choose to invoke a memory barrier in the code when the volatile keyword is used, but that would be a rather special exception and not the norm. I would never assume that the volatile keyword did this without having it verified in the compiler manual. (See the sketch after point 3 below for how to request such a barrier explicitly in C++11.)

3) Compiler unawareness of callback functions. Just as with hardware interrupts, some compilers may not know that a callback function has been executed and updated a value in the middle of code execution. You can have code like this:

// main
bool x;   // plain bool; declaring it 'volatile bool x;' is the fix described below

x = true;
while (something)
{
  if (x == true)
  {
    do_something();
  }
  else
  {
    do_something_else();
    /* The code may never go here: the compiler doesn't realize that x 
       was changed by the callback. Or worse, the compiler's optimizer 
       could decide to entirely remove this section from the program, as
       it thinks that x could never be false when the program comes here. */
  }
}

// thread callback function:
void thread (void)
{
  x = false;
}

Note that this problem only appears on some compilers, depending on their optimizer settings. This particular problem is solved by the volatile keyword.
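
Regarding point 2: in C++11 the portable way to request such a barrier explicitly is a fence paired with atomic operations. A sketch under that assumption (volatile alone provides none of this):

#include <atomic>

int data = 0;                       // plain payload, published via the flag
std::atomic<bool> ready(false);

void producer()
{
    data = 42;                                            // (1) write the payload
    std::atomic_thread_fence(std::memory_order_release);  // barrier: (1) may not sink below (2)
    ready.store(true, std::memory_order_relaxed);         // (2) publish the flag
}

void consumer()
{
    while (!ready.load(std::memory_order_relaxed))        // spin until published
        ;
    std::atomic_thread_fence(std::memory_order_acquire);  // barrier: reads below stay below
    int value = data;                                     // guaranteed to read 42
    (void)value;
}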


So the answer to the question is: in a multi-threaded program, the volatile keyword does not help with thread synchronization/safety, it likely does not act as a memory barrier, but it can prevent dangerous assumptions by the compiler's optimizer.

Lundin
  • Thanks Lundin for your answer. I still don't hear a consensus on whether volatile bool is appropriate code in my example. – murrekatt Aug 09 '11 at 12:38
  • @murrekatt I believe you have gotten it right when you wrote the question: the volatile in that case is just to protect against optimizer goof-ups, just as in 3) in my answer above. The intent in your specific example does not seem to be thread synchronization, which volatile wouldn't help with. – Lundin Aug 09 '11 at 13:11
  • Still, there are claims that this might not work. I'm waiting for some kind of consensus on what actually works and is appropriate in a case like this. At the moment I don't have much more clarity than before I asked the question. :( – murrekatt Aug 09 '11 at 13:18
  • @murrekatt I don't think anyone is saying that volatile can be used as a mutex/semaphore, which was what you asked. As has been pointed out by several, it can be used as a memory barrier in some cases, though such code is compiler-dependent and non-portable. Someone said it can't be used as an event, but that's just a remark about CPU usage efficiency: it won't cause bugs, just bad performance. Regarding the question you linked to, and your answer, you got some unfair response from that... you never did say it should be volatile for sync purposes. It _should_ be volatile for my reason 3) above. – Lundin Aug 09 '11 at 19:04
  • @curiousguy Historically, all of them, when optimizations are on. Nowadays PC compilers in particular do an ok job of realizing that callbacks are called by someone other than themselves - embedded systems compilers less so. It's all about the compiler's ability to treat callbacks/interrupts as a special case. – Lundin Jan 31 '20 at 07:24
  • @Lundin Can you name a few of these broken compilers? – curiousguy Jan 31 '20 at 16:50
  • How does any of this apply for `volatile bool x`? Part of the point of `volatile` is that the value you read might not be the same as a value this thread stored earlier. If a compiler does constant-propagation through a `volatile`, it's severely broken for pre-C++11 hand-rolled atomics. [Who's afraid of a big bad optimizing compiler?](https://lwn.net/Articles/793253/) on LWN describes the Linux kernel's use of `volatile` to avoid problems like that as it rolls its own atomics. (On GCC and clang, not supporting other compilers, to be fair.) – Peter Cordes Apr 18 '23 at 04:02

This will work for your case, but to protect a critical section this approach is wrong. If it were right, then one could use a volatile bool in almost all cases where a mutex is used. The reason is that a volatile variable guarantees neither memory barriers nor any cache coherence mechanism. A mutex, on the contrary, does: once a mutex is locked, a cache invalidation is broadcast to all cores in order to maintain consistency among them. With volatile this is not the case. Nevertheless, Andrei Alexandrescu proposed a very interesting approach that uses volatile to enforce synchronization on a shared object. And as you'll see, he does it with a mutex; volatile is only used to prevent accessing the object's interface without synchronization.
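
For reference, a condensed sketch of Alexandrescu's idea, modernized to std::mutex (his article predates C++11); the Widget type and all names here are illustrative:

#include <mutex>

// The mutex does the real synchronization; volatile only turns unsynchronized
// access into a compile error, because non-volatile member functions cannot be
// called on a volatile-qualified object.
template <typename T>
class LockingPtr
{
public:
    LockingPtr(volatile T& obj, std::mutex& m)
        : lock_(m),                         // take the lock first...
          obj_(const_cast<T*>(&obj)) {}     // ...then cast volatile away

    T* operator->() { return obj_; }
    T& operator*()  { return *obj_; }

private:
    std::lock_guard<std::mutex> lock_;
    T* obj_;
};

struct Widget { void update() {} };         // update() is deliberately not volatile

volatile Widget sharedWidget;               // sharedWidget.update() alone won't compile
std::mutex widgetMutex;

void safeUse()
{
    LockingPtr<Widget> p(sharedWidget, widgetMutex);
    p->update();                            // synchronized access through the lock
}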

PoP

I think there is nothing wrong with this code and it works fine. However, as you said, this way of writing it is no longer recommended, because it is neither efficient nor maintainable. If you can use std::atomic, give up this style, for two reasons:

  1. A write made by one core is not immediately synchronized to the caches of the other cores.
  2. Once this code seems to run normally, it is tempting to get carried away and add other shared field variables; but the CPU can execute stores out of order, so a newly added variable may still be a null pointer at the moment your volatile bool already reads true (see the sketch below).

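A sketch of the hazard in point 2 and the release/acquire fix; the Config type and all names here are illustrative, not from the original post:

#include <atomic>

struct Config { int value; };

Config* sharedConfig = nullptr;      // the "newly added" variable from point 2
std::atomic<bool> ready(false);      // with a plain volatile bool, (1) and (2) may reorder

void writer()
{
    sharedConfig = new Config{42};                 // (1) create the data
    ready.store(true, std::memory_order_release);  // (2) publish: (1) cannot move below (2)
}

void reader()
{
    if (ready.load(std::memory_order_acquire))     // acquire pairs with the release store
    {
        int v = sharedConfig->value;               // guaranteed non-null, fully constructed
        (void)v;
    }
}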

Jishan Shaikh
  • `std::atomic<bool> flag` with `flag.store(true, std::memory_order_relaxed)` [will compile to the same asm](https://stackoverflow.com/a/58535118) as `volatile bool flag` with `flag = true;`, across all ISAs. With a "stop now / keep running" flag, there's usually no point in having the writer thread stall later memory operations until the store is globally visible, which is what you get from the default `seq_cst` memory order. (CPU cache will never be "immediately" refreshed, but seq_cst doesn't make it any faster, it just makes this CPU wait.) – Peter Cordes Apr 18 '23 at 04:06
  • I wouldn't recommend `volatile bool` for this, but it's actually fine on all mainstream compilers for all ISAs as a kind of equivalent to `relaxed` atomics, because all real C++ implementations run `std::thread` across cores with cache-coherent shared memory. – Peter Cordes Apr 18 '23 at 04:08