Safe to use volatile bool to force another thread to wait? (C++)

Question

Everything I've read about volatile says it's never safe, but I still feel inclined to try it, and I haven't seen this specific scenario declared unsafe.

I have a separate thread that renders a scene, pulling data from the main simulation thread. This has no synchronization, and works fine.

The issue is that when the program exits, then renderer needs to stop pulling data from the simulation thread before the simulation thread can safely clean itself up without causing the renderer to attempt reading invalid memory.

To accomplish this, I have the renderer run infinitely in its thread:

volatile bool stillRendering;

void RenderThreadFunction()
{
    stillRendering = true;

    while(programRunning)
    {
        renderer->render();
    }

    stillRendering = false;
}

In the main program thread, when the windproc quit message is received, I do:

void OnQuit()
{
    programRunning = false;
    while(stillRendering)
    {
    }

    delete application;
}

The goal of this is to be sure the renderer stops pulling data from the application before calling delete on the application.

I first tried this without any volatile keywords, and it worked in debug mode, but in release mode it hung. I assume the compiler made some optimization that causes the program to stop checking the value of stillRendering.

Adding volatile to just stillRendering caused the application to successfully exit everytime I've tested it so far. I'm not certain why it doesn't seem to matter if "programRunning" is volatile.

Lastly, I am uncertain how the performance of the program will be impacted by using volatile for "stillRendering". It doesn't matter to me if making stillRendering volatile affects the performance of OnQuit(), but it does matter to me if it affects the performance of RenderThreadFunction()

@SteveTownsend "confirmed this" what are you referring to? I asked this question because I have no proof of the conclusions I drew when testing it. — Matt Swarthout, Feb 25 '13 at 18:55
This previous thread I think covers the use of volatile and why you should not use it in multithreaded programs: http://stackoverflow.com/questions/2484980/why-is-volatile-not-considered-useful-in-multithreaded-c-or-c-programming — Shafik Yaghmour, Feb 25 '13 at 18:57
If you want to drive a screw you use a screwdriver, not a hammer. So why not use the right tool in this instance too? There are events and mutexes and many other, wonderful tools. — Nik Bougalis, Feb 25 '13 at 18:58
@NikBougalis The purpose of this is to be able to quit the program without crashing. The program only quits once per run, but the render loop can run 100's of times per second. I don't think it's worth putting a mutex in a main loop just for that. I want the solution to be as simple as possible, with no overhead. — Matt Swarthout, Feb 25 '13 at 19:06
@MattSwarthout and what is the problem with having `OnQuit` set a variable that instructs the render thread to exit and then calling `WaitForSingleObject` (or some other platform-specific equivalent) on the render thread? — Nik Bougalis, Feb 25 '13 at 19:07
@NikBougalis I don't know how to do any of that without looking it up. It also necessitates the OnQuit() function having some knowledge about how the renderer thread is set up, as well as a specific thread library. Lastly, it seems to require a lock on the condition variable each iteration of the render loop to see if another thread sent an exit signal. I don't want that overhead if it's avoidable. The solution I asked in the question is only a couple lines of code and requires no knowledge of what thread library I am using. If it works, I want to use it. — Matt Swarthout, Feb 25 '13 at 19:22
The mutex will normally be uncontested. Aquiring an uncontested mutex is about as fast as anything else that would work. (Under Windows, use the misnamed `CriticalSection` for your mutex.) — James Kanze, Feb 25 '13 at 19:31

score 8 · Accepted Answer · answered Feb 25 '13 at 19:28

It's completely unsafe, although it might work with some compilers. Basically, volatile only affects the variable it's attached to, so RendererThreadFunction, for example, could set stillRendering false before having finished renderer->render();. (This is true even if both stillRendering and programRunning were both volatile.) The probablility of a problem is very small, so testing probably won't reveal it. And finally, some versions of VC++ do give volatile the semantics of an atomic access under C++11, in which case, your code will work. (Until you compile with a different version of VC++, of course.)

Given that renderer->render() almost certainly takes a non-negligible amount of time, there's absolutely no reason for not using a conditional variable here. About the only time you'd use volatile for this sort of thing is if the shutdown mechanism were triggered by a signal (in which case, the type would be sig_atomic_t, and not bool, although in practice, it probably doesn't make any difference). In that case, there wouldn't be two threads, but just the renderer thread and a signal handler.

Cory Nelson · Answer 2 · 2013-02-25T19:43:23.367

6

If you want your code to work on all architectures in all compilers, use C++11 atomics:

std::atomic<bool> stillRendering;

void RenderThreadFunction()
{
    stillRendering = true;

    while(programRunning)
    {
        renderer->render();
    }

    stillRendering = false;
}

Volatile is not designed for use with multi-threading -- compilers are actually allowed by the standard to reorder volatile accesses with non-volatile accesses. VC++ extends volatile's feature set to prevent reordering, but other compilers do not, and it might break on those compilers.

As others have mentioned, volatile also doesn't affect visibility, meaning architectures which aren't cache-coherent may never see the flag set. x86 isn't even immediately cache-coherent (writes would be extremely slow), so your program will invariably end up looping more than it should while the write is sent through various buffers.

C++11 atomics avoid both of these problems.

OK, so this was mainly meant to correct your current code and warn you off of misusing volatile. James' suggestion of using a condition variable (which is merely a more efficient version of what you're doing) is probably the best actual solution for you.

edited Feb 25 '13 at 19:43

answered Feb 25 '13 at 19:24

Cory Nelson

29,236
5
72
110

Are you saying that a compiler can move a non-volatile operation around even if it's adjacent to a volatile operation? If you move "render()" relative to "stillRendering = false", how is that different than moving "stillRendering = false" relative to "render()"? Am I misunderstanding you? – Matt Swarthout Feb 25 '13 at 19:36
Two points: C++11 atomics will only work with compilers which support C++11, and there aren't that many. (On the other hand, a certain number, like VC++ or g++, which don't fully support C++11, may support atomic access.) Second, it's worth noting that Microsoft has backed down on its position with regards to volatile. Whether volatile has the extended semantics in VC++11 depends on a compile line option, for example. – James Kanze Feb 25 '13 at 19:36
2

@MattSwarthout You've understood him perfectly. As long as there are no `volatile` accesses or IO in the loop, the compiler _could_ legally move `stillRendering = false` to before the loop. (In practice, none do, but they don't usually emit any instructions which would guarantee that the hardware doesn't do such reordering, either.) – James Kanze Feb 25 '13 at 19:38
2

Indeed. `volatile` only prevents reordering _with other volatiles_. – Cory Nelson Feb 25 '13 at 19:39
Thanks for the clarification @James Kanze and Cory Nelson. Unfortunately, it's not what I _wanted_ to be true. – Matt Swarthout Feb 25 '13 at 19:56
That is a good use of `std::atomic`, but I disagree that C++11 "works on all architectures in all compilers" – Drew Dormann Feb 25 '13 at 20:00
@DrewDormann I know of at least two compilers which don't support C++11: VC++11 and g++4.6.x. (They're the two compilers I have to use.) Both support _some_ features of C++, but which ones varies. – James Kanze Feb 26 '13 at 09:09

Pete Becker · Answer 3 · 2013-02-25T20:59:01.243

There are three issues that C++11 atomics address.

First, a thread switch can occur in the middle of reading or writing a value; for reads, another thread could update the value before the original thread reads the rest of the value; for writing, another thread could see the half-written value. This is known as "tearing".

Second, in a typical multi-processor system, each processor has its own cache, and it reads and writes values to that cache; sometimes the cache and main memory are updated to ensure that they hold the same values, but until a processor that writes a new value flushes its cache and a thread that reads the value reloads its copy from the cache the value can be different. This is referred to as "cache coherency".

Third, the compiler can move code around, and store one value before it stores another, even if the code is written in the opposite order. As long as you can't write a valid program that can see the difference, that's okay under the "as if" rule.

Loading from and storing to an atomic variable (with the default memory-ordering) prevents all three problems. Marking a variable as volatile does not.

EDIT: don't bother figuring out which architectures pose which problems. The author of the standard library has already done this for the architecture that the library implementation is intended for. Don't look for shortcuts; just use atomics. You won't lose anything.

Do I truly lose nothing from using a special structure inside the render loop to check for the other thread's quit message? The render loop can run 100s or 1000s of times per second, and it's important that it runs as fast as possible. — Matt Swarthout, Feb 25 '13 at 21:06
@MattSwarthout - as always, the answer is to measure it. But, in general, a properly implemented atomic variable will only impose the overhead that's needed to ensure correct behavior on the target platform. — Pete Becker, Feb 26 '13 at 13:16

score 2 · Answer 4 · answered Feb 25 '13 at 18:53

2

Depending on the architecture being cache-coherent (e.g. x86 processors) I expect this will work just fine. You may find that either of your two threads may run for an iteration more than if you use true atomic operations, but since only one side is setting vs. reading the values, there is no requirement for true atomic operations.

However, if the processor (cores) that are executing the code require specific cache-flushing to make the "other core(s)" see the updated value, then you may be stuck for some time - and you would need proper atomic updates to ensure the other processor's cache is invalidated.

I'm assuming that renderer->render() takes a fair amount of time, so reading the stillRendering shouldn't affect the overall runtime much at all. volatile typically just means "please don't put this in a register and keep it there".

(You probably need programRunning to be a volatile too!)

answered Feb 25 '13 at 18:53

Mats Petersson

126,704
14
140
227

Are you saying that even if I make both 'bool's 'volatile', there's no it could take a non-deterministic amount of time for the other thread to load the most recent value, despite it checking over and over again in the loop? – Matt Swarthout Feb 25 '13 at 19:01
Yes, if the processor is not cache-coherent, it could literally take "forever" to get the value updated - in practice, probably not more than a few tenths of a second, but that's a guess, not certain! But on x86, it will be checked every iteration of the loop, and whilst there is a race between the two threads setting and checking the variable (that is, the value may be set to `true` just after it was checked to be `false` or vice versa), the "reading" thread will see the new value on the next read. – Mats Petersson Feb 25 '13 at 19:08
I understand what you are saying about the possibility of it not reading the most recent value, but I don't see how there can be a get / set race condition in this case because only one thread has the ability the set, and the other thread can only wait for the set. Once the first set happens, the roles reverse, but I don't see how it could break (provided the correct value is EVENTUALLY read -- since this happens at program exit, it's not necessary it happens instantly). – Matt Swarthout Feb 25 '13 at 19:16
Yes, agreed. The only REAL problem is if you run this on some processor that doesn't have cache-coherency - in particular multi-core ARM processors for mobile phones tend to have this problem. Then `volatile` isn't enough to ensure a quick update of the variable that has changed. But if you are not planning to make your app run on a mobile phone, don't worry about that. – Mats Petersson Feb 25 '13 at 19:20
Could you go into a little more detail about the problems on multi-core ARM processors? The application is designed to compile to Win32 Desktop (x86, x64) and ARM (WinRT). For the ARM version I will have problems? – Matt Swarthout Feb 25 '13 at 19:29
Yes, quite possibly. Basically, on x86 (and several other processors), whenever memory is written [even if it's in the cache], the processor will issue a "snoop cycle" to say "If someone else has the memory for address X, then please get rid of it, as I have just updated that memory". But some types of processors don't do this - they just say "Oh, it's in my cache, it's ok to use". So whenever another processor does an update, it will have to also flush the caches [for that address] on the other processors. More here: http://en.wikipedia.org/wiki/Cache_coherence – Mats Petersson Feb 25 '13 at 19:33
1

@Matt Swarthout : Mats has a point. In SMP ARMs: loads and stores can be observed in any order, and that will cause problems witch does not occur in i386 machines. Is is complex topic. [Here is very good story about that](http://developer.android.com/training/articles/smp.html). – SKi Feb 25 '13 at 20:24
Actually, I'm not talking about out of order stores/loads. I'm talking about processor 1 writing the value `false` to `programRunning` and processor 2 not seeing that write because it already has a cached copy of `programRunning`. In which case `stillRendering` never gets set to `false`, and the program never quits. Ok, it is likely not going to go on forever, because sooner or later either a processor switch or cache eviction of `programRunning` will happen, and the `programRunning` will then be reloaded from real memory and get its new value. But it may take quite some time. – Mats Petersson Feb 25 '13 at 22:28

Drew Dormann · Answer 5 · 2013-02-25T19:53:29.353

Adding volatile to just stillRendering caused the application to successfully exit everytime I've tested it

Yes, your scenario will work.

The common mistake that occurs when using volatile variables for thread synchronization is when operations on volatile variables are presumed to be atomic. They aren't.

In your case, you are polling a single bool waiting for it to change exactly once to exactly 0. You don't seem to be expecting any operation to be atomic. On the other hand, even if you were polling a single int, C++ will not guarantee that a thread changing that int will do so atomically.

I'm not certain why it doesn't seem to matter if "programRunning" is volatile.

It does matter. Make it volatile.

Making a variable volatile will guarantee that specific cache optimizations are avoided, which is what you want.

That doesn't mean that you are guaranteed those same cache optimizations when a variable isn't volatile. You're simply letting the compiler decide. And at this particular time, the compiler happens to be making a decision that works for you.

Lastly, I am uncertain how the performance of the program will be impacted by using volatile for "stillRendering".

Your performance will likely be negatively affected by this:

while(stillRendering)
{
}

You are asking one thread (perhaps one entire CPU core) to endlessly, without rest, read a single variable.

Consider adding a sleep call in that while loop.

> Consider adding a sleep call in that while loop. - or just join that thread and do not invent bycicles — Slava, Feb 25 '13 at 19:19
@Slava I will add the sleep. I can't join because the threads don't know about each other. The render thread can be created from anywhere. The application just needs to know that nothing is reading from it before it shuts itself down. — Matt Swarthout, Feb 25 '13 at 19:27
The answer is wrong. `volatile` is not sufficient here, since it only affects the elements it is attached to. — James Kanze, Feb 25 '13 at 19:32
@JamesKanze are you saying that other variables are thread-unsafe in that program? Can you explain? — Drew Dormann, Feb 25 '13 at 19:56
`volatile` only affects the variable it is attached to. It doesn't affect anything concerning other variables (at least those which are not also `volatile`). — James Kanze, Feb 26 '13 at 09:07

Safe to use volatile bool to force another thread to wait? (C++)

5 Answers5

Linked