How to avoid using volatile for lockfree program?

Question

My program is comprised of a bunch of threads happily chugging along, and the only synchronization I have is a volatile global bool that tells them if the user exited. All other communication between the threads is lockfree. These threads constantly have work to do, in a very time critical application, so I can't afford having locks between them. I recently came across a lot of information showing that volatile is bad for multi-threading, so I want to make my code better. I saw that std::atomic_flag is guaranteed lockfree, but I can't figure out how to use it for my case.

The basic setup is like this (omitting different files that the code is in):

// Global variable in its own .h file
extern volatile bool SystemOnline;

// Initialization in a .cpp file    
volatile bool SystemOnline = true;

// Windows message processing
while (SystemOnline)
{
    MSG msg;
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
    {
        if (msg.message == WM_QUIT)
        {
            SystemOnline = false;
        }
        else if (!TranslateAccelerator(msg.hwnd, NULL, &msg))
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }
}

// Thread functions
void Thread1()
{
    while (SystemOnline)
    {
        // Do some work
    }
}

void Thread2()
{
    while (SystemOnline)
    {
        // Do some other work
    }
}

// And so on...

All threads are joined at the end.

`volatile` has absolutely nothing to do with thread synchronization, and doesn't in any way help prevent data races. You are looking for [`std::atomic`](https://en.cppreference.com/w/cpp/atomic/atomic). In the code shown, you could simply make it `std::atomic SystemOnline = true;` — Igor Tandetnik, Oct 20 '19 at 03:51
@IgorTandetnik volatile is used to make the while statements exit when the variable becomes false. std::atomic is not guaranteed to be lockfree and can tank the performance of all threads. As I said, I'm trying to get rid of volatile, but the replacement needs to perform just as well — Eyal K., Oct 20 '19 at 03:53
No reasonable compiler will have a locking `std::atomic` or `std::atomic`. If you want to be extra sure, check with `is_lock_free()`. `volatile` does not prevent data races. Your program exhibits undefined behavior; "seems to work" is merely one possible manifestation of undefined behavior. — Igor Tandetnik, Oct 20 '19 at 03:56
@IgorTandetnik There's no data race here. The variable is only modified once in the entire lifetime of the program. Worst case is that the threads run for a few iterations more than needed. Can you elaborate on the undefined behavior? I will check if atomic bool is indeed lockfree — Eyal K., Oct 20 '19 at 04:02
The variable is modified in one thread, and read in another - possibly concurrently, as these two operations are not synchronized. This is the definition of a [data race](https://en.cppreference.com/w/cpp/language/memory_model#Threads_and_data_races). — Igor Tandetnik, Oct 20 '19 at 04:04
@EyalK. If you want to make sure that there are no locks when you use `std::atomic`, just add a `static_assert(std::atomic::is_always_lock_free)` in your code or before C++17, check that the macro `ATOMIC_BOOL_LOCK_FREE` expands to `2`. If `std::atomic` is not lock free, then there is probably a good hardware-level reason for that indicating that (fully) atomicity may not be possible without lock. — walnut, Oct 20 '19 at 04:14
@EyalK. - as others have pointed out `volatile` doesn't do what you think it does any more reliably. Back in the days of single-core machines and i386, you could reasonably use volatile to enable atomic assignments. But once multi-processing and multi-core became a thing, all bets are off. — selbie, Oct 20 '19 at 07:58

selbie · Accepted Answer · 2019-10-21T00:03:12.543

First, let's fix the bigger performance problem first. Your main Win32 thread is spinning without waiting. That's going to negate any perceived performance difference between a lockless bool and an std::atomic.

You'll burn an entire core just invoking PeekMessage redundantly on an empty queue. So instead of this:

while (SystemOnline)
{
    MSG msg;
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
    {
        if (msg.message == WM_QUIT)
        {
            SystemOnline = false;
        }
        else if (!TranslateAccelerator(msg.hwnd, NULL, &msg))
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }
}

Let's do the following which is exactly equivalent to what you have above except that GetMessage will block until an incoming message arrives. And GetMessage returns FALSE when a WM_QUIT message is dequeued.

MSG msg;
while (GetMessage(&msg, NULL, 0, 0))
{
    if (!TranslateAccelerator(msg.hwnd, NULL, &msg))
    {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
}
SystemOnline = false;

And as other have pointed out, the conversion from a global bool to an atomic is pretty simple:

Just change this global:

// Global variable in its own .h file
extern volatile bool SystemOnline;

// Initialization in a .cpp file    
volatile bool SystemOnline = true;

To this:

// Global variable in its own .h file
#include <atomic>
extern std::atomic<bool> SystemOnline;

// Initialization in a .cpp file    
std::atomic<bool> SystemOnline(true);

You can also use std::atomic_bool - it's a typedef for std::atomic<bool>

And that's all you have to do. You don't even have to change your thread code since the () operator for this type just invokes the load() method on the object. Similarly, the = false assignment at the end of the main thread loop is the same as invoking SystemOnline.store(false).

As others have pointed out, there really isn't much of a performance penalty for using the atomic. On x86, it maps to the LOCK opcode prefix.

GetMessage() takes only 4 arguments, PM_REMOVE needs to not be there, but otherwise it works. — Eyal K., Oct 20 '19 at 12:26

How to avoid using volatile for lockfree program?

1 Answers1