Synchronize Threads - InterlockedExchange

Question

I like to check if a thread is doing work. If the thread is doing work I will wait for an event until the thread has stopped its work. The event the thread will set at the end.

To check if the thread is working I declared a volatile bool variable. The bool variable will be true if the thread is running, else it is false. At the end of the thread the bool variable will be set to false.

Is it adequate to use a volatile bool variable or do I have to use an atomic function?

BTW: Can please someone explain me the InterlockedExchange Method, I don´t understand the use case I will need this function.

Update

I see without my code it is not clear to say if a volatile bool variable will adequate. I wrote a testclass which shows my problem.

class Testclass
{
public:
    Testclass(void);
    ~Testclass(void);

    void doThreadedWork();
    void Work();

    void StartWork();

    void WaitUntilFinish();
private:
    HANDLE hHasWork;
    HANDLE hAbort;
    HANDLE hFinished;

    volatile bool m_bWorking;
};

//.cpp

#include "stdafx.h"
#include "Testclass.h"

CRITICAL_SECTION cs;

DWORD WINAPI myThread(LPVOID lpParameter)
{
    Testclass* pTestclass = (Testclass*) lpParameter;
    pTestclass->doThreadedWork();
    return 0;
}

Testclass::Testclass(void)
{
    InitializeCriticalSection(&cs);
    DWORD myThreadID;
    HANDLE myHandle = CreateThread(0, 0, myThread, this, 0, &myThreadID);
    m_bWorking = false;

    hHasWork = CreateEvent(NULL,TRUE,FALSE,NULL);
    hAbort = CreateEvent(NULL,TRUE,FALSE,NULL);

    hFinished = CreateEvent(NULL,FALSE,FALSE,NULL);
}


Testclass::~Testclass(void)
{
    DeleteCriticalSection(&cs);

    CloseHandle(hHasWork);
    CloseHandle(hAbort);
    CloseHandle(hFinished);
}

void Testclass::Work()
{
    // do some work

    m_bWorking = false;
    SetEvent(hFinished);
}

void Testclass::StartWork()
{
    EnterCriticalSection(&cs);
    m_bWorking = true;
    ResetEvent(hFinished);
    SetEvent(hHasWork);
    LeaveCriticalSection(&cs);
}

void Testclass::doThreadedWork()
{
    HANDLE hEvents[2];
    hEvents[0] = hHasWork;
    hEvents[1] = hAbort;

    while(true)
    {
        DWORD dwEvent = WaitForMultipleObjects(2, hEvents, FALSE, INFINITE); 
        if(WAIT_OBJECT_0 == dwEvent)
        {
            Work();
        }
        else
        {
            break;
        }
    }
}


void Testclass::WaitUntilFinish()
{
    EnterCriticalSection(&cs);
    if(!m_bWorking)
    {
        // if the thread is not working, do not wait and return
        LeaveCriticalSection(&cs);
        return;
    }

    WaitForSingleObject(hFinished,INFINITE);

    LeaveCriticalSection(&cs);
}

For me it is not realy clear if m_bWorking value n a atomic way or if the volatile cast will adequate.

Off topic: Look into [message queues](https://en.wikipedia.org/wiki/Message_queue) and [thread pools](https://en.wikipedia.org/wiki/Thread_pool). They may conceptually simplify what you are trying to do — user4581301, Jan 22 '16 at 21:44
volatile bool is perfectly ok for this case. It's very easy to try and see. — BitWhistler, Jan 22 '16 at 22:42
@BitWhistler Could you explain a bit more why you think it's _perfectly OK_? I don't think is is. — πάντα ῥεῖ, Jan 22 '16 at 23:05
The meaning of "volatile" is it's in memory. Every access to this variable is from/to memory, so when you write it from one thread, other threads see it immediately. They only have to check. It is very easy to check in a small test progy and I suggest you go through this experiment. In fact, you can even have queues based on this fact without using atomics. — BitWhistler, Jan 22 '16 at 23:19
Suppose that you check if the thread is working and the answer is `b`. Whilst you decide what to do and in that same instant the state changes to `!b`. What is your actual problem? — David Heffernan, Jan 23 '16 at 08:32
In general, do not use `volatile` for multithreading, it's both superfluous and not enough. Under win32, you need `volatile` for interaction with the `Interlocked...` family of functions though, but even there, it's better to use standard C++ atomics. — Ulrich Eckhardt, Jan 23 '16 at 09:46
Regarding your edit, your bool is pointless. If you want to wait, just wait. You are introducing complexity where none is needed. — David Heffernan, Jan 23 '16 at 12:55
@DavidHeffernan The idea is not to wait, if the thread is not working. So why it is pointless? — lalelu, Jan 23 '16 at 15:18
@HarryJ & co, I suggest you read Drepper's "what every programmer should know about memory". This was written way before atomics were introduced to the cpp standard. What you're saying about volatile not enough is simply incorrect. Wrappers like Interlocked* in MSVC and __sync_* in gcc cast (or force you to) to volatile as underlying instructions take one argument in memory. — BitWhistler, Jan 23 '16 at 16:01
@HarryJ, there's no such thing writing only to cache and not to memory. Writes go either to both mem and cache (write-back) or only to memory (write-through) — BitWhistler, Jan 23 '16 at 16:04
If the thread has never started, i would be waiting infinite. — lalelu, Jan 23 '16 at 16:16
Don't you know whether or not you started the thread. Really, waiting on a thread is just not this hard. You've got way too much complexity. The end result will be that your code will be incorrect. — David Heffernan, Jan 23 '16 at 18:22
No i don´t know whether or not the thread is started. This thread does some work. When the work will done is random. Only one work order can be done at the same time. But there can be more work orders in a queue. — lalelu, Jan 23 '16 at 20:23
Anyway, a far bigger problem are the massive races you have. You create the thread first, and then the events. The code is pretty much beyond saving. You need to start again and write the simplest possible form you can, working hard to avoid the races that afflicted you here. — David Heffernan, Jan 23 '16 at 22:36
@HarryJ we're not talking about barriers nor about reordering here. Its just a bool FFS! volatile will force gcc to write it to memory even if it doesn't see it used and so would optimize it away otherwise. — BitWhistler, Jan 24 '16 at 08:58
The first answer below is a correct answer. You may go with a volatile bool, or wrap the cast to volatile in a function like in the answer below. That's exactly what std::atomic_store/load do. — BitWhistler, Jan 24 '16 at 16:50
@BitWhistler: you might want to add a link to the answer you're talking about. The order in which you see the answers isn't necessarily the same as the order in which anyone else sees them. (I'm still of the opinion that if [even the experts don't agree](http://stackoverflow.com/a/3114174/886887) about what the standard requires `volatile` to achieve, it isn't safe to depend on it. But now that we can see what the OP is actually doing, the question is largely moot anyway.) — Harry Johnston, Jan 24 '16 at 21:07
Thanks @HarryJ. We're in agreement on the OP:) //// "experts don't agree about what the _*standard*_ require" - is exactly the point. I'm not talking about the standard coz it's lacking, and the debate between the kernel guys and the standard guys is still on. //// In practice, `atomic_store(int* p, int v)` is just this `{ *(volatile int*)p = v; }` so it obviously works, but a volatile variable is also enough, for visibility only. — BitWhistler, Jan 25 '16 at 09:53
@BitWhistler: yeah, that can be a pain. :-) *In practice* - that depends on the compiler and platform though. My main concern was just that your comments might give the OP a false sense of security about `volatile` in general as opposed to this specific situation. Ten years from now the OP may be programming on a compiler for a hardware platform neither of which exist today, and `volatile` may not always be enough. :-) — Harry Johnston, Jan 26 '16 at 01:39
@HarryJhnston: Maybe I should have said it's a bit more than enough;) — BitWhistler, Jan 26 '16 at 11:56

AlienRancher · Answer 1 · 2016-01-23T19:13:16.690

There is a lot of background to cover for your question. We don't know for example what tool chain you are using so I am going to answer it as a winapi question. I further assume you have some something in mind like this:

volatile bool flag = false;

DWORD WINAPI WorkFn(void*) {
   flag = true;
   //  work here
   ....
   // done.
   flag = false;
   return 0;
}

int main() {
  HANDLE th = CreateThread(...., &WorkFn, NULL, ..);

  // wait for start of work.
  while (!flag) {
     // ?? # 1
  }
  // Seems thread is busy now. Time to wait for it to finish.
  while (flag) {
     // ?? # 2
  }

}

There are many things wrong here. For starters the volatile does very little here. When flag = true happens it will eventually be visible to the other thread because it is backed by a global variable. This is so because it will at least make it into the cache and the cache has ways to tell other processors that a given line (which is a range of addresses) is dirty. The only way it would not make it into the cache is that if the compiler makes a super crazy optimization in which flag stays in the cpu as a register. That could actually happen but not in this particular code example.

So volatile tells the compiler to never keep the variable as a register. That is what it is, every time you see a volatile variable you can translate it as "never enregister this variable". Its use here is just basically a paranoid move.

If this code is what you had in mind then this looping over a flag pattern is called a Spinlock and this one is a really poor one. It is almost never the right thing to do in a user mode program.

Before we go into better approaches let me tackle your Interlocked question. What people usually mean is this pattern

volatile long flag = 0;

DWORD WINAPI WorkFn(void*) {
  InterlockedExchange(&flag, 1);
  ....
}

int main() {
 ...
  while (InterlockedCompareExchange(&flag, 1, 1) = 0L) {
    YieldProcessor();
  }
  ...
}

Assume the ... means similar code as before. What the InterlockedExchange() is doing is forcing the write to memory to happen in a deterministic, "broadcast the change now", kind of way and the typical way to read it in the same "bypass the cache" way is via InterlockedCompareExchange().

One problem with them is that they generate more traffic on the system bus. That is, the bus now being used to broadcast cache synchronization packets among the cpus on the system.

std::atomic<bool> flag would be the modern, C++11 way to do the same, but still not what you really want to do.

I added the YieldProcessor() call there to point to the real problem. When you wait for a memory address to change you are using cpu resources that would be better used somewhere else, for example in the actual work (!!). If you actually yield the processor there is at least a chance that the OS will give it to the WorkFn, but in a multicore machine it will quickly go back to polling the variable. In a modern machine you will be checking this flag millions of times per second, with the yield, probably 200000 times per second. Terrible waste either way.

What you want to do here is to leverage Windows to do a zero-cost wait, or at least a low cost as you want to:

DWORD WINAPI WorkFn(void*) {
   //  work here
   ....
   return 0;
}

int main() {
  HANDLE th = CreateThread(...., &WorkFn, NULL, ..);

  WaitForSingleObject(th, INFINITE);
  // work is done!
  CloseHandle(th);

}

When you return from the worker thread the thread handle get signaled and the wait it satisfied. While stuck in WaitForSingleObject you don't consume any cpu cycles. If you want to do a periodic activity in the main() function while you wait you can replace INFINITE with 1000, which will release the main thread every second. In that case you need to check the return value of WaitForSingleObject to tell the timeout from thread being done case.

If you need to actually know when work started, you need an additional waitable object, for example, a Windows event which is obtained via CreateEvent() and can be waited on using the same WaitForSingleObject.

Update [1/23/2016]

Now that we can see the code you have in mind, you don't need atomics, volatile works just fine. The m_bWorking is protected by the cs mutex anyhow for the true case.

If I might suggest, you can use TryEnterCriticalSection and cs to accomplish the same without m_bWorking at all:

void Testclass::Work()
{
    EnterCriticalSection(&cs);
    // do some work

    LeaveCriticalSection(&cs);
    SetEvent(hFinished);     // could be removed as well
}

void Testclass::StartWork()
{
    ResetEvent(hFinished);      // could be removed.
    SetEvent(hHasWork);
}

void Testclass::WaitUntilFinish()
{
    if (TryEnterCriticalSection(&cs)) {
        // Not busy now.
        LeaveCriticalSection(&cs);
        return;
    } else {
        // busy doing work. If we use EnterCriticalSection(&cs)
        // here we can even eliminate hFinished from the code.
    }
  ...
}

Thank you for your answer but this is unfortunately not orrect. -->The time window between StartWork and the thread does realy start is not synchronized. If WaitUntilFinish will called after StartWork but before the thread entered the critical section, than TryEnterCriticalSection will retur true, but there is already work to do. Enter the CriticalSection in StartWork and leaving it in Work will also not work because i can´t Enter and Leave a criticalsection in different threads. — lalelu, Jan 24 '16 at 18:28
I am sorry but between StartWork and the actual work the thread is not "doing work" imho. If you want what you mention now just track the state of hHasWork, use WaitforSingleObject(hHasWork, 0) to get its state. Even simpler. — AlienRancher, Jan 25 '16 at 04:17

score 1 · Answer 2 · answered Jan 22 '16 at 21:58

For some reason, the Interlocked API does not include an "InterlockedGet" or "InterlockedSet" function. This is a strange omission and the typical work around is to cast through volatile.

You can use code like the following on Windows:

#include <intrin.h>

__inline int InterlockedIncrement(int *j)
{ // This is VS-specific
    return _InterlockedIncrement((volatile LONG *) j);
}

__inline int InterlockedDecrement(int *j)
{ // This is VS-specific
    return _InterlockedDecrement((volatile LONG *) j);
}

__inline static void InterlockedSet(int *val, int newval)
{
    *((volatile int *)val) = newval;
}

__inline static int InterlockedGet(int *val)
{
    return *((volatile int *)val);
}

Yes, it's ugly. But it's the best way to work around the deficiency if you're not using C++11. If you're using C++11, use std::atomic instead.

Note that this is Windows-specific code and should not be used on other platforms.

score 0 · Answer 3 · answered Jan 22 '16 at 21:31

No, volatile bool will not be enough. You need an atomic bool, as you correctly suspect. Otherwise, you might never see your bool updated.

There is also no InterlockedExchange in C++ (the tags of your question), but there are compare_exchange_weak and compare_exchange_strong functions in C++11. Those are used to set the value of an object to a certain NewValue, provided it's current value is TestValue and indicate the status of this attempt (was the change made or not). The benefit of those functions is that this is done in such a fasion that you are guaranteed that if two threads are trying to perform this operation, only one will succeed. This is very helpful when you need to take a certain actions depending on the result of the operation.

Synchronize Threads - InterlockedExchange

3 Answers3