50

I would like my thread to shut down more gracefully so I am trying to implement a simple signalling mechanism. I don't think I want a fully event-driven thread so I have a worker with a method to graceully stop it using a critical section Monitor (equivalent to a C# lock I believe):

DrawingThread.h

class DrawingThread {
    bool stopRequested;
    Runtime::Monitor CSMonitor;
    CPInfo *pPInfo;
    //More..
}

DrawingThread.cpp

void DrawingThread::Run() {
    if (!stopRequested)
        //Time consuming call#1
    if (!stopRequested) {
        CSMonitor.Enter();
        pPInfo = new CPInfo(/**/);
        //Not time consuming but pPInfo must either be null or constructed. 
        CSMonitor.Exit();
    }
    if (!stopRequested) {
        pPInfo->foobar(/**/);//Time consuming and can be signalled
    }
    if (!stopRequested) {
        //One more optional but time consuming call.
    }
}


void DrawingThread::RequestStop() {
    CSMonitor.Enter();
    stopRequested = true;
    if (pPInfo) pPInfo->RequestStop();
    CSMonitor.Exit();
}

I understand (at least in Windows) Monitor/locks are the least expensive thread synchronization primitive but I am keen to avoid overuse. Should I be wrapping each read of this boolean flag? It is initialized to false and only set once to true when stop is requested (if it is requested before the task completes).

My tutors advised to protect even bool's because read/writing may not be atomic. I think this one shot flag is the exception that proves the rule?

John
  • 6,433
  • 7
  • 47
  • 82
  • 1
    Take your tutor's advice (they are probably right) - even if they aren't they are the ones marking your work, so if they say it's needed then it's needed. – John3136 Feb 08 '12 at 20:36
  • 8
    @John3136 My tutor's advice is 10 years old, this is my work now. – John Feb 08 '12 at 20:37

4 Answers4

57

It is never OK to read something possibly modified in a different thread without synchronization. What level of synchronization is needed depends on what you are actually reading. For primitive types, you should have a look at atomic reads, e.g. in the form of std::atomic<bool>.

The reason synchronization is always needed is that the processors will have the data possibly shared in a cache line. It has no reason to update this value to a value possibly changed in a different thread if there is no synchronization. Worse, yet, if there is no synchronization it may write the wrong value if something stored close to the value is changed and synchronized.

Dietmar Kühl
  • 150,225
  • 13
  • 225
  • 380
  • I don't think my platform includes `std::atomic`. – John Feb 08 '12 at 20:52
  • 1
    `std::atomic` is part of C++ 2011. It is typically implemented on top of the platform's provided facilities which come with whatever is chosen on the respect platform. Effectively it consists of using an appropriate processor operation. How these are called and used depends on the platform. `std::atomic` gives a nice and portable interface to them. – Dietmar Kühl Feb 08 '12 at 20:56
  • It would be far more maintainable for me to just protect the access with `CSMonitor` as I was seeking to avoid. So I cannot safely neglect that here, even with `volatile bool stopRequested` because that value might be cached in the CPU and ignorant of updates even through the monitored accessor, `RequestStop`, unless it is being read within a critical section? – John Feb 08 '12 at 21:04
  • 1
    This is essentially correct. You need some sort of synchronization. You don't need to a heavy-weight synchronization like a "critical section" (note that a [critical](http://en.wikipedia.org/wiki/Critical_section) section" is actually not a programming construct but describing a property of a piece of code; I realize that Windows uses this inconsistently for what is otherwise called a "Mutex"). If you state the platform I might be able to figure out what you want to use (sitting right now in the C++ committee meeting discussion concurrency extensions there should be some expert...). – Dietmar Kühl Feb 08 '12 at 21:11
  • 1
    The platform is Samsung's bada. http://developer.bada.com/help/index.jsp?topic=/com.osp.cppappprogramming.help/html/dev_guide/base/monitor.htm advocates guarding critical sections with `Osp::Base::Runtime::Monitor`. – John Feb 08 '12 at 21:19
  • @Dietmar Kühl: Assuming the compiler cannot cause trouble in this case, is a fence also required on x86, or just only on CPUs with weak ordering like the Itanium? – Tudor Feb 08 '12 at 21:21
  • 2
    @Tudor: I spoke with a few people and according to the Intel guys you can get away with a `volatile` read (it needs to be `volatile` to prevent the compiler from doing something funny) and a normal write on Intel x86 systems. This doesn't necessarily extend to other x86 systems and it seems that it doesn't extend to multi-socket AMD x86 systems. The upshot is that you need to know quite precisely what x86 system you are running on to know whether it is working reliable. – Dietmar Kühl Feb 08 '12 at 23:14
  • @John The bada system seems to run on a single core ARM system. On a single core you don't have an issue with different cores seeing different statuses of caches: there is only one cache and this will be consistent. If there is a possibility that there are actually two cores this issue immediately pops up. I couldn't muster an ARM expert but my understanding is that the x86 is about the most forgiving system and this may sometimes require synchronization and even there are situations where the synchronization is required (although not for Intel x86 from the looks of it). – Dietmar Kühl Feb 08 '12 at 23:20
  • @DietmarKühl Could you explain how synchronization (i.e. a lock?) will force other processors to update its cache line data? I understand how the `volatile` keyword would force the variable to be read from main memory, but how would other synchronization primitives do that? Thank you. – cozos Apr 19 '18 at 02:38
  • 1
    @cozos: `volatile` communicates to the *compiler* that it cannot assume variable have a value it can predict and that any read/write of a variable actually has to be done. It doesn’t change the machine instructions emitted to read/write memory. Synchronization primitives (e.g., locks and atomic variables) conceptually issue machine instructions enforcing orders in which variables become visible to other processors. What these operations actually do on a given system depends on the processor’s memory consistency guarantees. Use of `volatile` does not [necessarily] issue such primitives. – Dietmar Kühl Apr 19 '18 at 06:21
  • 1
    "It has no reason to update this value to a value possibly changed in a different thread if there is no synchronization" This is wrong... right? If another thread writes some data, the cache coherency protocol guarantees that updated value will be seen by all other caches in the system. – Melandru's Square Nov 08 '19 at 21:23
  • 1
    @Melandru'sSquare Nope, that’s correct. There may be platforms like x86 based ones which have a fairly strong cache coherence protocol but that isn’t true in general. Note that strong cache coherency protocols come at a cost and I’d expect that future systems rather get weaker cache coherency protocols. – Dietmar Kühl Nov 08 '19 at 21:35
  • @DietmarKühl Just wanted to put here this video showing Hurb Sutter kind of saying that coherent cache are here to stay minute: 5:53 https://www.youtube.com/watch?v=A8eCGOqgvH4 – kroiz Feb 11 '20 at 13:44
17

Boolean assignment is atomic. That's not the problem.

The problem is that a thread may not not see changes to a variable done by a different thread due to either compiler or CPU instruction reordering or data caching (i.e. the thread that reads the boolean flag may read a cached value, instead of the actual updated value).

The solution is a memory fence, which indeed is implicitly added by lock statements, but for a single variable it's overkill. Just declare it as std::atomic<bool>.

Tudor
  • 61,523
  • 12
  • 102
  • 142
  • 2
    "Boolean assignment is atomic": Just out of interest, is this required by the C or C++ standard? – Niklas B. Feb 08 '12 at 20:42
  • 21
    Note that declaring something `volatile` in C++ has **no** effect on synchronization **at all**! The use of `volatile` is for the compiler to prevent it omitting and/or reordering instructions (this is different e.g. to the semantics of `volatile` in Java). You also need to tell the CPU that synchronization is needed. You need to use corresponding directive, e.g. `std::atomic`. – Dietmar Kühl Feb 08 '12 at 20:45
  • 3
    @NiklasB. I think what is meant is: you can't end up with an inconsistent state of a `bool` being seen by a thread. It is indeed atomic in this sense. However, this doesn't mean that the value is synchronized in any way with different cores at all. For these to be atomic you need to synchronization e.g. using `std::atomic`. – Dietmar Kühl Feb 08 '12 at 20:47
  • 1
    It's not the atomicity that is important to me here. Either the variable is in it's virgin state or it is detected not to be, possibly slightly after the update was initiated. The consequence of this delay might defeat the purpose of the thread's responsiveness. As Tudor points out I need to ensure both threads are reading from the same location. – John Feb 08 '12 at 20:49
  • 2
    @Niklas B.: By atomic I mean that the value cannot suffer "tearing", like for example a long on a 32-bit machine might get split in half since it needs two register copies. Of course, this does not imply visibility. – Tudor Feb 08 '12 at 20:50
  • I agree that `std::atomic` is probably the fool proof solution, that deals also with the case of CPUs having a weak ordering, not just the compiler causing trouble. Note that for Java/C#, `volatile` serves both purposes. – Tudor Feb 08 '12 at 20:54
  • @Tudor so is `volatile` the solution here or not? I've checked online, mostly related questions, and it seems it is, but Dietmar is insisting a more heavyweight construct is needed. – John Feb 08 '12 at 21:07
  • @John, `volatile` is the solution only for compiler problems in this case. The CPU can also cause problems. I would not go into a discussion about the weak ordering of various architectures because it seems to be a mess really. Some claim that on x86 you are safe with just `volatile`, some say you are not. You can use your initial idea of locks to be sure that a full memory fence is introduced. – Tudor Feb 08 '12 at 21:13
  • 1
    "The problem is that a thread may not not see changes to a variable done by a different thread due ... data caching", that's false. Cache coherency will take care of that, the only concern is the compiler reordering instructions or register-allocating the variable so it doesn't exist in cache. – Melandru's Square Nov 08 '19 at 21:26
7

The answer, I believe, is "it depends." If you're using C++03, threading isn't defined in the Standard, and you'll have to read what your compiler and your thread library say, although this kind of thing is usually called a "benign race" and is usually OK.

If you're using C++11, benign races are undefined behavior. Even when undefined behavior doesn't make sense for the underlying data type. The problem is that compilers can assume that programs have no undefined behavior, and make optimizations based on that (see also the Part 1 and Part 2 linked from there). For instance, your compiler could decide to read the flag once and cache the value because it's undefined behavior to write to the variable in another thread without some kind of mutex or memory barrier.

Of course, it may well be that your compiler promises to not make that optimization. You'll need to look.

The easiest solution is to use std::atomic<bool> in C++11, or something like Hans Boehm's atomic_ops elsewhere.

Max Lybbert
  • 19,717
  • 4
  • 46
  • 69
  • 2
    It doesn't "depend": if you need to share mutable data across multiple threads, even if it is just a `bool`, you need some sort of synchronization. Essentially this is to inform the CPU to arrange objects currently being cached to become visible. – Dietmar Kühl Feb 08 '12 at 20:59
  • 2
    It depends in the sense that (1) even when the Standard says something is undefined, implementations **are** allowed to define what will happen and benign races are common enough that there is a decent chance that the compiler may do so; and (2) while benign races are definitely undefined under C++11, it's harder to say that for C++03 (although the first paper I link to says exactly that, of course it also shows how widespread these kinds of data races are). – Max Lybbert Feb 08 '12 at 21:28
1

No, you have to protect every access, since modern compilers and cpus reorder the code without your multithreading tasks in mind. The read access from different threads might work, but don't have to work.

Jörg Beyer
  • 3,631
  • 21
  • 35