
What is the worst thing that could happen when using a normal bool flag to control when one thread stops what it is doing? The peculiarity is that the exact time at which the thread stops is not very important at all; it just plays back some media, and it might even be a half-second late in reacting for all I care. It has a simple while (!restart) loop:

while (!restart) //bool restart
{ 
    //do something
}

and the other thread changes some settings and then sets restart to true:

someSetting = newSetting;
restart = true;

Since the playback loop runs thousands of times per second, I'm worried that using an atomic bool might increase latency. I understand that this is "undefined behavior", but how does that manifest itself? If the bool is 54r*wx]% at some point, so what? Can I get runtime errors? The bool changes to a comprehensible value EVENTUALLY, doesn't it? (The code works currently, btw.) In another post, someone suggested that the flag might never change at all, since the threads have separate caches. This sounds iffy to me; surely the compiler must make sure that shared variables are changed even if a data race exists? Or is it possible that the order of execution for the controlling thread might change, and someSetting might be changed after restart? Again, that just sounds creepy; why would a compiler allow that to happen?

I have considered setting a counter inside the loop and checking an atomic bool flag only every thousand times through. But I don't want to do that unless I really must.

  • The worst thing that could happen is that the compiler converts this to an infinite loop, or eliminates the loop body entirely. – Oliver Charlesworth Jan 28 '14 at 16:24
  • Well the loop (and the flag) work currently, so unless the optimizations get changed in the future, that doesn't seem to be a problem. – user3101050 Jan 28 '14 at 16:36
  • "The bool changes to a comprehensible value EVENTUALLY, doesn't it?" On today's popular processors, yes. But there's no guarantee that that will continue indefinitely - hardware-level cache coherency doesn't scale well as the number of processors increases. – Mike Seymour Jan 28 '14 at 16:39
  • "surely the compiler must make sure that shared variables are changed even if a data race exists?" - no. If there's a data race, then behaviour is undefined. The only way the compiler knows that it's "shared" is if you tell it so, by making it atomic or otherwise synchronising access to it. – Mike Seymour Jan 28 '14 at 16:41
  • That's the kind of assumption that will bite you in the future ;) There is nothing to prevent the compiler from assuming that the state of `restart` can't ever change (because you haven't told it that it's a shared variable)... – Oliver Charlesworth Jan 28 '14 at 16:45
  • Shouldn't there be a less expensive way of doing that (telling the compiler that the variable is shared, without actually making it atomic)? Because from another post here I understand that atomic bool can have a real hit on performance, at least on some compilers. – user3101050 Jan 28 '14 at 17:07
  • Or a hypothetical: what if I change some array in one thread and then much later, not at all concurrently, I read that array from another thread. Does the compiler have no real obligation to share that updated array between threads also? – user3101050 Jan 28 '14 at 17:09
  • @user3101050: Using an atomic flag is already the least expensive way to achieve what you want. No cheating around it, no matter how clever you get. You are right in observing that using an atomic is still quite expensive. But this is one of the fundamental facts of parallel computing: Synchronization *is* slow. I also wish this was different (certainly would make my life a lot easier), but it's a cruel world. – ComicSansMS Jan 28 '14 at 17:20
  • So long as restart is declared with the volatile modifier with an initial value that you want you will be ok. It counts as being "quick n dirty", but it's ok. – bazza Jan 29 '14 at 06:44
  • @bazza Only as long as you are compiling on Visual Studio. And there it is only ok, because [volatile introduces memory fences](http://stackoverflow.com/questions/7504646/visual-c-volatile) on Visual C++ as a non-standard extension. Which means they have _exactly the same_ performance impact as an atomic flag would (because they in fact do exactly the same). – ComicSansMS Jan 29 '14 at 07:55
  • @ComicSansMS No, memory fences aren't necessary in this instance. All that is needed is to stop the compiler optimising the storage of restart into a register, which is what the volatile keyword does. MS's extension is a neat additional side effect but plays no part here. This code is perfectly ok at the microelectronic level on all CPUs. – bazza Jan 30 '14 at 06:18
  • @bazza On a single-processor machine this might be true. On modern multi-core/multi-processor architectures you do need a memory fence, otherwise the CPU might keep reading an outdated value from cache without updating from memory. What's worse, memory writes might get reordered in a way that the condition protected by that flag is not valid when leaving the loop. `volatile` alone is completely useless when synchronizing multi-core machines. – ComicSansMS Jan 30 '14 at 08:22
  • @ComicSansMS, that's not how caches work. Two caches both caching the same address will sort themselves out if either is updated by their CPU. Regarding memory fences (thank you), one would help ensure someSetting was set up correctly before restart is set to 1. But on reflection the whole thing (including code inferred from the original question; I'm guessing there's another loop outside the one in the question) requires a single semaphore protecting both someSetting and restart (which would have to be reset by thread 1 at some point), and that will (normally) introduce memory fences anyway. – bazza Jan 31 '14 at 22:33
  • @bazza You are probably thinking about cache coherency protocols like MESI here. The problem is that those only kick in [after write buffers are flushed](http://en.wikipedia.org/wiki/MESI_protocol#Memory_Barriers). You still need memory barriers for correct synchronization. Frankly, in real-world desktop environments caches are likely to synchronize eventually due to other activities on the cores. But that might take a _very_ long time (in terms of CPU cycles). Also note that on x86 you get implicit fences with many instructions, so here you often get off lightly with such bugs. – ComicSansMS Feb 01 '14 at 08:17
  • @ComicSansMS, Hmmm interesting. It looks like both Intel and AMD have evolved MESI somewhat and don't use MESI. I've not found anything to suggest that MOESI and MESIF reduce the need for fences. I'm sticking with pthread semaphores, which I know includes the necessary fencing... – bazza Feb 01 '14 at 10:01
  • @bazza [Why is volatile not considered useful in multithreaded C or C++ programming?](http://stackoverflow.com/questions/2484980/) – fredoverflow Jun 05 '14 at 12:01
  • @FredOverflow, the issue with using memory fences alone is that it requires the compiler to know that a fence also implies that everything it has stored in registers should be written back to memory first. Whilst that might be so for one's usual compiler toolchain for best portability it's well worth using volatile where necessary. Plus I think it helps code readability because it shows which variables one's expecting to be accessed by multiple threads. In the systems I dev for we use volatile a lot. There's lots of data being DMAed all over the place, it's more like writing device drivers. – bazza Jun 06 '14 at 05:03
  • @bazza Have you actually read the linked question? If thread A writes to a volatile variable, it is perfectly possible that thread B will *never* see the new value. – fredoverflow Jun 06 '14 at 08:09
  • @FredOverflow, in some detail, especially comment 9 from Jalf. If the compiler has a memory barrier extension and you use it then you can assume that the compiler will also understand that variables currently in registers need to be written back to memory first (i.e. volatile). If the compiler doesn't have a memory barrier extension then you have to do it yourself and use volatile. Portable code needs volatile. – bazza Jun 07 '14 at 05:34
  • @FredOverflow, BTW libraries like pthread tend to have memory barriers inside them in routines like sem_post(); if you're using functions like that for synchronisation then the barriers happen anyway. However the compiler cannot see in the library, so it doesn't know that the fences are happening. In that case you still have to use volatile. – bazza Jun 07 '14 at 05:36
  • @FredOverflow Tool chains and compilers haven't necessarily helped. For example GCC 2.95 has an option -fvolatile, and some tool chains I've seen turned that on by default. It meant that you didn't necessarily need to put the word 'volatile' in your source because the compiler would assume it anyway. Result - working object code, programmer none the wiser. That's nothing to do with fences of course, but it shows that assumptions about what compilers and toolchains do and don't do have to be re-assessed. Or you can just write code assuming that the compiler isn't very clever. – bazza Jun 07 '14 at 06:21

1 Answer


UB doesn't mean that your code won't work; it just means that the behaviour of your code isn't specified by the standard. You should use std::atomic to make your code standards-compliant without actually changing its behaviour. You can do this with memory_order_relaxed:

std::atomic<bool> restart{false};

while (!restart.load(std::memory_order_relaxed))
{ 
    //do something
}

and in another thread:

someSetting = newSetting;
restart.store(true, std::memory_order_relaxed);

This will typically emit the same instructions as your original code (on x86, relaxed loads and stores compile to plain moves), but without the undefined behaviour.

Evgeny Lazin