Once more volatile: necessary to prevent optimization?

Question

I've been reading a lot about the 'volatile' keyword but I still don't have a definitive answer.

Consider this code:

class A
{
public:
    void work()
    {
        working = true;

        while(working)
        {
            processSomeJob();
        }
    }

    void stopWorking() // Can be called from another thread
    {
        working = false;
    }
private:
    bool working;
};

As work() enters its loop the value of 'working' is true.

Now I'm guessing the compiler is allowed to optimize the while(working) to while(true) as the value of 'working' is true when starting the loop.
- If this is not the case, that would mean something like this would be quite inefficient:
```
for(int i = 0; i < someOtherClassMember; i++)
{
    doSomething(); 
}
```
...as the value of someOtherClassMember would have to be loaded each iteration.
- If this is the case, I would think 'working' has to be volatile in order to prevent the compiler from optimising it.

Which of these two is the case? When googling the use of volatile I find people claiming it's only useful when working with I/O devices writing to memory directly, but I also find claims that it should be used in a scenario like mine.

What is `processSomeJob()`? Where is it declared? What's in it? Etc. Please give us a complete picture. — curiousguy, Dec 05 '19 at 14:45

score 2 · Accepted Answer · answered Feb 18 '19 at 11:43

Your program will get optimized into an infinite loop^†.

void foo() { A{}.work(); }

is compiled (g++ with -O2) to

foo():
        sub     rsp, 8
.L2:
        call    processSomeJob()
        jmp     .L2

The standard defines what a hypothetical abstract machine would do with a program. Standard-compliant compilers have to compile your program so that it behaves the same way as that machine in all observable behaviour. This is known as the as-if rule: the compiler has freedom as long as what your program observably does is the same, regardless of how.

Normally, reading and writing a variable doesn't count as observable behaviour, which is why a compiler can elide as many reads and writes as it likes. The compiler can see that working doesn't get assigned to and optimizes the read away. The (often misunderstood) effect of volatile is exactly to make those reads and writes observable, which forces the compiler to leave them alone^‡.

But wait, you say, another thread may assign to working. This is where the leeway of undefined behaviour comes in. The compiler may do anything when there is undefined behaviour, including formatting your hard drive, and still be standard-compliant. Since there is no synchronization and working isn't atomic, any other thread writing to working is a data race, which is unconditionally undefined behaviour. Therefore, the only time an infinite loop is wrong is when there is already undefined behaviour, in which case the compiler is free to decide your program might as well keep on looping.

TL;DR Don't use a plain or volatile bool for multi-threading. Use std::atomic<bool>.
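As a minimal sketch of the std::atomic<bool> suggestion, the class from the question could look roughly like this. The body of processSomeJob(), the jobs counter, jobsDone() and the runWorkerBriefly() driver are assumptions added only to make the example self-contained; none of them come from the question.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

class A
{
public:
    void work()
    {
        working = true;            // atomic store
        while (working)            // atomic load on every iteration
        {
            processSomeJob();
        }
    }

    void stopWorking()             // may be called from another thread
    {
        working = false;           // atomic store, no data race
    }

    int jobsDone() const { return jobs; }

private:
    // Hypothetical stand-in for the real job: count and sleep briefly.
    void processSomeJob()
    {
        ++jobs;
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    std::atomic<bool> working{false};
    std::atomic<int> jobs{0};
};

// Hypothetical driver: start the worker, stop it from this thread, join.
int runWorkerBriefly()
{
    A a;
    std::thread worker([&a] { a.work(); });

    // Wait until work() has actually started, so that stopWorking()
    // cannot be overwritten by the initial "working = true".
    while (a.jobsDone() == 0)
        std::this_thread::yield();

    a.stopWorking();
    worker.join();                 // returns because the loop sees the store

    assert(a.jobsDone() > 0);
    return a.jobsDone();
}
```

Calling runWorkerBriefly() returns the number of jobs processed before the stop request was seen. The wait before stopWorking() is needed only because work() itself sets the flag to true, as in the question.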

† Not in all situations: void bar(A& a) { a.work(); } is not optimized this way by some compiler versions.
‡ Actually, there is some debate around this.

The key here is the `A{}.` part. We don't have much context about where `A` is created. — curiousguy, Dec 05 '19 at 14:47
@curiousguy How is that key? The behaviour is the same regardless of how `A` is created. That some version of gcc doesn't optimize some cases doesn't say much about the validity of `work`. — Passer By, Dec 06 '19 at 11:11
The behavior is the same for a global object? For a dynamically created object? With which compiler? — curiousguy, Dec 06 '19 at 22:42
@curiousguy I meant for the abstract machine, the question isn't about performance. — Passer By, Dec 07 '19 at 06:57

eerorika · Answer 2 · 2019-02-18T11:59:15.403

> Now I'm guessing the compiler is allowed to optimize the while(working) to while(true)

Potentially, yes, but only if it can prove that processSomeJob() does not modify the working variable, i.e. if it can prove that the loop is infinite.

> If this is not the case, that would mean something like this would be quite inefficient ... as the value of someOtherClassMember would have to be loaded each iteration

Your reasoning is sound. However, the memory location might remain in cache, and reading from CPU cache isn't necessarily significantly slow. If doSomething is complex enough to cause someOtherClassMember to be evicted from the cache, then sure we'd have to load from memory, but on the other hand doSomething might be so complex that a single memory load is insignificant in comparison.

> Which of these two is the case?

Either. The optimiser will not be able to analyse all possible code paths; we cannot assume that the loop could be optimised in all cases. But if someOtherClassMember is provably not modified in any code paths, then proving it would be possible in theory, and therefore the loop can be optimised in theory.
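If the bound is known not to change while the loop runs, one way to avoid depending on the optimiser's analysis is to copy it into a local before the loop. The following is only a sketch: the Looper class, callCount, calls() and runLooper() are hypothetical names, and the copy changes behaviour if doSomething() were ever to modify someOtherClassMember.

```cpp
#include <cassert>

class Looper
{
public:
    explicit Looper(int bound) : someOtherClassMember(bound) {}

    void run()
    {
        // Read the member once; the loop condition now tests a local
        // that nothing else can modify.
        const int limit = someOtherClassMember;
        for (int i = 0; i < limit; i++)
        {
            doSomething();
        }
    }

    int calls() const { return callCount; }

private:
    // Hypothetical stand-in for the real work.
    void doSomething() { ++callCount; }

    int someOtherClassMember;
    int callCount = 0;
};

// Hypothetical helper: run the loop and report how often the body ran.
int runLooper(int bound)
{
    Looper looper(bound);
    looper.run();
    assert(bound < 0 || looper.calls() == bound);
    return looper.calls();
}
```

This is a single-threaded optimisation aid only; it has nothing to do with making the value safe to share between threads.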

> but I also find claims that [volatile] should be used in a scenario like mine.

volatile doesn't help you here. If working is modified in another thread, then there is a data race, and a data race means that the behaviour of the program is undefined.

To avoid a data race, you need synchronisation: use either a mutex or atomic operations to share access across threads.
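The mutex option could be sketched as follows, with every access to the flag made under the same lock. MutexWorker, its helper functions, the jobs counter and runMutexWorker() are hypothetical names chosen for illustration, not code from the question.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

class MutexWorker
{
public:
    void work()
    {
        setWorking(true);
        while (isWorking())
        {
            processSomeJob();
        }
    }

    void stopWorking() { setWorking(false); }   // callable from any thread

    int jobsDone()
    {
        std::lock_guard<std::mutex> lock(m);
        return jobs;
    }

private:
    void setWorking(bool value)
    {
        std::lock_guard<std::mutex> lock(m);
        working = value;
    }

    bool isWorking()
    {
        std::lock_guard<std::mutex> lock(m);
        return working;
    }

    // Hypothetical stand-in for the real job.
    void processSomeJob()
    {
        {
            std::lock_guard<std::mutex> lock(m);
            ++jobs;
        }
        std::this_thread::yield();
    }

    std::mutex m;          // protects working and jobs
    bool working = false;  // plain bool is fine because it is only touched under m
    int jobs = 0;
};

// Hypothetical driver: start the worker, stop it from this thread, join.
int runMutexWorker()
{
    MutexWorker w;
    std::thread worker([&w] { w.work(); });

    // Make sure work() has started before asking it to stop.
    while (w.jobsDone() == 0)
        std::this_thread::yield();

    w.stopWorking();
    worker.join();

    assert(w.jobsDone() > 0);
    return w.jobsDone();
}
```

For a single flag like this, an atomic is usually the simpler choice; a mutex becomes more attractive when several pieces of state have to change together.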

score 1 · Answer 3 · answered Feb 18 '19 at 10:06

Volatile will make the while loop reload the working variable on every check. Practically that will often allow you to stop the working function with a call to stopWorking made from an asynchronous signal handler or another thread, but as per the standard it's not enough. The standard requires lock-free atomics or variables of type volatile sig_atomic_t for sighandler <-> regular context communication and atomics for inter-thread communication.
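For the signal-handler case mentioned above, a minimal sketch with volatile std::sig_atomic_t might look like this. The names stopRequested, onStopSignal and workUntilSignalled are hypothetical, and std::raise is used only as a stand-in for a signal arriving from outside the program.

```cpp
#include <cassert>
#include <csignal>

// Flag of the type the standard allows a signal handler to write.
volatile std::sig_atomic_t stopRequested = 0;

// Handler with C linkage; it does nothing except set the flag.
extern "C" void onStopSignal(int)
{
    stopRequested = 1;
}

// Hypothetical worker loop: runs "jobs" until the handler sets the flag.
int workUntilSignalled()
{
    stopRequested = 0;
    auto previous = std::signal(SIGINT, onStopSignal);
    assert(previous != SIG_ERR);

    int jobs = 0;
    while (!stopRequested)          // volatile: re-read on every check
    {
        ++jobs;                     // stand-in for processSomeJob()
        if (jobs == 3)
            std::raise(SIGINT);     // simulate an incoming signal
    }

    std::signal(SIGINT, previous);  // restore the earlier handler
    return jobs;
}
```

This covers communication between a handler and the code it interrupts on the same thread. It does not make the flag safe to share between threads; that still calls for atomics.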

Petr Skocik


But without volatile the compiler would be allowed to turn this into an infinite loop? Could it be possible that it *never* sees the value of 'working' change? – Mathijs Feb 18 '19 at 10:17
@Mathijs Yes, I think that the compiler is allowed to do that (although I'm not 100% sure, the standard is a bit complicated). And at least (the newest) gcc and clang *will* do that (regardless of what the standard says). – freakish Feb 18 '19 at 10:18
@Mathijs Yes. Without volatile the compiler can turn it into an infinite loop. With the volatile, if you compile the code in a separate translation unit, it should "work" with both signal handlers and separate threads due to how hardware works, but it still would be technically undefined so you shouldn't use it except for experimentation. – Petr Skocik Feb 18 '19 at 10:24
So in this case, what would be the favourable method? Using *volatile* so that it would at some point see that 'working' has changed, or making 'working' atomic? What would go wrong if I would just use volatile? – Mathijs Feb 18 '19 at 10:26
@Mathijs I don't know of any particular way compilers could mess it up if you just use volatile. For all I know, it should behave like an atomic with memory_order_relaxed. But atomics would be very much preferred. They allow more optimization and you would not be invoking undefined behavior. If you invoke undefined behavior, you're technically losing ALL guarantees about your program's behavior. – Petr Skocik Feb 18 '19 at 10:29
@Mathijs You should use atomics. See this: https://stackoverflow.com/questions/16320838/when-do-i-really-need-to-use-atomicbool-instead-of-bool Volatile alone may work on concrete platforms, maybe even all. But there's no guarantee AFAIK. Atomics give such guarantee so why bother? – freakish Feb 18 '19 at 10:46
