33

I'm running a thread that runs until a flag is set.

std::atomic<bool> stop(false);

void f() {
  while(!stop.load(std::memory_order_{relaxed,acquire})) {
    do_the_job();
  }
}

I wonder if the compiler can unroll the loop like this (I don't want that to happen).

void f() {
  while(!stop.load(std::memory_order_{relaxed,acquire})) {
    do_the_job();
    do_the_job();
    do_the_job();
    do_the_job();
    ... // unroll as many as the compiler wants
  }
}

It is said that volatility and atomicity are orthogonal, but I'm a bit confused. Is the compiler free to cache the value of the atomic variable and unroll the loop? If the compiler can unroll the loop, then I think I have to mark the flag volatile, and I want to be sure.

Should I use volatile?


I'm sorry for being ambiguous. I (think I) understand what reordering is and what the memory_order_*s mean, and I'm sure I fully understand what volatile is.

I think the while() loop can be transformed into an infinite sequence of if statements like this.

void f() {
  if(stop.load(std::memory_order_{relaxed,acquire})) return;
  do_the_job();
  if(stop.load(std::memory_order_{relaxed,acquire})) return;
  do_the_job();
  if(stop.load(std::memory_order_{relaxed,acquire})) return;
  do_the_job();
  ...
}

Since the given memory orders don't prevent the sequenced-before operations from being moved past the atomic load, I think the code can be rearranged like this if the flag is not volatile.

void f() {
  if(stop.load(std::memory_order_{relaxed,acquire})) return;
  if(stop.load(std::memory_order_{relaxed,acquire})) return;
  if(stop.load(std::memory_order_{relaxed,acquire})) return;
  ...
  do_the_job();
  do_the_job();
  do_the_job();
  ...
}

If the atomic does not imply volatile, then I think the code could even be transformed like this in the worst case.

void f() {
  if(stop.load(std::memory_order_{relaxed,acquire})) return;

  while(true) {
    do_the_job();
  }
}

There will never be such an insane implementation, but I guess it's still a permitted transformation. I think the only way to prevent it is to mark the atomic variable volatile, which is what I'm asking about.

I've made a lot of guesses here; please tell me if any of them are wrong.

Inbae Jeong
  • I don't think so. I've watched a lot of material on `std::atomic` lately, but no one said it should be. I guess there is a `volatile` variable somewhere inside the class. – Nick Apr 08 '16 at 09:59
  • Possible duplicate of [Concurrency: Atomic and volatile in C++11 memory model](http://stackoverflow.com/questions/8819095/concurrency-atomic-and-volatile-in-c11-memory-model) – Johann Gerell Apr 08 '16 at 10:00
  • No, it should not be volatile. – Sven Nilsson Apr 08 '16 at 10:06
  • Are you asking about what is guaranteed or what happens to happen on some particular platform? If the former, why would you bring up `volatile`, since it has no guaranteed multithreaded semantics? And if the latter, why don't you mention your platform? – David Schwartz Apr 08 '16 at 10:14
  • It might be a duplicate, but I still don't understand it yet, so I'll mark it as a duplicate when I fully understand it. – Inbae Jeong Apr 08 '16 at 10:21
  • @Nick, `std::atomic` does not need `volatile` inside it somewhere, because `volatile` is neither necessary nor sufficient for correct synchronization between threads. Using `volatile` would not help at all. `std::atomic` uses atomic operations, not `volatile`, because it needs to be atomic, not volatile. They are orthogonal concepts. http://isvolatileusefulwiththreads.com – Jonathan Wakely Apr 08 '16 at 10:24
  • You can see here: https://godbolt.org/g/Cv4OMJ that gcc will unroll your loop - but maybe only because I added -funroll-all-loops. Anyway, it should still generate standard-compliant code, I suppose. – marcinj Apr 08 '16 at 11:03
  • Related: [Why don't compilers merge redundant std::atomic writes?](//stackoverflow.com/a/45971285): `volatile atomic` isn't currently needed, because compilers currently refrain from doing optimizations that it would stop. This may change in the future, but `volatile` isn't sufficient control for all cases so it's probably not a good idea to uglify your code with `volatile atomic` just yet. For many cases, using an appropriate memory_order is all you need. – Peter Cordes Apr 24 '18 at 03:46
  • And BTW, the things that `volatile` gives you does partially overlap with what `atomic` gives you: the compiler has to assume async modification by other threads. – Peter Cordes Apr 24 '18 at 03:48
  • @JonathanWakely On x86, a volatile write is not sufficient to release a spinlock? – curiousguy Jan 01 '20 at 06:08
  • Possible? In general, for any code, for any optimization level? Under which assumptions? – curiousguy Jan 01 '20 at 06:35

2 Answers

13

Is the compiler free to cache the value of the atomic variable and unroll the loop?

The compiler cannot cache the value of an atomic variable.

However, since you are using std::memory_order_relaxed, the compiler is free to reorder loads and stores from/to this atomic variable with regard to other loads and stores.

Also note that a call to a function whose definition is not available in this translation unit is a compiler memory barrier. That means the call cannot be reordered with regard to surrounding loads and stores, and that all non-local variables must be reloaded from memory after the call, as if they were all marked volatile. (Local variables whose address has not been passed elsewhere will not be reloaded, though.)

The transformation you would like to avoid would not be a valid one, because it would violate the C++ memory model: in the first version there is one load of the atomic variable per call to do_the_job; in the second, there are multiple calls per load. The observable behaviour of the transformed code could differ.


And a note from std::memory_order:

Relationship with volatile

Within a thread of execution, accesses (reads and writes) to all volatile objects are guaranteed to not be reordered relative to each other, but this order is not guaranteed to be observed by another thread, since volatile access does not establish inter-thread synchronization.

In addition, volatile accesses are not atomic (concurrent read and write is a data race) and do not order memory (non-volatile memory accesses may be freely reordered around the volatile access).

This bit, "non-volatile memory accesses may be freely reordered around the volatile access", is true for relaxed atomics as well, since relaxed loads and stores can be reordered with regard to other loads and stores.

In other words, adorning your atomic with volatile would not change the behaviour of your code.


Regardless, C++11 atomic variables do not need to be marked with the volatile keyword.


Here is an example of how g++ 5.2 honours atomic variables. The following functions:

__attribute__((noinline)) int f(std::atomic<int>& a) {
    return a.load(std::memory_order_relaxed);
}

__attribute__((noinline)) int g(std::atomic<int>& a) {
    static_cast<void>(a.load(std::memory_order_relaxed));
    static_cast<void>(a.load(std::memory_order_relaxed));
    static_cast<void>(a.load(std::memory_order_relaxed));
    return a.load(std::memory_order_relaxed);
}

__attribute__((noinline)) int h(std::atomic<int>& a) {
    while(a.load(std::memory_order_relaxed))
        ;
    return 0;
}

Compiled with g++ -o- -Wall -Wextra -S -march=native -O3 -pthread -std=gnu++11 test.cc | c++filt > test.S they produce the following assembly:

f(std::atomic<int>&):
    movl    (%rdi), %eax
    ret

g(std::atomic<int>&):
    movl    (%rdi), %eax
    movl    (%rdi), %eax
    movl    (%rdi), %eax
    movl    (%rdi), %eax
    ret

h(std::atomic<int>&):
.L4:
    movl    (%rdi), %eax
    testl   %eax, %eax
    jne .L4
    ret
Maxim Egorushkin
  • I don't think we can assume that a function call will be a compiler barrier, since there's a beast called LTO. Do you mean that even two successive atomic loads of the same variable cannot be transformed into a single load? – Inbae Jeong Apr 08 '16 at 11:35
  • @kukyakya The memory model suggests that an atomic variable may be changed by another thread, hence the load cannot be elided. Eliding the load of an atomic variable would make stores to the atomic variable invisible to other threads, which would violate the memory model which guarantees visibility of stores to atomic variables. – Maxim Egorushkin Apr 08 '16 at 11:41
  • "which guarantees visibility of stores to atomic variables" There's no guarantee that a store becomes visible to other loads within any finite period of time; the best we have is "Implementations should make atomic stores visible to atomic loads within a reasonable amount of time", which is normative encouragement ("should"), not a requirement. – T.C. Apr 08 '16 at 16:14
  • @T.C. Is your point that a compiler can eliminate loads from atomic variables? – Maxim Egorushkin Apr 08 '16 at 16:17
  • @MaximEgorushkin: The compiler can eliminate consecutive loads of the same atomic variable (if there are no other memory barriers in between), as you can't tell the difference between an elided redundant load and two loads that happen so fast after one another that no other thread had time to change the variable in between. – MikeMB Apr 11 '16 at 10:31
  • @MikeMB At least gcc-5.2.0 disagrees with you - it does not eliminate redundant consecutive relaxed loads of the same atomic variable with no intervening memory barriers with `-O3 -march=native`. – Maxim Egorushkin Apr 11 '16 at 10:48
  • @MaximEgorushkin: I meant "is allowed to", not "is able to" or "will". Actually, I don't think any compiler will, because it is almost never what the programmer actually wants. Also, I think that most implementations of atomic operations involve loads or writes through volatile pointers somewhere. – MikeMB Apr 11 '16 at 15:01
  • @MikeMB What is the basis for "is allowed to"? – Maxim Egorushkin Apr 11 '16 at 15:06
  • @MaximEgorushkin: As I said: because you can't tell the difference (or at least I don't know how). The "as if" rule is the basis of most optimizations. – MikeMB Apr 11 '16 at 15:12
  • But it seems that my assumption about the volatile pointer is wrong. I ended up at the wrong overload. – MikeMB Apr 11 '16 at 15:17
  • @MikeMB is right: According to ISO C++11, compilers *are allowed to* eliminate consecutive loads from the same non-`volatile` atomic variable. Maxim: Current compilers don't do that, or merge consecutive writes, as a quality-of-implementation issue, not because the standard forbids it. (e.g. progress-bar update stores could sink out of a loop...) [Why don't compilers merge redundant std::atomic writes?](//stackoverflow.com/a/45971285). The C++ committee is working on new features so programmers can control when the compiler is/isn't allowed to optimize atomics. – Peter Cordes Apr 24 '18 at 03:43
  • @Maxim: Compilers have many missed optimizations; the lack of one doesn't mean something's forbidden! Only the other direction gives you any definite info: if a compiler *does* do an optimization, then it's probably allowed (or a compiler bug). – Peter Cordes Apr 24 '18 at 03:44
3

If do_the_job() does not change stop, it doesn't matter whether the compiler can unroll the loop or not.

std::memory_order_relaxed only guarantees that each operation is atomic; it does not prevent accesses from being reordered. That means if another thread sets stop to true, the loop may continue to execute a few more times, because the accesses may be reordered. So it is the same situation as with an unrolled loop: do_the_job() may be executed a few more times after another thread has set stop to true.

So no, don't use volatile, use std::memory_order_acquire and std::memory_order_release.

alain
  • I got your point. Since it's not guaranteed that a load operation gets the latest value, it's meaningless to limit the number of calls to the function. What about the case I added? – Inbae Jeong Apr 08 '16 at 11:38
  • I find it hard to reason about this without knowing what `do_the_job()` does, and what the thread that sets `stop` does. There is surely more synchronisation needed between the two if they access common data, I think. Can you post a more detailed example? – alain Apr 08 '16 at 11:56
  • How does `std::memory_order_acquire` prevent the unrolling? – curiousguy Jan 01 '20 at 06:09