21

I wonder whether dereferencing a pointer will always be translated into a machine-level load/store instruction, no matter how aggressively the compiler optimizes.

Suppose we have two threads. One (let's call it Tom) receives user input and writes a bool variable. The variable is read by the other (and this is Jerry) to decide whether to continue a loop. We know that an optimizing compiler may store the variable in a register when compiling the loop, so at run time Jerry may read an obsolete value that differs from what Tom actually wrote. As a result, we should declare the bool variable as volatile.

However, if dereferencing a pointer always causes a memory access, then the two threads can use a pointer to reference the variable. On every write, Tom stores the new value into memory by dereferencing the pointer and writing through it. On every read, Jerry reads exactly what Tom wrote by dereferencing that same pointer. This seems better than the implementation-dependent volatile.

I'm new to multi-threading programming, so this idea may seem trivial and unnecessary. But I'm really curious about it.

Aamir
YangZai
    "As a result,we should declare the bool variable as volatile." - That is flat out wrong. `volatile` does not make a variable atomic/thread safe. You need `std::atomic`, `std::mutex` or similar. – Jesper Juhl Jun 16 '23 at 10:21
  • 6
    Short answer: No, the compiler is free to optimize following the [as-if rule](https://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule) which typically is free to assume no multithreading. Your use of `volatile` to circumvent it is also obsolete. Since C++11 and C11 the languages come with memory models that include multithreading. Use the proper primitives such as `std::atomic` to deal with this – Homer512 Jun 16 '23 at 10:23
  • 4
    Dereferencing a pointer on two distinct threads, unless those operations are explicitly synchronised (e.g. via facilities like `std::atomic`, `std::mutex`) gives undefined behaviour. `volatile` does not introduce synchronisation. If behaviour is undefined, a compiler is correct - according to the standard - regardless of what it does (omits the operations, somehow does them with some form of synchronisation, crashes the host machine) – Peter Jun 16 '23 at 10:43
  • 1
    @JesperJuhl: The sentence you quote, “As a result,we should declare the `bool` variable as `volatile`,” asserts `volatile` is necessary. Your statement, ”`volatile` does not make a variable atomic/thread safe,” asserts `volatile` is not sufficient. Asserting `volatile` is not sufficient is not a rebuttal to an assertion that it is necessary. – Eric Postpischil Jun 16 '23 at 12:37
  • 3
    @Peter: That's not quite correct. Accessing the same object (whether or not through pointers) from multiple threads is ok if and only if all accesses are reads. There will be no data race unless there is some semantic modification taking place. The compiler cannot introduce data races, for example with spurious writes, into the C++ behavior. "As-if" allows it to do so at a lower level of abstraction, but only insofar as it maintains the guarantees of the C++ memory model. – Ben Voigt Jun 16 '23 at 21:00
  • 1
    @BenVoigt Yeah, true. I wrote my previous comment with the context of the question, which specifically described a variable being modified by one thread and accessed by another. – Peter Jun 17 '23 at 03:17

4 Answers

30

Will dereferencing a pointer always cause memory access?

No, for example:

int five() {
    int x = 5;
    int *ptr = &x;
    return *ptr;
}

Any sane optimizing compiler will not emit a mov from stack memory here, but something along the lines of:

five():
  mov eax, 5
  ret

This is allowed because of the as-if rule.

How do I do inter-thread communication through a bool* then?

This is what std::atomic<bool> is for. You shouldn't communicate between threads using non-atomic objects, because accessing the same memory location through two threads in a conflicting way1) is undefined behavior in C++. std::atomic makes it thread-safe, volatile doesn't. For example:

void thread(std::atomic<bool> &stop_signal) {
    while (!stop_signal) {
        do_stuff();
    }
}

Technically, this doesn't imply that each load from stop_signal will actually happen. The compiler is allowed to do partial loop unrolling like:

void thread(std::atomic<bool> &stop_signal) {
    // only possible if the compiler knows that do_stuff() doesn't modify stop_signal
    while (!stop_signal) {
        do_stuff();
        do_stuff();
        do_stuff();
        do_stuff();
    }
}

An atomic load() is allowed to observe stale values, so the compiler can assume that four load()s would all read the same value. Only some operations, like fetch_add() are required to observe the most recent value. Even then, this optimization might be possible.

In practice, optimizations like these aren't implemented for std::atomic in any compiler, so std::atomic is quasi-volatile. The same applies to C's atomic_bool, and _Atomic types in general.


1) Two memory accesses at the same location conflict if at least one of them is writing, i.e. two reads at the same location don't conflict. See [intro.races]


Jan Schultke
  • 7
    It's probably useful to observe that you don't necessarily need a `std::atomic` for *every* value shared between threads, as long as there is *some* form of synchronization related to it. For example, if you use a `std::mutex` to protect the `bool` value (possibly along with some others), then the synchronization between unlocking the mutex on one thread and acquiring the mutex on the other thread will cause the compiler to ensure that everything necessary is done, e.g. to make sure that processor caches are coherent, to make sure reads/writes don't get reordered past the barriers, etc. – Daniel Schepler Jun 16 '23 at 18:33
  • 1
    "accessing the same memory through two threads at the same time is undefined behavior" - only if write access is involved, right? Reading is fine. – Thomas Weller Jun 17 '23 at 10:09
  • 1
    @ThomasWeller yes, I've clarified this in the answer now. – Jan Schultke Jun 17 '23 at 10:21
  • 1
    Worth mentioning that you can use `std::memory_order_relaxed` loads and stores for `stop_signal` ([Why set the stop flag using `memory_order_seq_cst`, if you check it with `memory_order_relaxed`?](https://stackoverflow.com/q/70581645)). Often when people think they want to avoid `atomic`, it's because they think `atomic` is slower. But with `relaxed`, it's not, it is just a pure load in the asm even on weakly ordered ISAs like ARM, no extra barrier instructions. (On x86, acquire/release are "free", as are seq_cst loads; no extra barrier instructions needed.) – Peter Cordes Jun 18 '23 at 06:14
  • 1
    Also, since the question asked about `volatile` - [When to use volatile with multi threading?](https://stackoverflow.com/a/58535118) - never, but it does work in practice on current compilers, and was used historically (along with inline asm for memory ordering) before it was obsoleted by `std::atomic<>`. My answer there explains how/why it did and does still work in practice on real implementations, for people curious about what "magic" std::atomic needs to use. (Not much, CPU hardware does the work of maintaining cache coherency) – Peter Cordes Jun 18 '23 at 06:32
  • 1
    Also relevant to OP's problem, [MCU programming - C++ O2 optimization breaks while loop](https://electronics.stackexchange.com/q/387181) - a pointer to a `bool` is nearly as susceptible to being hoisted out of a loop at an access to a non-atomic global variable. In that case you'll still get one actual load, but not inside the loop so it still breaks buggy code. – Peter Cordes Jun 18 '23 at 06:56
5

Some ways of using the lvalue produced by dereferencing a pointer will not result in an access. For example, given the declarations int arr[5][4]; int *p; the statement p = *arr; will not access any storage associated with arr, but merely causes the compiler to recognize that the lvalue on the right-hand side of the assignment is an int[4], which decays into an int*.

Outside of such circumstances, the Standard seeks to classify as Undefined Behavior every situation in which an implementation is meant to be free to process a dereferencing operation by performing an access or not, at its leisure, and in which that decision would observably affect program behavior.

This philosophy leads to some rather murky corner cases. Suppose a program uses some storage to hold a structure of type T, then a structure of type U, then a structure of type T again; it then makes a copy of the T without having written all of the fields, and finally uses fwrite to output the entire copy. If the compiler knows that a certain field in the original T was written with a certain value, it might generate code which stores that same value into the copy, without regard for whether the underlying storage might have changed. If nothing in the universe will ever care what the bytes associated with that field hold in the data output via fwrite, this shouldn't pose a problem; requiring that the programmer write all of the storage associated with the T using that type before copying it would force both the programmer and the computer running the program to do extra useless work. Yet the Standard has no way of describing program behavior which would allow an implementation to observably fail to dereference all the fields of the T when copying it, other than characterizing the program as invoking Undefined Behavior.

supercat
4

Another case where *p does not result in a memory access is the sizeof operator: sizeof(*p) simply determines the size of the static type p points to. Note that in C, with variable length arrays (VLAs), sizeof(*p) actually can require a memory access, but VLAs are only a compiler extension in C++.

SoronelHaetir
4

When working with multi-threading, explicitness is good. So I'm going to break down each and every piece.

"Will dereferencing pointers always cause memory access."

No. Consider the expression statement (void)*p. The *p performs indirection. From [expr.unary.op]:

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.

So the result is an lvalue. That, on its own, is not sufficient to cause a "read" of the data pointed to by p. In the above example, I explicitly throw away the result, so there's no reason to read the memory.

Of course, one might argue that the memory of p is read. Just to be pedantic, I'd point out that's one interpretation of the word. However, an optimizing compiler can see that the lvalue pointed to by p is not needed here, so it doesn't actually need to read/write the pointer at all.

Now what about in a multithreaded environment? The key to this is the "happens-before" relationship in [intro.multithread]. It's incredibly dry, formal language, but the basic idea is that event A happens before event B if A is sequenced before B (in a single thread), or if A inter-thread-happens-before B. The latter is language-lawyer speak for the machinery used to capture the behavior of synchronization primitives like mutexes and atomics.

If A does not happen-before B and B does not happen-before A, then the two events are not ordered with respect to each other. This is what happens on two threads when you don't have anything like mutexes to force an ordering. If one event writes to a memory location and the other reads or writes that location, the result is a data race. And a data race is undefined behavior: you get what you get. The spec has nothing to say about what happens when that occurs. It doesn't say whether it triggers a memory access or not... it says absolutely nothing about it.

As an effect of the rules codified in [intro.multithread], the compiler is effectively allowed to optimize its code as if a thread was operating in complete isolation unless a threading primitive (such as a mutex or atomic) forces otherwise. This includes all the usual elisions, such as not reading from memory if you don't have to.

Cort Ammon
  • In the C++ abstract machine, the dereference in `(void)*p` did still "happen", so I think there's still data-race UB if any other thread is simultaneously writing the same object. You're of course correct that the as-if rule allows making asm for a real machine that doesn't actually load, but I wonder if gcc or clang `-fsanitize=thread` would make code to try to check for this.... https://godbolt.org/z/hvG6G1crY - no, tsan doesn't check `(void)*p`. You have to use the value somehow, or make it a `volatile` deref. `(void)*volatile_ptr` does compile to a load. – Peter Cordes Jun 18 '23 at 06:26
  • So probably unused-value removal optimization happens before tsan instruments the loads/stores that remain. So it's an implementation detail, not evidence that there isn't technically still UB possible in `(void)*p;`. – Peter Cordes Jun 18 '23 at 06:27
    @PeterCordes I've been pondering that. As best as I can read, `*p` does not "access or modify the memory location" pointed to by p. One would need to do something like copy it to cause such an access. Of course, this does access `p` itself, and if the pointer is changing, that would definitely form a data race. I don't think merely creating an lvalue referring to an object accesses it. And [expr] indicates that `(void)X` is a "discarded-value" expression, where the value is discarded and only the side effects are considered. – Cort Ammon Jun 18 '23 at 14:43