
Does the following program expose a data race, or any other concurrency concern?

#include <cstdio>
#include <cstdlib>

#include <atomic>
#include <thread>

class C {
public:
    int i;
};

std::atomic<C *> c_ptr{};

int main(int, char **) {
    std::thread t([] {
        auto read_ptr = c_ptr.load(std::memory_order_relaxed);
        if (read_ptr) {
            // does the following dependent read race?
            printf("%d\n", read_ptr->i);
        }
    });

    c_ptr.store(new C{rand()}, std::memory_order_release);

    t.join();  // join before t is destroyed; destroying a still-joinable std::thread calls std::terminate

    return 0;
}

Godbolt

I am interested in whether reads through pointers need load-acquire semantics when loading the pointer, or whether the dependent-read nature of reads through pointers makes that ordering unnecessary. If it matters, assume arm64, and please describe why it matters, if possible.

I have tried searching for discussions of dependent reads and haven't found any explicit recognition of the implicit load-ordering barrier they provide. It looks safe to me, but I don't trust my understanding enough to know it's safe.
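
For concreteness, here is a minimal sketch of the acquire-load alternative I am asking whether I need (only the reader's memory order changes):

std::thread t([] {
    // acquire pairs with the release store of c_ptr: if read_ptr is non-null,
    // the write to its i member happens-before this read
    auto read_ptr = c_ptr.load(std::memory_order_acquire);
    if (read_ptr) {
        printf("%d\n", read_ptr->i);
    }
});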

nmr
  • I'm not sure what the question is. The data (the pointer) acquired from the `atomic` is thread-safe, and you access 2 different pointers. (One is the default-constructed (uninitialized) atomic, though.) – apple apple Nov 20 '22 at 18:28
  • Technically, yes. In practice, very unlikely to matter. I doubt that there's an architecture where this particular case can cause problems without the release/acquire fences, though in a more complicated case it definitely can make a big difference even in practice. The release/acquire memory fences indeed guarantee that `i` will be correctly read - that's their point: that other data besides the value of the pointer gets properly read/written. – ALX23z Nov 20 '22 at 18:54
  • This is a duplicate of https://stackoverflow.com/questions/51468730/are-initialized-values-guaranteed-to-be-reflected-through-their-own-address-rega but apparently I'm not allowed to mark it as such. – Brian Bi Nov 20 '22 at 22:31
  • @ALX23z RE "... without the release/acquire fences ..." I believe you absolutely need the write-release; without it you may read uninitialized memory instead of the rand() result. I believe this due to experience, and if you consider the reordering possibilities, the write to `i` may be reordered with respect to the write to c_ptr. – nmr Nov 20 '22 at 22:42
  • @BrianBi the question you linked writes to the pointer with std::memory_order_relaxed, this question writes with std::memory_order_release. They are similar, though, so thanks for the link. – nmr Nov 20 '22 at 22:47
  • Good point, although it doesn't change anything, because `memory_order_release` has no effect unless there's a matching `memory_order_acquire` or stronger (I am going to pretend `memory_order_consume` doesn't exist) – Brian Bi Nov 20 '22 at 22:56
  • @BrianBi "memory_order_release has no effect unless there's a matching memory_order_acquire [...]" are you certain? Do you have a citation or anything? That seems like a big claim to me. – nmr Nov 20 '22 at 22:58
  • @nmr in this case there is a dependency on newly allocated memory. That on its own carries certain memory-ordering implications. – ALX23z Nov 21 '22 at 06:21
  • @BrianBi saying that `release` requires a matching `acquire` to do something is wrong. (1) The operation does certain things and has implications even without a matching `acquire`. (2) This is more about the C++ memory model; in practice the operation might not actually do anything more than an atomic relaxed operation on some architectures. – ALX23z Nov 21 '22 at 06:26
  • @ALX23z: In practice C++ compilers implement `release` the same way regardless of context and without checking the whole program for the existence of a possible reader with at least `consume` ordering. But as far as the ISO C++ standard guaranteeing anything, no, there aren't any guarantees. There's no happens-before, so no guaranteed ordering in the C++ abstract machine. Only in some but not all concrete implementations into asm for certain machines by real compilers which chose not to look for that UB and break the code on purpose. – Peter Cordes Nov 21 '22 at 16:12
  • @ALX23z It seems like what you're saying is that an *operation* that has release semantics still has an effect even if there's no matching acquire. That is true, of course. What I meant is that there's no difference between release and relaxed if there's no matching acquire, therefore there's no material difference between this question and the other one that I linked. – Brian Bi Nov 21 '22 at 23:16
  • @BrianBi no, even without a matching acquire, a release operation does something different than a relaxed one. It depends a lot on the hardware what exactly happens. But even without looking through hardware nuances, it adds various restrictions on compiler optimizations. – ALX23z Nov 22 '22 at 04:53
  • @ALX23z I am speaking only from the perspective of what is guaranteed by the C++ standard. – Brian Bi Nov 22 '22 at 14:49
  • @BrianBi even from that perspective, `release` imposes different restrictions than `relaxed`. Perhaps these restrictions are not very usable unless an `acquire` is performed, but it affects how the compiler can compile the code. – ALX23z Nov 22 '22 at 15:42
  • @ALX23z If you think there's a difference, explain what it is and reference the standard. – Brian Bi Nov 22 '22 at 16:37
  • @BrianBi say, it affects restrictions on memory operations. With relaxed, the compiler is free to reorder operations: say you have `op(); x.store(1,relaxed);`; post-optimization this can become `x.store(1,relaxed); op();`. That is not possible with a release operation - all of `op()`'s writes would have to be scheduled prior to `x`'s store. – ALX23z Nov 22 '22 at 21:49
  • @ALX23z And how would you observe whether or not such reordering had taken place? You would observe it by reading from x in another thread. And if that read is relaxed, then it might not see the results of `op()`, even if the store was release. – Brian Bi Nov 22 '22 at 23:28
  • @BrianBi it might affect performance or cause code to malfunction. Say `op()` takes a lot of time, and some other thread tests the atomic variable with relaxed checks. Once it discovers that the variable is true, it starts performing something else. In such a situation the reordering will cause the code to malfunction / not work as intended. – ALX23z Nov 23 '22 at 02:01
  • @ALX23z That can also happen with the release store. The side effects of `op()` are not guaranteed to be visible to the thread that performs the relaxed read, leading to a potential race condition. – Brian Bi Nov 23 '22 at 04:21
  • @BrianBi no, that cannot happen with `release` memory order; it forbids any store operation from `op()` from happening after the release. I am not talking about whether what `op()` does is visible - that is not relevant. I am talking about when the thread scheduled the operation. By reading with relaxed, the thread learns that if it performed an acquire read then `op()`'s operations would be visible. Sometimes visibility of changes is not necessary, just knowledge that they are ready. – ALX23z Nov 23 '22 at 07:19
  • @BrianBi the order of operations in the thread can be important when dealing with atomics. That's the whole point of `consume`: it acts as a fence on the thread's reads, unlike relaxed, while not committing or updating anything the way release and acquire do. – ALX23z Nov 23 '22 at 08:51
  • @ALX23z The standard doesn't guarantee anything about the order in which operations are performed unless such order would be observable. It sounds like you agree that the order isn't observable unless there's a matching acquire operation. – Brian Bi Nov 23 '22 at 12:36
  • @BrianBi observable and visible are two different things here. Visible refers to other threads seeing the changes. Here, changing the order of operations is observable either way, but it is permitted per the memory-order rules when a relaxed operation is used. – ALX23z Nov 23 '22 at 12:54
  • @ALX23z How exactly do you think you can observe the ordering? – Brian Bi Nov 23 '22 at 13:26
  • @BrianBi the other thread that checked the variable, noticed it to be true, started running some heavy computing. And in task manager one can notice more cores getting 100%. That's pretty observable, no? – ALX23z Nov 23 '22 at 13:34
  • @ALX23z That doesn't count as an observable effect in the standard. Performing I/O would be an observable effect, and if `op()` also performs I/O, it may be reordered after the release store. – Brian Bi Nov 23 '22 at 13:48
  • @BrianBi modifying an atomic variable is *always* an observable behaviour with side effects. That was an obvious example of why it is very important. Normally, optimizing or reordering any such operations would be forbidden, as is the case with volatile; however, the memory-ordering rules do permit it. For that reason the term "visible" was invented. – ALX23z Nov 23 '22 at 14:05
  • @ALX23z It's a side effect, but it's not observable. I/O, in contrast, is observable. If one thread does I/O and then a release store, and the other thread reads the stored value through a relaxed load and then does I/O, the two I/O operations may occur in either order. – Brian Bi Nov 23 '22 at 14:19
  • @BrianBi if another thread read a piece of data, how is it not observable behaviour? It's like you are trying to pretend that the whole of C++ memory is not part of the standard and not observable. – ALX23z Nov 23 '22 at 14:55
  • @ALX23z I fear that we are arguing about definitions at this point. My point is that, again, if thread 1 performs I/O followed by a release store to `x` and thread 2 performs a relaxed load on `x` that reads the stored value and then performs I/O, the two I/O operations can occur in either order. If you agree with this, it means I'm right that the release store does not prevent reordering. If you disagree, then explain why. – Brian Bi Nov 23 '22 at 15:09
  • No, you have completely changed the topic and started talking about something else. The claim is that there is a difference between a release operation and a relaxed operation even if there's no matching acquire. Literally, that's what caused the argument: the statement that without an acquire, a release operation has no implications. – ALX23z Nov 23 '22 at 15:50
  • @BrianBi as an example let me write some trivial code: `atomic x{0}, y{0};` thread1: `x.store(1,relaxed); y.store(1,release);` thread2: `if(y.load(relaxed) == 1){assert(x.load(relaxed)==1);}` If you change the release to relaxed the assert might trigger. And there is no matching acquire. (A full-program version of this litmus test appears after these comments.) – ALX23z Nov 23 '22 at 15:53
  • @ALX23z That assert can trigger as-is. There will be no difference if the release is changed to relaxed. – Brian Bi Nov 23 '22 at 16:18
  • With the release operation, the release imposes that the y store happens after the x store. The if condition imposes that the x load happens after the y load. – ALX23z Nov 23 '22 at 17:41
  • @ALX23z The compiler is allowed to reorder the stores because there is no synchronization relationship with the other thread. Therefore the assert can fire. A release store does not prevent such reordering UNLESS there is a matching acquire load. Please read the standard if you don't believe me. – Brian Bi Nov 23 '22 at 21:26
  • @ALX23z Just because no compiler we know of can break your code example and make it assert doesn't mean such a compiler could not be 1) standard conforming 2) programmable (not too hard to make) 3) useful in the real world. We aren't even talking about abstract academic exercises. First generations of compilers used to treat atomic ops as opaque function calls, doing zero transformation around them, but now they do a lot of code transformations. – curiousguy Nov 26 '22 at 04:10
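
For reference, here is a fleshed-out version of the litmus test from the comments above, as a complete program (a sketch; per Brian Bi's and Peter Cordes's comments, the ISO standard allows the assert to fire even with the release store, because the relaxed load never synchronizes-with it):

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};

int main() {
    std::thread t1([] {
        x.store(1, std::memory_order_relaxed);
        y.store(1, std::memory_order_release);  // release store; no acquire load anywhere
    });
    std::thread t2([] {
        if (y.load(std::memory_order_relaxed) == 1) {
            // The relaxed load of y does not synchronize-with the release store,
            // so there is no happens-before edge ordering the store to x before
            // this load; the standard permits this assert to fire.
            assert(x.load(std::memory_order_relaxed) == 1);
        }
    });
    t1.join();
    t2.join();
    return 0;
}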

1 Answer


Your code is not safe, and can break in practice with real compilers for DEC Alpha AXP (which can violate causality via tricky cache bank shenanigans IIRC).


As far as the ISO C++ standard guaranteeing anything in the C++ abstract machine, no, there's no guarantee because nothing creates a happens-before relationship between the init of the int and the read in the other thread.

But in practice C++ compilers implement release the same way regardless of context and without checking the whole program for the existence of a possible reader with at least consume ordering.

In some but not all concrete implementations into asm for real machines by real compilers, this will work. (Because they choose not to look for that UB and break the code on purpose with fancy inter-thread analysis of the only possible reads and writes of that atomic variable.)


DEC Alpha could famously break this code, not guaranteeing dependency ordering in asm, so needing barriers for memory_order_consume, unlike all(?) other ISAs.

Given the current deprecated state of consume, the only way to get efficient asm on ISAs that have dependency ordering (i.e. not Alpha) but don't give you acquire for free (i.e. not x86) is to write code like this. The Linux kernel does this in practice for things like RCU.

That requires keeping it simple enough that compilers can't break the dependency ordering by e.g. proving that any non-NULL read_ptr would have a specific value, like the address of a static variable.
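
As a hypothetical sketch of that failure mode (reusing the question's C and c_ptr; the static object the_c and this reader function are illustrative, not part of the question's program):

// Suppose the compiler can see that &the_c is the only non-null value
// ever stored into c_ptr.
static C the_c;

void reader() {
    C *p = c_ptr.load(std::memory_order_relaxed);
    if (p) {
        // A compiler may legally rewrite this as printf("%d\n", the_c.i):
        // the load of i then no longer uses the value loaded from c_ptr,
        // so the address dependency the hardware would have honored is gone.
        // Only the branch remains, and a control dependency does not order
        // loads on arm64.
        printf("%d\n", p->i);
    }
}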

See also

Peter Cordes
  • My sole regret in life is that I can up-vote this but once. – nmr Nov 21 '22 at 17:11
  • @nmr: If you really want to upvote more, have a look at my answer on the linked Q&As, including [Memory order consume usage in C11](https://stackoverflow.com/q/55741148) which has a quote from Linus Torvalds about which Alpha models could actually violate causality like the paper spec allowed (only a few, and then rarely, making the high cost of the necessary barriers even more painful.) And watch+upvote Paul E. McKenney's CppCon 2016 talk about mo_consume and what the Linux kernel does: https://www.youtube.com/watch?v=ZrNQKpOypqU&index=44&list=PLHTh1InhhwT75gykhs7pqcR_uSiG601oh – Peter Cordes Nov 21 '22 at 17:15