Peter's answer is right. I just wanted to address a possible misconception behind the question.
You have to avoid reading too much into the name of the "happens before" relation. Formally, HB is just an arbitrary name given to a relation that may or may not hold between any given operation and another, following certain rules and having certain consequences. It doesn't necessarily correspond to events physically happening inside the machine.
Treat it like a mathematical definition, because that's what it is. You can only reason based on its precise formal definition, and not based on the everyday meaning of the English words "happens" and "before".
It sounds like you're interpreting "A happens before B" as meaning "A will be observed before B in all scenarios", but its implications are much more limited than that. It is true that y.store()
happens before x.store()
, and you can prove it from the definitions given in the C++ standard, simply because they are sequenced in that order within a particular thread (in other words, "happens before" is always consistent with program order). But it does not follow that the assert can't fail.
To get from facts about the HB relation to deductions about the actual behavior of the program, you have to make use of the coherence rules in the standard [intro.races p13-18], and these only let you draw conclusions about two relationships on the same object. They can be summarized as: if A and B are operations on the same object, and A happens before B, then B will observe the result of A. In this case, the behavior of A and B with respect to each other is "what you think it should be", as if they were executing in program order on a strictly in-order machine from the 1980s.
So in order to show that the assert can't fail, what you would need to prove is not that y.store()
happens before x.store()
; rather you would need to prove that y.store()
happens before y.load()
. And that, you will not be able to do.
Roughly speaking, the only way to link an HB relation between two threads is via synchronization, which usually results from an acquire load observing the result of a release store. You do know that the last x.load()
in the loop must observe the result of the x.store()
, but since the load is not acquire it does not help us.
The operations on y
are both seq_cst
and so in particular the store and load are respectively release and acquire, but in order to draw any conclusions from that we would have to know that y.load()
observes the result of y.store()
, which is exactly what we don't know and are trying to find out.
The bit about synchronization gives a hint about the fix: the load of x
should be upgraded to acquire or stronger. Alternatively, you can leave it as relaxed and place an acquire fence between the loop and the y.load()
, which roughly has the effect of retroactively upgrading the final x.load()
to acquire.