1

The construction of some of Scala's immutable collections, most notably List, is mutable under the hood. In 2.13 the concurrency issues were addressed by adding a release fence to basically every builder. A release fence prevents memory writes coming after the fence from being reordered before the fence - so, if you have a 'final write' (such as 'payload_ready' in many examples), anything you wrote (and read) before it is committed to shared memory (or at least visible to other threads). When I was reading up on fences, I came under the impression that a release fence needs a matching acquire fence in the code that reads the data committed by the other thread. I accepted this, as it coincided with the old adage that no synchronisation works if it happens at only one site (vide the famous 'double-checked locking/singleton pattern is broken' warning). I know that Java's (and C++'s) memory model is more abstract than any contemporary physical architecture, so some aspects of it are irrelevant.

I freely admit that I have no experience with such low-level control as memory barriers, so I do not feel comfortable using them. In particular, I can't see how writes being reordered after a release fence (which, as I understand it, is still possible) are any different from writes from after the fence actually happening before it, the latter being forbidden. The fence's location should be irrelevant; what matters is the happens-before relationship. I would like to understand why the lone release fence is enough, or ''when'' it is enough, and when in that case an acquire fence will still be needed.
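
For concreteness, this is the paired-fence version of the 'payload_ready' idiom I have in mind (a minimal sketch; the class and field names are mine):

```java
import java.lang.invoke.VarHandle;

class Publish {
    static int payload = 0;            // data written before the 'final write'
    static boolean payloadReady = false;

    static void writer() {
        payload = 42;
        VarHandle.releaseFence();      // writes above may not sink below this fence
        payloadReady = true;           // the 'final write' signalling readiness
    }

    static int reader() {
        while (!payloadReady) { Thread.onSpinWait(); }
        VarHandle.acquireFence();      // reads below may not float above this fence
        return payload;
    }
}
```

My question is whether the acquireFence in the reader is actually necessary when a release fence is the only one emitted on the writer side.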

Turin
  • 2,208
  • 15
  • 23

1 Answer

3

It looks like the releaseFence appeared in the List code as a result of these discussions: 1 and 2.
To be honest, I didn't find a clear explanation there of why only releaseFence was added (without an accompanying acquireFence).


So I don't know the real answer, but I have a hypothesis.

As I understand it, releaseFence alone can be enough.
From what I've read, every CPU supported by Java preserves so-called load dependencies.
It looks like load dependencies were used to implement the JMM's final-field guarantees (see this article), so I guess Scala might have used the same trick.

Example of load dependency:

class Obj {
  int a = 0;
}

Obj obj = null;

void thread1() {
  var o = new Obj();
  o.a = 1;
  VarHandle.releaseFence();
  obj = o;
}

void thread2() {
  Obj o;
  do {
    o = obj;
  } while (o == null);
  var r1 = o.a;
}
  • In thread1(): releaseFence guarantees that o.a = 1 and obj = o aren't reordered.

  • In thread2(), the read var r1 = o.a means that we compute the memory address for the load as [memory address of object o] + [offset of field a inside Obj].
    This is a load dependency: the memory address for the load o.a is computed from the memory address of o loaded by o = obj.
    A CPU that preserves load dependencies guarantees that o = obj and var r1 = o.a aren't reordered.
    Since in thread1() o.a = 1 and obj = o aren't reordered either, the read var r1 = o.a in thread2() is guaranteed to see 1.

    Note that this works because it's one object:

    • when the value of obj changes => we know that thread1() is done
    • when we read inner fields of obj we get load dependency => we are guaranteed to see all the writes from thread1() which are before obj = o
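
The final-field trick mentioned above can be sketched like this (a hypothetical minimal example; the class names are mine). The JMM guarantees that a properly constructed object's final fields are visible to any thread that obtains the object's reference, without an explicit fence on the reader side:

```java
class Box {
    final int a;               // final field: the JMM guarantees its value is visible
    Box(int v) { a = v; }      // to any thread that sees a reference to this Box
}

class FinalPublish {
    static Box box;            // plain, non-volatile reference

    static void writer() {
        box = new Box(1);      // the final-field 'freeze' at the end of the constructor
    }                          // acts like a release before the reference is published

    static int reader() {
        Box b;
        do { b = box; } while (b == null);  // dependent load, as in the example above
        return b.a;            // guaranteed to see 1: final-field semantics + load dependency
    }
}
```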

Here is an example when we need acquireFence:

int a = 0;
boolean ready = false;

void thread1() {
  a = 1;
  VarHandle.releaseFence();
  ready = true;
}

void thread2() {
  while (!ready) {;}
  VarHandle.acquireFence(); // required in this case
  var r1 = a;
}

Here there is no load dependency between the reads of a and ready in thread2().
As a result, an acquireFence is required to prevent the reads of a and ready from being reordered.
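
For comparison, the usual way to get both fences implicitly is a volatile flag: a volatile write has release semantics and a volatile read has acquire semantics (a sketch; the names are mine):

```java
class VolatileFlag {
    static int a = 0;
    static volatile boolean ready = false;

    static void thread1() {
        a = 1;
        ready = true;          // volatile write: acts as a release
    }

    static int thread2() {
        while (!ready) { Thread.onSpinWait(); }
        return a;              // the volatile read of ready acts as an acquire,
    }                          // so no explicit acquireFence is needed here
}
```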

Keep in mind that this is just a hypothesis.
As you correctly noted, this is all very low level, and I don't know Java at such a deep level. (BTW, as I understand it, fences are outside the JMM, and the JMM is defined only for Java (not Scala or the JVM) - so I'm not sure that happens-before can be applied here, at least as it's defined in the JMM.)

  • 1
    Yup, this makes some sense. In C++ we'd need `std::memory_order_consume` to make this safe, because C++ is portable to DEC Alpha. And more importantly, because C++ wants compilers to be able to optimize away things like `x - x` to a `0` without a dependency on the original value in normal cases - it's compiler optimization that's the main challenge for taking advantage of dependency ordering. (This is a hard problem: current C++ compilers gave up and treat `consume` as a full `acquire`, temporarily deprecating `consume`.) – Peter Cordes Feb 10 '22 at 17:58
  • 1
    But in practice Linux RCU does use this, and the trick is just to write code that doesn't do anything that would let a compiler remove a data dependency. I assume real-world Java can get away with doing this, too. (See also [\[\[carries\_dependency\]\] what it means and how to implement](https://stackoverflow.com/q/64113244) for more about C++) – Peter Cordes Feb 10 '22 at 17:58