1

The construction of some of Scala's immutable collections, most notably List, is mutable under the hood. In 2.13 the concurrency issues were addressed by adding a release fence to basically every builder. A release fence prevents memory writes coming after the fence from being reordered before the fence - so, if you have a 'final write' (such as 'payload_ready' in many examples), anything you wrote (and read) before it is committed to shared memory (or at least visible to other threads). When I was reading up on fences, I came under the impression that a release fence needs a matching acquire fence in the code that reads the data committed by the other thread. I accepted this, as it coincided with the old adage that no synchronisation works if it happens at only one site (vide the famous 'double-checked locking/singleton pattern is broken' warning). I know that Java's (and C++'s) memory model is more abstract than any contemporary physical architecture, so some aspects of it are irrelevant.

I freely admit that I have no experience with such low-level control as memory barriers, so I do not feel comfortable using them. In particular, I can't see how writes being reordered after a release fence (which, as I understand it, is still possible) are any different from writes from after the fence actually happening before it, the latter being forbidden. The fence's location should be irrelevant; what matters is the happens-before relationship. I would like to understand why the lone release fence is enough, or ''when'' it is enough, and when in that case an acquire fence will still be needed.
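
For concreteness, this is the paired-fence version of the 'payload_ready' idiom I have in mind (a minimal sketch; the class and field names are mine):

```java
import java.lang.invoke.VarHandle;

class Publish {
    static int payload = 0;            // data written before the 'final write'
    static boolean payloadReady = false;

    static void writer() {
        payload = 42;
        VarHandle.releaseFence();      // writes above may not sink below this fence
        payloadReady = true;           // the 'final write' signalling readiness
    }

    static int reader() {
        while (!payloadReady) { Thread.onSpinWait(); }
        VarHandle.acquireFence();      // reads below may not float above this fence
        return payload;
    }
}
```

My question is whether the acquireFence in the reader is actually necessary when a release fence is the only one emitted on the writer side.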

Turin
  • 2,208
  • 15
  • 23

1 Answer

3

It looks like the releaseFence appeared in the List code as a result of these discussions: 1 and 2.
To be honest, I didn't find a clear explanation there of why only releaseFence was added (without an accompanying acquireFence).


So I don't know the real answer, but I have a hypothesis.

As I understand it, releaseFence alone can be enough.
From what I've read, every CPU supported by Java preserves so-called load dependencies.
It looks like load dependencies were used to implement the JMM's final-field guarantees (see this article), so I guess Scala might have used the same trick.

Example of load dependency:

class Obj {
  int a = 0;
}

Obj obj = null;

void thread1() {
  var o = new Obj();
  o.a = 1;
  VarHandle.releaseFence();
  obj = o;
}

void thread2() {
  Obj o;
  do {
    o = obj;
  } while (o == null);
  var r1 = o.a;
}
  • In thread1(): releaseFence guarantees that o.a = 1 and obj = o aren't reordered.

  • In thread2(), the read var r1 = o.a means that we compute the memory address for the load as [memory address of object o] + [offset of field a inside Obj].
    This is a load dependency: the memory address for the load o.a is computed from the memory address of o loaded by o = obj.
    A CPU that preserves load dependencies guarantees that o = obj and var r1 = o.a aren't reordered.
    Since in thread1() o.a = 1 and obj = o aren't reordered either, the read var r1 = o.a in thread2() is guaranteed to see 1.

    Note that this works because it's one object:

    • when the value of obj changes => we know that thread1() is done
    • when we read inner fields of obj we get load dependency => we are guaranteed to see all the writes from thread1() which are before obj = o
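
The final-field trick mentioned above can be sketched like this (a hypothetical minimal example; the class names are mine). The JMM guarantees that a properly constructed object's final fields are visible to any thread that obtains the object's reference, without an explicit fence on the reader side:

```java
class Box {
    final int a;               // final field: the JMM guarantees its value is visible
    Box(int v) { a = v; }      // to any thread that sees a reference to this Box
}

class FinalPublish {
    static Box box;            // plain, non-volatile reference

    static void writer() {
        box = new Box(1);      // the final-field 'freeze' at the end of the constructor
    }                          // acts like a release before the reference is published

    static int reader() {
        Box b;
        do { b = box; } while (b == null);  // dependent load, as in the example above
        return b.a;            // guaranteed to see 1: final-field semantics + load dependency
    }
}
```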

Here is an example when we need acquireFence:

int a = 0;
boolean ready = false;

void thread1() {
  a = 1;
  VarHandle.releaseFence();
  ready = true;
}

void thread2() {
  while (!ready) {;}
  VarHandle.acquireFence(); // required in this case
  var r1 = a;
}

Here there is no load dependency between the reads of a and ready in thread2().
As a result, an acquireFence is required to prevent the reads of a and ready from being reordered.
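
For comparison, the usual way to get both fences implicitly is a volatile flag: a volatile write has release semantics and a volatile read has acquire semantics (a sketch; the names are mine):

```java
class VolatileFlag {
    static int a = 0;
    static volatile boolean ready = false;

    static void thread1() {
        a = 1;
        ready = true;          // volatile write: acts as a release
    }

    static int thread2() {
        while (!ready) { Thread.onSpinWait(); }
        return a;              // the volatile read of ready acts as an acquire,
    }                          // so no explicit acquireFence is needed here
}
```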

Keep in mind that this is just a hypothesis.
As you correctly noted, this is all very low level, and I don't know Java at such a deep level. (BTW, as I understand it, fences are outside the JMM, and the JMM is defined only for Java (not Scala or the JVM) - so I'm not sure that happens-before can be applied here, at least as it's defined in the JMM.)

  • 1
    Yup, this makes some sense. In C++ we'd need `std::memory_order_consume` to make this safe, because C++ is portable to DEC Alpha. And more importantly, because C++ wants compilers to be able to optimize away things like `x - x` to a `0` without a dependency on the original value in normal cases - it's compiler optimization that's the main challenge for taking advantage of dependency ordering. (This is a hard problem: current C++ compilers gave up and treat `consume` as a full `acquire`, temporarily deprecating `consume`.) – Peter Cordes Feb 10 '22 at 17:58
  • 1
    But in practice Linux RCU does use this, and the trick is just to write code that doesn't do anything that would let a compiler remove a data dependency. I assume real-world Java can get away with doing this, too. (See also [\[\[carries\_dependency\]\] what it means and how to implement](https://stackoverflow.com/q/64113244) for more about C++) – Peter Cordes Feb 10 '22 at 17:58