Lock implementation deadlock

Question

I have two threads, one with id 0, and the other with id 1. Based on this information I tried to construct a lock, but it seems like it deadlocks, but I don't understand why, can someone please help me?

I tried to construct a scenario when this happens, but it is really difficult

  private int turn = 0;
  private boolean [] flag = new boolean [2];
   
  public void lock(int tid) {
    int other;
 
    other = (tid == 0) ? 1 : 0;

    flag[tid] = true;
    while (flag[other] == true) {
      if (turn == other) {
        flag[tid] = false;
        while (turn == other){}
        flag[tid] = true;
      }
    }
  }
 
  public void unlock(int tid) {
    //turn = 1-t;
    turn = (tid == 0) ? 1 : 0;
    flag[tid] = false;
  }

The program is at least missing a `volatile` on `int turn = ...`. Not sure, however, if this alone fixes the issue. — Turing85, Sep 23 '21 at 15:26
I don't think you've got a deadlock, I think your threads are just not seeing the updates from the other. — Andy Turner, Sep 23 '21 at 15:26
@AndyTurner what do you mean that they are not seeing each others updates, do you mean updates to the flag[] array? :3 — August Jelemson, Sep 23 '21 at 15:34
I think make `flag` and `turn` variables static so both threads will see same values. if you provide entire code will be useful to understand — the Hutt, Sep 23 '21 at 15:34
@AugustJelemson I mean, according to the Java Memory Model, there is no reason for thread `tid` to check to see whether `turn`, `flag[0]` or `flag[1]` have been updated by thread `other`. — Andy Turner, Sep 23 '21 at 15:35
Turing85, AndyTurner, onkarruikar I added volatile to the flags[] array and the turn-variable, now it seems to be working? does this have anything to do with the threads now seeing each others updates? — August Jelemson, Sep 23 '21 at 15:38

score 5 · Accepted Answer · answered Sep 23 '21 at 15:38

Yup, this code is broken. It's.. a complicated topic. If you want to know more, dive into the Java Memory Model (JMM) - that's a term you can search the web form.

In basis, every field in a JVM has the following property, unless it is marked volatile:

Any thread is free to make a local clone of this field, or not.
The VM is free to sync these clones up, in arbitrary fashion, at any time. Could be in 5 seconds, could be in 5 years. The VM is free to do whatever it wants, the spec intentionally does not make many guarantees.
To solve this problem, establish HB/HA (A Happens-Before/Happens-After relationship).

In other words, if 2 threads both look at the same field, and you haven't established HB/HA, your code is broken, and it's broken in a way that you can't reliably test for, because the JVM can do whatever it wants. This includes working as you expect during tests.

To establish HB/HA, you need volatile/synchronized, pretty much. There's a long list, though:

x.start() is HB to the first line in that thread's HA.
Exiting a synchronized(x) {} block is HB to the start of a synchronized (x) {} block, assuming both x's are pointing at the same object.
Volatile access establishes HB/HA, but you can't really control in which direction.
Note that many library calls, such as to ReadWriteLock and co, may well be synchronizing/using volatile under the hood, thus in passing establishing some form of HB/HA.
"Natural" HB/HA: Within a single thread, a line executed earlier is HB to a line executed later. One might say 'duh' on this one.

Note that HB/HA doesn't actually mean they actually happen before. It merely means that it is not possible to observe the state as it was before the HB line from the HA line, other than by timing attacks. This usually doesn't matter.

The problem

Your code has two threads that both look at the same field, does not contain volatile or synchronized or otherwise establishes HB/HA anywhere, and doesn't use library calls that do so. Therefore, this code is broken.

The solution is to introduce that someplace. Not sure why you're rolling your own locks - the folks who wrote ReadWriteLock and friends in the java.util.concurrent package are pretty smart. There is zero chance you can do better than them, so just use that.

Darn you beat me at it! ;-) Just plus-ed your answer! Maybe add a [link to the JMM](https://en.wikipedia.org/wiki/Java_memory_model)? (I would be happy to find a better link, except for the [Java Concurrency in Practice](https://jcip.net/) one. — Emmef, Sep 23 '21 at 16:25
@AugustJelemson I don't think adding `volatile` is a guarantee here; that makes `arr =...` updates volatile, I don't think it applies to individual array elements; I don't think you _can_ make those volatile. — rzwitserloot, Sep 23 '21 at 18:00
you are right: **volatile** _only_ refers to the variable referencing the array, not its elements. See, for example [this answer](https://stackoverflow.com/a/5173642/4121151). — Emmef, Sep 27 '21 at 11:17

pveentjer · Answer 2 · 2021-09-27T08:07:43.320

Dekker's algorithm is a great example to explain sequential consistency.

Lets first give you a bit of background information. When a CPU does a store, the CPU won't wait for the store to be inserted in the (coherent) cache; once it is inserted into the cache it is globally visible to all other CPUs. The reason why a CPU doesn't wait is that it can be a long time before the cache line that contained the field you want to write is in the appropriate state; the cache line first needs to be invalidated on all other CPU's before the write can be performed on the cache-line and this can stall the CPU.

To resolve this problem modern CPUs make use of store buffers. So the store is written to the store buffer and at some point in the future, the store will be committed to the cache.

If the store is followed by a load to a different address, it could be that the store and the load are reordered. E.g. the cache line for the later load might already be in the right state while the cache line for the earlier store isn't. So the load is 'globally' performed before the store is 'globally' performed; so we have a reordering.

Now lets have a look at your example:

 flag[tid] = true;
 while (flag[other] == true) {

So what we have here is a store followed by a load to a different address and hence the store and the load can be reordered and the consequence is that the lock won't provide mutual exclusion since 2 threads at the same time could enter the critical section.

[PS] It is likely that the 2 flags will be on the same cache line, but it isn't a guarantee.

This is typical what you see on the X86; an older store can be reordered with a newer load. The memory model of the X86 is called TSO. I will not go into the details because there is an optimization called store to load forwarding which complicates the TSO model. TSO is a relaxation of sequential consistency.

Dekker's and Peterson's mutex algorithms do not rely on atomic instructions like a compare and set; but they do rely on a sequential consistent system and one of the requirement of sequential consistency is that the loads and stores do not get reordered (or at least nobody should be able to proof they were reordered).

So what the compiler needs to do is to add the appropriate CPU memory fence to make sure that the older store and the newer load to a different address don't get reordered. And in this case that would be an expensive [StoreLoad] fence. It depends on the ISA, but on Intel X86 the [StoreLoad] will cause the load-store-unit stop executing loads till the store buffer has been drained.

You can't declare the fields of an array to be volatile. But there are a few options:

don't use an array, but explicit fields.
use an AtomicIntegerArray
start messing around with fences using the VarHandle. Make sure the fields are accessed using a VarHandle with at least memory order opaque to prevent the compiler to optimize out any loads/stores.

Apart from what I said above, it isn't only the CPU that can cause problems. Because there is no appropriate happens before edge between writing and reading the flags, the JIT can also optimize this code.

For example:

if(!flag[other])
   return;

while (true) {
      if (turn == other) {
        flag[tid] = false;
        while (turn == other){}
        flag[tid] = true;
      }
    }

And now it is clear that this lock could wait indefinitely (deadlock) and doesn't need to see change in the flag. More optimizations are possible, that could totally mess up your code.

if (turn == other) {
      while (true) {}
}

If you add the appropriate happens before edges, the compiler will add:

compiler fences
CPU memory fences.

And Dekker's algorithm should work as expected.

I would also suggesting setting up a test using JCStress. https://github.com/openjdk/jcstress and have a look at this test: https://github.com/openjdk/jcstress/blob/master/jcstress-samples/src/main/java/org/openjdk/jcstress/samples/concurrency/mutex/Mutex_02_DekkerAlgorithm.java

Lock implementation deadlock

2 Answers2

The problem