4

Update: When I first posted this, I was fairly certain the code was broken. Now, I'm no longer sure about what I'd observed. The biggest problem I'm having is that I can't seem to apply 17.4. Memory Model and state straight out whether it should or shouldn't work.


This following code is broken.

It's overly complex for what it's trying to achieve, but furthermore, it's thread-unsafe in that I've observed that it can indefinitely wait at c. I'm not worried about the former (one could use a ReentrantLock or CountDownLatch for sounder code), but I'm wondering, what's the reason for the latter?

static final ConcurrentHashMap<Integer, Object> mutex = new ConcurrentHashMap<>();


public static brokenFoo() {

    Object ourLock = new Object();

    for (;;) {
        Object theirLock = mutex.putIfAbsent(0, ourLock);
        if (theirLock == null) {
            break;
        }

        synchronized (theirLock) {                  // a
            if (mutex.get(0) != theirLock) {        // b
                continue;
            }
            theirLock.wait();                       // c
        }                                           // d
    }

    try {
        // critical section

    } finally {
        synchronized (ourLock) {                    // e
            mutex.remove(0);                        // f
            ourLock.notifyAll();                    // g
        }                                           // h
    }
}

I've thought in terms of happens-befores:

  • hb(f, h) and hb(h, a) therefore hb(f, a)
  • hb(c, d) and hb(d, e) therefore hb(c, e)

But, this doesn't seem to prove or disprove anything.

Edit: (Above question fails to really explain what this code should do.)

Expected:

  • brokenFoo() is called by multiple threads and the above code is supposed to provide mutual exclusion over // critical section.
  • If two or more threads enter brokenFoo() at the same time, only one should proceed to // critical section, while others wait somewhere prior.
  • After the thread in // critical section has exited, another should proceed to take its place.

Actual:

  • It's been observed that there're threads that are waiting at c even though no other threads are in brokenFoo().
antak
  • 19,481
  • 9
  • 72
  • 80
  • Can you explain what you want to achieve with this code? – Sotirios Delimanolis Jun 12 '14 at 02:45
  • With the above code, I want to illustrate in the most simplest way possible the aforementioned problem, so that I can get an input on what, in terms of mutual exclusion and memory consistency, is wrong with it. This is so that I can better my understanding of how `synchronized` works and doesn't work, and what the Java memory model stipulates, such that I can apply this general knowledge in future cases. – antak Jun 12 '14 at 02:58
  • Yeah, but what are your expectations? This method is invoked by multiple threads? And what should it do? What does it actually do? It just blocks on the `wait()` at `c`? – Sotirios Delimanolis Jun 12 '14 at 03:00
  • Ah sorry, I see your point. I've amended the question. – antak Jun 12 '14 at 03:16
  • The only way this could happen is if the compiler reordered the remove and notify calls - which it is theoretically allowed to do I think (is it? Depends on the implementation), but I'm surprised that it can. – Voo Jun 12 '14 at 09:44
  • You say you aren't worried about the complexity of the example, but the complexity makes it hard to reason about it. Here's one thing you could do to simplify it in a big way. Use an AtomicBoolean instead of using the ConcurrentHashMap. – Solomon Slow Jun 12 '14 at 13:53
  • You can use a primitive like `AtomicBoolean` to understand how mutual exclusion works, but I don't think you will learn much by using `wait` and `notify` to understand how wait and notify work. AtomicBoolean methods use special synchronization instructions that are provided by the CPU, but wait and notify rely on system calls that interact with the operating system scheduler. You can write programs that demonstrate what those methods _do_, but in order to show _how_ they do it, you will pretty much have to write a simulation of an operating system in Java. – Solomon Slow Jun 12 '14 at 14:00
  • 1
    @Voo: reordering `remove` and `notify` should not have any notable effect as these operations are within a `synchronized` block and hence *atomic* for other threads syncing on the same object. – Holger Jun 12 '14 at 14:07
  • @antak: I can’t reproduce your problem. Are you sure it happens with *exact* code as it is posted in your question? – Holger Jun 12 '14 at 14:22
  • @Holger Right you are, I shouldn't comment on concurrency things before my first coffee. But I can't see any bug in it otherwise and also fail to reproduce it myself. – Voo Jun 12 '14 at 14:34
  • @jameslarge Indeed, I initially changed `ConcurrentHashMap` to `AtomicBoolean`, `boolean[1]` and `volatile boolean[1]` in attempt to reason things out, but I couldn't work it out so I changed it back, in case the conversion lost the flaw. – antak Jun 12 '14 at 14:43
  • @Holger The only major difference is that the original code was closer to `brokenFoo(final int key)`, and `key` was used in place of `0` on the map. Each `key` value would collide with about a dozen other threads, then, rarely be reused afterwards. Also, `wait()` was wrapped in `try-catch` to die on interrupt. – antak Jun 12 '14 at 14:59
  • @antak Aaand there we go. If your keys are larger than 8 byte signed you have a problem, because only integers in that range are cached. Meaning `synchronized(128){}` is a no op, since it's equivalent to `synchronized(new Integer(128))` – Voo Jun 12 '14 at 15:02
  • @Voo: the code is syncing on the *value*, not on the key. And `ConcurrentHashMap` is supposed to use `equals`. I wish it was that simple… – Holger Jun 12 '14 at 15:14
  • Original code was `ConcurrentHashMap` and `key` was `long`, if that makes a difference. – antak Jun 12 '14 at 15:25
  • I think I could be wrong about my observation. The problem is "fixed" in production so I can't attempt reproduction, but I can't count out the possibility that there were (stuck) threads already within `// critical section`. – antak Jun 13 '14 at 03:03

1 Answers1

1

It might be the case that one thread calls notifyAll() before another thread starts to wait(). This may happen due to a spurious wakeup:

  • thread 1 enters the critical section
  • thread 2 starts to wait()
  • a spurious wakeup occurs in thread 2
  • thread 1 enters the synchronized block and notifies on the lock
  • thread 2 enters the synchronized block and waits indefinitely

Or thread 1 just happened to execute before thread 2. While your code is correct in terms of the JMM, its liveness is not guaranteed. That's why you should use a CountDownLatch instead of the notify/wait mechanism.