12

I have a direct ByteBuffer (off-heap) in one thread and safely publish it to a different thread using one of the mechanisms given to me by JMM. Does the happens-before relationship extend to the native (off-heap) memory wrapped by the ByteBuffer? If not how can I safely publish the contents of a direct ByteBuffer from one thread to a different one?

Edit

This is not a duplicate of Can multiple threads see writes on a direct mapped ByteBuffer in Java? because

  • I am not talking about an mmap()ed region but a general off-heap area
  • I am safely publishing the ByteBuffer
  • I am not concurrently modifying the contents of the ByteBuffer, I am just handing it from one thread to a different one

Edit 2

This is not a duplicate of Options to make Java's ByteBuffer thread safe. I am not trying to concurrently modify a ByteBuffer from two different threads. I am trying to hand it over from one thread to a different one and get happens-before semantics on the native memory region backing a direct ByteBuffer. The first thread will no longer modify or read from the ByteBuffer once it has been handed over.
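For illustration, here is a minimal sketch of the handoff being asked about (class and variable names are mine, not from the question): a direct ByteBuffer is handed from a producer thread to a consumer thread via a BlockingQueue, one of the JMM's safe-publication mechanisms.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BufferHandoff {
    // The queue's internal lock provides the safe-publication
    // (happens-before) edge between producer and consumer.
    static final BlockingQueue<ByteBuffer> queue = new ArrayBlockingQueue<>(1);

    public static int handoff() throws InterruptedException {
        Thread producer = new Thread(() -> {
            ByteBuffer buf = ByteBuffer.allocateDirect(8);
            buf.putInt(0, 42);          // write to the off-heap memory
            try {
                queue.put(buf);         // publish; producer stops touching buf
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        ByteBuffer received = queue.take(); // consume: happens-after the put
        producer.join();
        return received.getInt(0);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handoff()); // expected: 42
    }
}
```

The open question is whether the happens-before edge established by the queue also covers the native memory behind the buffer, not just the ByteBuffer object itself.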

Philippe Marschall
  • 4,452
  • 1
  • 34
  • 52
  • I am not talking about an mmap()ed area and I safely publish the ByteBuffer. – Philippe Marschall Nov 01 '17 at 08:42
  • If you synchronize the threads on a monitor then that acts as a memory fence i.e. all the writes before the sync will be visible in another thread after the sync. – rustyx Nov 01 '17 at 09:00
  • @RustyX are you sure that applies to off-heap regions as well? – Philippe Marschall Nov 01 '17 at 09:01
  • 2
    [This answer](https://stackoverflow.com/a/11154553/1466267) quotes the [`Buffer` javadoc](https://docs.oracle.com/javase/9/docs/api/java/nio/Buffer.html) which states that *"Buffers are not safe for use by multiple concurrent threads. If a buffer is to be used by more than one thread then access to the buffer should be controlled by appropriate synchronization."* – SpaceTrucker Nov 01 '17 at 09:34
  • @RustyX I'm not sure either, that's why I'm looking for an authoritative answer. Where do you take the confidence from that mutex acquire/release semantics should apply to the native memory region backing a direct ByteBuffer? – Philippe Marschall Nov 01 '17 at 10:03
  • @SpaceTrucker that's not my use case, I am not concurrently modifying the buffer, I am handing it over from one thread to a different one. The first thread will no longer modify it or read from it. I am not looking for a way to concurrently modify a ByteBuffer. I am looking for a way to get happens-before semantics on the native memory region backing a direct ByteBuffer. – Philippe Marschall Nov 01 '17 at 10:06
  • @PhilippeMarschall Then maybe you are better off with just handing over a producer for the buffer to the thread that is reading from the buffer than the already created buffer. – SpaceTrucker Nov 01 '17 at 10:24
  • This question should be reopened because it is substantially different than the linked duplicates as stated by the OPs edits. – SpaceTrucker Nov 01 '17 at 10:30
  • @PhilippeMarschall - you should clarify _how_ you are writing to the byte buffer: is it via Java code, or do you have some native code that you call? – BeeOnRope Nov 02 '17 at 00:12

2 Answers

3

Certainly if you read and write the ByteBuffer in Java code, using Java methods such as put and get, then the happens-before relationship between your modifications on the first thread, the publishing/consumption, and the subsequent access on the second thread will apply[0] in the expected way. After all, the fact that the ByteBuffer is backed by "off-heap" memory is just an implementation detail: it doesn't allow the Java methods on ByteBuffer to break the memory model contract.

Things get a bit hazy if you are talking about writes to this byte buffer from native code you call through JNI or another mechanism. I think as long as you are using normal stores (i.e., not non-temporal stores or anything else with weaker semantics than normal stores) in your native code, you will be fine in practice. After all, the JVM internally implements stores to heap memory via the same mechanism, and in particular the get- and put-type methods will be implemented with normal loads and stores. The publishing action, which generally involves some type of release store, will apply to all prior Java actions and also to the stores inside your native code.

You can find some expert discussion of more or less this topic on the concurrency mailing lists. The precise question there is "Can I use Java locks to protect a buffer accessed only by native code", but the underlying concerns are pretty much the same. The conclusion seems consistent with the above: you are safe if you do normal loads and stores to a normal[1] memory area. If you want to use weaker instructions you'll need a fence.


[0] That was a bit of a lengthy, tortured sentence, but I wanted to make it clear that there is a whole chain of happens-before pairs that have to be correctly synchronized for this to work: (A) between the writes to the buffer and the publishing store on the first thread, (B) between the publishing store and the consuming load, and (C) between the consuming load and the subsequent reads or writes by the second thread. Pair (B) is purely in Java-land, so it follows the regular rules. The question is then mostly about whether (A) and (C), which each have one "native" element, are also fine.

[1] Normal in this context more or less means the same type of memory area that Java uses, or at least one with consistency guarantees at least as strong as those of the memory Java uses. You have to go out of your way to violate this, and because you are using a ByteBuffer you already know the area was allocated by Java and has to play by the normal rules (since the Java-level methods on the ByteBuffer need to work in a way consistent with the memory model, at least).

BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
2

The Java object monitor's happens-before order semantics are described in §17.4.5 as:

The wait methods of class Object (§17.2.1) have lock and unlock actions associated with them; their happens-before relationships are defined by these associated actions.

It is unspecified whether that applies to Java-managed objects only or to any data. After all, Java doesn't care about what happens outside the Java "world". But it also means we can extrapolate the spec to any data reachable inside the Java world. Then the relation to the heap becomes less important. After all, if I synchronize the threads, why shouldn't it work for a direct ByteBuffer?

To confirm this we can take a look at how it is actually implemented in the OpenJDK.

If we look closely we see that ObjectMonitor::wait does, among other things:

    OrderAccess::fence();

And ObjectMonitor::exit (the business end of notify/notifyAll) does:

    OrderAccess::release_store_ptr (&_owner, NULL) ;
    OrderAccess::storeload() ;

Both fence() and storeload() result in a global StoreLoad memory fence:

    inline void OrderAccess::storeload()  { fence(); }

On SPARC it generates the membar instruction:

    __asm__ volatile ("membar  #StoreLoad" : : :);

And on x86 it goes to membar(Assembler::StoreLoad) and subsequently:

    // Serializes memory and blows flags
    void membar(Membar_mask_bits order_constraint) {
      if (os::is_MP()) {
        // We only have to handle StoreLoad
        if (order_constraint & StoreLoad) {
          // All usable chips support "locked" instructions which suffice
          // as barriers, and are much faster than the alternative of
          // using cpuid instruction. We use here a locked add [esp],0.
          // This is conveniently otherwise a no-op except for blowing
          // flags.
          // Any change to this code may need to revisit other places in
          // the code where this idiom is used, in particular the
          // orderAccess code.
          lock();
          addl(Address(rsp, 0), 0);// Assert the lock# signal here
        }
      }
    }

So there you have it, it's just a memory barrier at the CPU level. Reference counting and garbage collection come into play at a much higher level.

Which means that, at least in OpenJDK, any memory write issued before Object.notify will be visible to any read issued after the corresponding Object.wait returns.
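As a minimal sketch of what that buys you in practice (class and field names are illustrative, not from the JDK): the store into the direct buffer before notify is visible to the reader after wait returns.

```java
import java.nio.ByteBuffer;

public class MonitorHandoff {
    private final Object lock = new Object();
    private ByteBuffer buffer; // guarded by lock

    public int exchange() throws InterruptedException {
        Thread writer = new Thread(() -> {
            ByteBuffer buf = ByteBuffer.allocateDirect(4);
            buf.putInt(0, 99);           // store to off-heap memory
            synchronized (lock) {
                buffer = buf;
                lock.notify();           // monitor exit acts as the release
            }
        });
        writer.start();
        synchronized (lock) {
            while (buffer == null) {
                lock.wait();             // re-acquiring the monitor acquires
            }
        }
        writer.join();
        return buffer.getInt(0);         // sees the pre-notify store
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new MonitorHandoff().exchange()); // expected: 99
    }
}
```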

rustyx
  • 80,671
  • 25
  • 200
  • 267
  • 1
    as much as I like this - these *are* implementation details that I would not rely on. The only thing to rely on is the JLS. For example the rule about `final` fields says that *every* field has to be final in the JLS for a constructor to safely publish; but the actual implementation cares for *at least one* only – Eugene Nov 02 '17 at 20:25