Least intrusive compile barrier for Java on x86

Question

If I hava a Java process interacting with some other process via a shared ByteBuffer or similar, what would be the least intrusive equivalent of a compiler barrier in C/C++? No portability is required - I am specifically interested in x86.

For example I have 2 processes reading and writing to an area of memory as per the pseudocode:

p1:
    i = 0
    while true:
      A = 0
      //Write to B
      A = ++i

p2:
    a1 = A
    //Read from B
    a2 = A

    if a1 == a2 and a1 != 0:
      //Read was valid

Due to the strict memory ordering on x86 (loads to separate locations not reorder and reads to separate locations not reordered) this does not require any memory barrier in C++, just a compile barrier between each write and between each read (i.e. asm volatile).

How can I achieve the same ordering constraint in Java in the least expensive manner. Is there anything less intrusive than writing to a volatile?

score 4 · Answer 1 · answered Feb 02 '13 at 09:47

4

You can use lazySet, it can be up to 10x faster than setting a volatile field as it doesn't stall the CPU pipeline. e.g. AtomicLong lazySet or you can use the Unsafe equivalent if you need.

answered Feb 02 '13 at 09:47

Peter Lawrey

525,659
79
751
1,130

BeeOnRope · Accepted Answer · 2013-02-07T02:11:55.880

sun.misc.Unsafe.putOrdered should do what you want - a store with the lock implied on x86 by volatile. The compiler will not move instructions around it, I believe.

This is the same as lazySet on AtomicInteger and friends, but that can't be used directly with ByteBuffer.

Unlike volatile or the AtomicThings classes, that method applies to the specific writes you use it on, and not the definition of the member, so using it doesn't imply anything for reads.

It looks like you are trying to implement something like a seqlock - meaning you need to avoid re-ordering between reads of the version counter, A, and the reads/writes of the data itself. A plain int isn't going to cut it - since the JIT might do all sorts of naughty things. My recommendation would be to use a volatile int for your counter, but then write it to it with putOrdered. This way, you don't pay the price for volatile writes (a dozen cycles or more, usually), while getting the compiler barrier implied by the volatile read (and the hardware barrier for those reads is a no-op, making them fast).

All that said, I think you are in a grey area here, because lazySet isn't a part of the formal memory model, and doesn't fit cleanly into the happens-before reasoning, so you need a deeper understanding of the actual JIT and hardware implementation to see if you can combine things in this way.

Finally, even with volatile reads and writes (ignoring lazySet), I don't think your seqlock is sound from point of view of the java memory model, because volatile writes only set up a happens-before between that write and later reads on another thread, and earlier actions in the writing thread, but not between the read and actions following the write on the writing thread. Said another way, it is a unidirectional fence, not a bidirectional one. I believe writes in version N+1 to your shared region can be seen by the reading thread even while it reads A == N twice.

Clarification from the comment:

Volatile only sets up a one way barrier. It is very similar to acquire/release semantics used by WinTel in some APIs. For example, assume A, Bv, and C all initially zero:

Thread 1:
A = 1;   // 1
Bv = 1;  // 2
C = 1;   // 3

Thread 2:

int c = C;  // 4
int b = Bv; // 5
int a = A;  // 6

Here, only Bv is volatile. The two threads are doing something similar in concept to your seqlock writers and readers - thread 1 writes some stuff in one order, and thread 2 reads the same stuff in a reverse order, and tries to reason about ordering from that.

If thread two has b == 1, then a == 1 always, because 1 happens-before 2 (program order), and 5 happens before 6 (program order), and most critically 2 happens before 5 since 5 read the value written at 2. So in this way the write and read of Bv is acting like a fence. Things above (2) cannot "move below" (2), and things below (5) cannot "move above" 5. Note I only restricted movement in one directly for each thread, however, not both, which brings us to our next example:

Equivalently to the above, you might assume that if a == 0, then c == 0 also, since C is written after a, and read before. However, volatiles don't guarantee this. In particular, the happens-before reasoning above doesn't prevent (3) from being moved above (2) as observed by thread 2, nor do they prevent (4) from being pushed below (5).

Update:

Let's look at your example specifically.

What I believe can happen is this, unrolling the write loop which occurs in p1.

p1:

i = 0
A = 0
// (p1-1) write data1 to B
A = ++i;  // (p1-2) 1 assigned to A

A=0  // (p1-3)
// (p1-4) write data2 to B
A = ++i;  // (p1-5) 2 assigned to A

p2:

a1 = A // (p2-1)
//Read from B // (p2-2)
a2 = A // (p2-3)

if a1 == a2 and a1 != 0:

Let's say p2 sees 1 for a1 and a2. This means there is a happens before between p2-1 and p1-2 (and by extension p1-1), and also between p2-3 and p1-2. However there is happens-before between anything in p2 and p1-4. So in fact, I believe the read of B at p2-2 can observe the second (perhaps partially completed) read at p1-4, which can "move above" the volatile writes at p1-2 and p1-3.

It's interesting enough that I think you might make a new question just on that alone - forget about faster barriers - does this work at all even with volatile?

Agreed on the JVM not moving instructions, but the CPU might otherwise. — Peter Lawrey, Feb 02 '13 at 09:48
Right, but it still has specific semantics - in particular not being reordered with other stores, which on x86 is as simple as a plain store, which is how putOrdered and lazySet are implemented on that platform. So the method should do what the OP wants. Of course, the exact semantics are thinly documented, so a double check of the JDK source might be in order. — BeeOnRope, Feb 02 '13 at 09:51
Thanks, this seems like it will fit pretty well. What about on the read side? — jmetcalfe, Feb 03 '13 at 06:53
I added more detail above to address that. I'm concerned that even with full volatile read/writes your seqlock won't work (in theory, I think it is likely to work in practice), because volatile reads on mean that you'll see everything that happened before the corresponding write, but do _not_ prevent you from seeing even more things that happened after that write. — BeeOnRope, Feb 04 '13 at 00:37
Thanks for the update. To confirm (I am not well-versed in Java so excuse my ignorance), you think the volatile read would only guarantee ordering with respect to earlier writes, not earlier reads, meaning that the read from B or the initial read from A could be reordered with the second read from A? So the only solution would be the original write to volatile? — jmetcalfe, Feb 04 '13 at 20:17
What I mean is like, well formatting in comments sucks so let me add it to the reply. — BeeOnRope, Feb 04 '13 at 21:25
@BeeOnRope Very helpful, thanks. Just to clarify again, it seems from the above that your suggestion to use putOrdered on the two counter writes _unless_ `B` is also written with putOrdered/volatile, because, using your example, the assignment of Bv at (2) could move above that of A at (1). So we must use putOrdered on all three writes. — jmetcalfe, Feb 05 '13 at 07:58
Actually I think I'm saying that even volatile isn't enough for your counter writes (A in your code). I didn't follow understand your question however. When you said "counter" did you mean your code or mine? When you said `B` did you mean `Bv` from my code? — BeeOnRope, Feb 05 '13 at 10:51
@BeeOnRope Lets take your example. If my understanding of the Java happens-before gaurentees is correct, making writes to A, Bv and C all volatile should gaurentee consistency? We have stopped (3) moving above (2) and I believe thread 2 sees c=1 and a=1 he can gaurentee Bv=1. The counter I referred to is A in my original example. — jmetcalfe, Feb 06 '13 at 07:30
Well in my example, thead 2 is looking at Bv and trying to reason about the validity of a and c, not the other way around. Basically a and c represent your "data" and Bv is your "A" variable - the seqlock version counter (confusing, sure). Let's abandon by example and look at yours directly. Because formatting sucks, I will put my analysis in the post. — BeeOnRope, Feb 07 '13 at 01:57
The user @irreputable seems to know what is going on with this kind of stuff. See for example his reply on [this question](http://stackoverflow.com/questions/3971095/volatile-variables-and-happens-before-ordering), which indicates an exhaustive approach you might use to analyze this. — BeeOnRope, Feb 07 '13 at 02:29
I asked about the general case of a seqlock using volatiles [here](http://stackoverflow.com/questions/14742808/is-it-possible-to-efficiently-implement-a-seqlock-in-java). — BeeOnRope, Feb 07 '13 at 03:08

Least intrusive compile barrier for Java on x86

2 Answers2

Linked