
I want to know if the JMM will permit an implementation to reorder the accesses to the ai and ada variables so that the program behaves differently from the intention shown in the code.

The intention is that the method performOps does the following:

  1. gets the next index of an array of volatile items
  2. writes a number at the newly obtained index
  3. spins, waiting to be sure that the older indexes of the array got their values
  4. prints the sum of the array from the first index up to and including the current one.

What happens after the index reaches 1000 is not important to me. I actually want to use the array as a ring, but if this question gets answered I will be able to figure out how to do that.

Basically I want to avoid locks and rely on atomic, lock-free objects. I don't know, however, whether in this specific case I still need extra synchronization.

import java.util.concurrent.atomic.AtomicInteger;
import com.google.common.util.concurrent.AtomicDoubleArray; // Guava

AtomicInteger ai = new AtomicInteger(0);
// all elements of an AtomicDoubleArray start out as 0.0
AtomicDoubleArray ada = new AtomicDoubleArray(1000);
int rn = 4; // dice rolled...

void performOps(AtomicInteger ai, AtomicDoubleArray ada) {
   int i = ai.getAndIncrement(); // reserve the next index
   ada.set(i, rn);               // publish this thread's value
   spinlock(ada, i);             // wait until all older indexes are written
   printSum(ada, i);
}

void spinlock(AtomicDoubleArray ada, int idx) {
   // spin in case the other threads couldn't write to older indexes yet;
   // 0.0 is the initial value, so a non-zero element means "written"
   for (int c = 0; c < idx; c++)
      while (ada.get(c) == 0.0);
}

void printSum(AtomicDoubleArray ada, int idx) {
   double sum = 0;
   for (int i = 0; i <= idx; i++) // include the current index
      sum = sum + ada.get(i);
   System.out.println(sum);
}

// thread 1
performOps(ai, ada); // let's say we get 4 printed

// thread 2
performOps(ai, ada); // let's say we get 8 printed

// thread 3
performOps(ai, ada); // let's say we get 12 printed

Other threads will keep doing the same. It can happen that the print order is not as expected, e.g. 8, 12 and then 4. But assuming the next thread comes in, it will see the correct values and correctly print 16.
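For reference, a driver for this scenario could look roughly like the sketch below; the thread count, the ExecutorService usage and the runConcurrently name are my own assumptions, not part of the original setup.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical driver, assumed to live in the same class as performOps,
// spinlock and printSum above. Four threads is an arbitrary choice.
void runConcurrently() throws InterruptedException {
   ExecutorService pool = Executors.newFixedThreadPool(4);
   for (int t = 0; t < 4; t++) {
      pool.submit(() -> performOps(ai, ada)); // each task reserves its own index
   }
   pool.shutdown();
   pool.awaitTermination(10, TimeUnit.SECONDS); // wait for all prints
}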

All this fancy stuff is to avoid explicit locks, in order to measure which version performs better: the lock-free one or the synchronized one.
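For the lock-based side of that comparison, a minimal sketch could be something like the following; the plain double[], the single monitor and the performOpsLocked name are assumptions of mine, not code from the question.

// Hypothetical lock-based baseline. Holding one monitor for the whole
// operation makes the index reservation, the write and the sum atomic,
// so no spin-waiting is needed.
final double[] values = new double[1000];
int nextIndex = 0;

synchronized void performOpsLocked() {
   int i = nextIndex++;        // reserve the next index
   values[i] = rn;             // write this thread's value
   double sum = 0;
   for (int k = 0; k <= i; k++)
      sum += values[k];        // all older indexes are already written here
   System.out.println(sum);
}

Because the whole method runs under the monitor, a thread that reserves index i always sees indexes 0..i-1 already filled, which matches the behaviour the spinlock is meant to guarantee in the lock-free version.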

David Hofmann
  • Erm, there is no `volatile` variable in your code... – fge Apr 07 '14 at 21:23
  • AtomicInteger and AtomicDoubleArray have the same memory effects as reading a volatile variable – David Hofmann Apr 07 '14 at 21:25
  • Uhm, they may have the same memory effects but they don't have the same semantics _at all_. An `Atomic*()` reader or writer needs to hold exclusive access to it; this is not the case for `volatile`. – fge Apr 07 '14 at 21:31
  • My understanding is that the performOps function itself won't be atomic without a locking mechanism, as the function could be interrupted between the two atomic calls. – Dave S Apr 07 '14 at 21:31
  • @DaveS, does it _say_ that set is not atomic? – David Hofmann Apr 07 '14 at 21:32
  • It doesn't explicitly say it is, but does for several other functions. – Dave S Apr 07 '14 at 21:32
  • @fge Atomic* rely on CAS. They don't require exclusive access AFAIK – David Hofmann Apr 07 '14 at 21:34
  • @DavidHofmann you'd really have to check the source of the library and ensure the get and set operations are atomic to use them as such. – Dave S Apr 07 '14 at 21:36
  • @DaveS: You're right in that the atomicity should be documented. But only because of the type being `double` rather than e.g., `float` (for which the guarantee is given by the JVM). The whole class `AtomicDoubleArray` delegates everything to `j.u.c.AtomicLongArray` (what else should it do). – maaartinus Apr 07 '14 at 21:37
  • Being Google and assuming they know how to write APIs, I just assumed set meant atomic, in a class called Atomic*... :) – David Hofmann Apr 07 '14 at 21:38
  • @LouisWasserman: You were faster by 1 minute. `AtomicLongArray` doesn't document it either. – maaartinus Apr 07 '14 at 21:39
  • Actually, it **is** documented. The *class documentation* points to the [`package java.util.concurrent.atomic`](http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html), which states that "set has the memory effects of writing (assigning) a volatile variable.", which is explained somewhere in JLS, see e.g., [this answer of mine](http://stackoverflow.com/a/4756578/581205). Quite a long way... – maaartinus Apr 07 '14 at 21:49
  • The Java Memory Model doesn't reorder anything. It *prohibits* reordering of certain things under certain conditions. You're asking the wrong question, you should be asking whether an *implementation* is *permitted* to reorder this. – user207421 Apr 07 '14 at 23:04
  • Thanks @EJP. I edited the question to correctly reflect the problem. – David Hofmann Apr 08 '14 at 01:22

1 Answer


The most important rule of the Java Memory Model is (§17.4.5):

A program is correctly synchronized if and only if all sequentially consistent executions are free of data races. If a program is correctly synchronized, then all executions of the program will appear to be sequentially consistent (§17.4.3).

If all shared variables are volatile or atomic, then there are no data races. That means sequential consistency is guaranteed, and that in turn means there will be no reorderings visible in the behaviour of the program.
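To make this concrete, here is a classic store/load reordering example, sketched by me with AtomicInteger standing in for plain fields; it is an illustration of the rule above, not code from the question.

import java.util.concurrent.atomic.AtomicInteger;

// With plain int fields, an implementation may reorder the writes and reads
// so that both threads read 0. With AtomicInteger (volatile memory effects)
// every execution must appear sequentially consistent, so at least one
// thread is guaranteed to observe the other's write.
class NoVisibleReordering {
   static final AtomicInteger x = new AtomicInteger(0);
   static final AtomicInteger y = new AtomicInteger(0);

   public static void main(String[] args) throws InterruptedException {
      Thread t1 = new Thread(() -> {
         x.set(1);
         System.out.println("r1 = " + y.get());
      });
      Thread t2 = new Thread(() -> {
         y.set(1);
         System.out.println("r2 = " + x.get());
      });
      t1.start(); t2.start();
      t1.join(); t2.join();
      // "r1 = 0" and "r2 = 0" can never both be printed.
   }
}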

nosid
  • Super great answer @nosid, I just read the whole memory model part of the JLS, but I am a bit overwhelmed and confused. This clarification really helps. So, after reading the code in the question a bit: do you think it performs well in concurrent scenarios? – David Hofmann Apr 07 '14 at 21:36
  • @DavidHofmann: Sorry, I can't tell you that. Honestly, I would never write such code as long as I am not absolutely sure it is required to be fast enough. I recommend doing a benchmark. – nosid Apr 07 '14 at 21:57
  • Neither would I write that. But I want to test the cost of a mutex vs 2 CAS ops in a real program that I am writing. That is why I want both implementations, a lock-based and a lock-free one, but I don't want to do any benchmark before being sure that both are semantically the same. – David Hofmann Apr 07 '14 at 21:59
  • @DavidHofmann: I'm not sure what you're trying to do, but things are rather complicated. In case you're lucky, CAS works like a charm; if the variables are contended then the busy waiting may produce a lot of coherency traffic. And if you run out of cores, then busy waiting is a pure disaster. – maaartinus Apr 08 '14 at 12:42
  • @maaartinus More often than not I am guessing that the spin locks won't kick in. And the code is just an example, the real thing does much less spinning. I won't have contention because there will only be as many threads as there are cores. But can you clarify what coherency traffic is? – David Hofmann Apr 08 '14 at 12:47
  • @DavidHofmann: I can't find the link anymore and I'm not sure about the details, but checking a heavily contended variable in a tight loop leads to a lot of communication between the cores. Somewhere I read about [backoff](http://en.wikipedia.org/wiki/Exponential_backoff) used to solve this, and I guess that's why the [MWAIT instruction](http://software.intel.com/en-us/articles/how-to-use-the-monitor-and-mwait-streaming-simd-extensions-3-instructions) exists. But I don't know much about the subject. – maaartinus Apr 08 '14 at 13:23
  • Thanks, I will keep that in mind; in the coming days I will probably come back to this question with some metrics. Don't worry about the for loop, however: it will at most check 3 places, not go from the beginning of the array as in the example code. – David Hofmann Apr 08 '14 at 13:30
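Regarding the busy-waiting concern discussed in the comments above, a minimal backoff sketch might look as follows; it assumes Java 9+ for Thread.onSpinWait(), and the awaitWritten name, the spin limit and the park times are arbitrary choices of mine.

import java.util.concurrent.locks.LockSupport;
import com.google.common.util.concurrent.AtomicDoubleArray;

// Sketch only: spin briefly, then back off with exponentially growing parks
// to reduce cache-coherency traffic on a heavily contended slot.
void awaitWritten(AtomicDoubleArray ada, int idx) {
   long parkNanos = 1_000;               // initial backoff, arbitrary
   int spins = 0;
   while (ada.get(idx) == 0.0) {         // 0.0 means "not written yet" here
      if (spins++ < 100) {
         Thread.onSpinWait();            // CPU hint (Java 9+)
      } else {
         LockSupport.parkNanos(parkNanos);
         parkNanos = Math.min(parkNanos * 2, 1_000_000); // cap at 1 ms
      }
   }
}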