
I'm reading the book on this subject.

In Listing 5.18, Brian Goetz gives an example of a semi-efficient memoizer with a non-volatile shared variable cache of type ConcurrentHashMap, as follows:

public class Memoizer3<A, V> implements Computable<A, V> {
    private final Map<A, Future<V>> cache
        = new ConcurrentHashMap<A, Future<V>>();
    private final Computable<A, V> c;

    public Memoizer3(Computable<A, V> c) { this.c = c; }

    public V compute(final A arg) throws InterruptedException {
        Future<V> f = cache.get(arg);
        if (f == null) {
            Callable<V> eval = new Callable<V>() {
                public V call() throws InterruptedException {
                    return c.compute(arg);
                }
            };
            FutureTask<V> ft = new FutureTask<V>(eval);
            f = ft;
            cache.put(arg, ft); // Can it be put at the very beginning of compute?
            ft.run();
        }
        try {
            return f.get();
        } catch (ExecutionException e) {
            throw launderThrowable(e.getCause());
        }
    }
}
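For context, here is a self-contained sketch of how this class is exercised (the Computable interface is from the book; the squaring function, the call counter, and main are my own illustration, and launderThrowable is replaced by a plain rethrow):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

public class Memoizer3Demo {
    // The book's Computable interface.
    interface Computable<A, V> {
        V compute(A arg) throws InterruptedException;
    }

    // Memoizer3 as quoted above; launderThrowable replaced by a plain rethrow.
    static class Memoizer3<A, V> implements Computable<A, V> {
        private final Map<A, Future<V>> cache = new ConcurrentHashMap<>();
        private final Computable<A, V> c;

        Memoizer3(Computable<A, V> c) { this.c = c; }

        public V compute(final A arg) throws InterruptedException {
            Future<V> f = cache.get(arg);
            if (f == null) {
                FutureTask<V> ft = new FutureTask<>(() -> c.compute(arg));
                f = ft;
                cache.put(arg, ft);
                ft.run();
            }
            try {
                return f.get();
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause());
            }
        }
    }

    static final AtomicInteger calls = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Memoizer3<Integer, Integer> memo = new Memoizer3<>(x -> {
            calls.incrementAndGet();
            return x * x;
        });
        System.out.println(memo.compute(7));        // 49 (computed)
        System.out.println(memo.compute(7));        // 49 (served from cache)
        System.out.println("calls=" + calls.get()); // calls=1
    }
}
```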

The problem is that I don't understand the rules, in terms of the JLS, under which cache.put(arg, ft); could be reordered by a compiler to run ahead of Future<V> f = cache.get(arg); (is reordering of the cache accesses possible at all?).

By "reordering", I mean the possibility that lines of the completed code may be reordered by a compiler due to enabled optimizations.

The question does not touch the topic of CPU memory reordering, which is highlighted, e.g., in https://stackoverflow.com/a/66973124

EDIT:

One reason for this question is the ability of a compiler, in some cases, to break unsynchronized multithreaded code fragments that use shared variables; another reason is this quotation from one of the book's authors, Doug Lea:

The within-thread as-if-serial property is helpful only when only one thread at a time is manipulating variables, due to synchronization, structural exclusion, or pure chance. When multiple threads are all running unsynchronized code that reads and writes common fields, then arbitrary interleavings, atomicity failures, race conditions, and visibility failures may result in execution patterns that make the notion of as-if-serial just about meaningless with respect to any given thread.

Even though JLS addresses some particular legal and illegal reorderings that can occur, interactions with these other issues reduce practical guarantees to saying that the results may reflect just about any possible interleaving of just about any possible reordering. So there is no point in trying to reason about the ordering properties of such code.

Per http://gee.cs.oswego.edu/dl/cpj/jmm.html

In other words, not following the JLS constraints regarding happens-before, lock, or volatile semantics may lead to broken results in unsynchronized code that uses shared variables.

P.S. Thanks to Peter Cordes for his comments on this theme.

bimjhi
  • You'd previously tagged this with [instruction-reordering]. That's not a very helpful way to think about memory reordering. CPUs (and compilers) preserve the illusion (for this thread) of things happening in program order, it's only the order of memory operations (not instructions) seen by other threads that can vary. See [Java instruction reordering and CPU memory reordering](https://stackoverflow.com/q/69568946) – Peter Cordes Oct 14 '21 at 14:03
  • @PeterCordes Let me reformulate my question: Can't `cache.put(arg, ft);` be put at the very beginning of the method just because it uses the local var `ft`? In other words, is `ft` the only reason preventing it? – bimjhi Oct 14 '21 at 14:13
  • No, because it would break the illusion. – pveentjer Oct 14 '21 at 14:17
  • @pveentjer Please be more specific if the use of a local in cache.put prevents reordering or not. – bimjhi Oct 14 '21 at 14:21
  • Are you literally asking if you could edit the source code to make that change, and have the code still work correctly? Or asking if anything stops the JVM from making asm *as if* you'd done that, *other* than a data dependency? – Peter Cordes Oct 14 '21 at 14:22
  • I mean compiler-driven reordering, not a human. – bimjhi Oct 14 '21 at 14:22
  • The get and the put produce a series of instructions. And the compiler/CPU could reorder some of them so that you end up with an interleaving of instructions from get and some instructions from put. But it can only be done under the guarantee that the behavior of this reordering is no different then if the get is executed first and then the put. So the sequential semantics of the program need to be preserved. – pveentjer Oct 14 '21 at 14:27
  • I added an addition to my answer that hopefully improves your understanding. – pveentjer Oct 14 '21 at 14:35
  • There is no benefit in discussing this in terms of source code fragments being shuffled around. For example, if you assume that `cache.put(arg, ft);` could be placed at the beginning of the method, what happens to the `FutureTask ft = new FutureTask(eval);` whose result is used by `cache.put(arg, ft);`? Describe the *actual outcome* of this method you're speculating about. – Holger Oct 15 '21 at 11:05

2 Answers


Instructions can't be reordered if they violate the sequential semantics of a program.

Simple example (assuming a=b=0):

a=1
b=a

So according to the sequential semantics of the above program, the only allowed outcome is a=1, b=1. If the two instructions were reordered, we would get the outcome a=1, b=0. But that outcome violates the sequential semantics and is hence prohibited.

This is also informally called within-thread as-if-serial semantics. So the compiler (or CPU) is allowed to reorder instructions, but the most basic constraint is that no reordering may violate the sequential semantics.
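The two-line example can be written out as plain Java; whatever reordering happens under the hood, the thread itself can only ever observe the as-if-serial outcome (the class name and main are mine, for illustration):

```java
public class AsIfSerial {
    static int a = 0, b = 0;

    public static void main(String[] args) {
        a = 1;
        b = a; // depends on the store above; this thread must see a == 1 here
        System.out.println("a=" + a + ",b=" + b); // a=1,b=1
    }
}
```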

If the JVM were allowed to violate the sequential semantics of a program, I would quit my job as a developer today :)

In terms of the JMM: the a=1 is ordered before the b=a in the happens-before order, due to the program order between these 2 instructions.

Keep in mind that the JMM is not specified in terms of method calls. It is expressed in actions like plain loads/stores, volatile loads/stores, monitor lock release/acquire, etc.

[Addition]

Imagine you have the following code:

int a = 0, b = 0, c = 0, d = 0;

void foo() {
   a = 1;
   b = 1;
}

void bar() {
   c = 1;
   d = a;
}

void foobar() {
   foo();
   bar();
}

Then the only allowed result is 'a=1,b=1,c=1,d=1'.

Due to inlining we can get rid of the function calls:

void foobar() {
  a = 1; // foo
  b = 1; // foo
  c = 1; // bar
  d = a; // bar
}

The following execution preserves the sequential semantics:

  c = 1; // bar
  a = 1; // foo
  b = 1; // foo
  d = a; // bar

Since the outcome is 'a=1,b=1,c=1,d=1'.

But the following execution violates the sequential semantics.

   d = a; // bar
   a = 1; // foo
   b = 1; // foo
   c = 1; // bar

Because we end up with 'a=1,b=1,c=1,d=0', where d is 0 instead of 1.

Instructions from function calls can be reordered under the condition that the sequential semantics of the program are not violated.
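The inlined example above, written out as compilable Java (the class name and main harness are mine); any internal reordering must still produce the single sequential outcome:

```java
public class SequentialSemantics {
    static int a = 0, b = 0, c = 0, d = 0;

    static void foo() {
        a = 1;
        b = 1;
    }

    static void bar() {
        c = 1;
        d = a; // depends on foo()'s store to a
    }

    static void foobar() {
        foo();
        bar();
    }

    public static void main(String[] args) {
        foobar();
        // Single-threaded, the only allowed outcome:
        System.out.println("a=" + a + ",b=" + b + ",c=" + c + ",d=" + d); // a=1,b=1,c=1,d=1
    }
}
```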

pveentjer
  • If it doesn't violate the sequential semantics, then yes. But in your case it violates the sequential semantics and hence it isn't allowed. – pveentjer Oct 14 '21 at 15:05
  • You need to see it like this: If there is just a single thread of execution, the CPU/compiler can reorder as much as it wants as long as it isn't caught (so doesn't violate the illusion of a sequential execution). So you can just assume everything is executed in order; even though the reality is very different. The big problem is when there are multiple threads of execution and are accessing shared memory. – pveentjer Oct 14 '21 at 15:09
  • I don't see any problem with his code. So it should be running fine as long as the JVM and CPU respect the sequential semantics of the program. – pveentjer Oct 14 '21 at 15:12
  • The problem is that I can't find concrete rules which would allow me to safely modify Goetz's code without sync blocks. To give an example, put `cache.put(arg, ft);` into an inner `if`. – bimjhi Oct 14 '21 at 15:16
  • I'm not sure what you mean with an inner if since there is only 1 and the put is already in that if. – pveentjer Oct 14 '21 at 15:19
  • I mean a situation where I'd need to put FutureTask into collection under specific condition (e.g. collection size <=100) – bimjhi Oct 14 '21 at 15:23
  • If it would lead to placing the item in the collection before the collection.size check is done, then that is a violation of the sequential semantics of the program. – pveentjer Oct 14 '21 at 15:25

After some investigation of ConcurrentHashMap.get and ConcurrentHashMap.put, I can say that understanding why B. Goetz's code has this structure requires knowing the internals of ConcurrentHashMap.

By "reordering" below, I mean the possibility that lines of the completed code may be reordered by a compiler due to enabled optimizations.

The answer does not touch the topic of CPU memory reordering, which is highlighted, e.g., in https://stackoverflow.com/a/66973124

In his earlier example, which used an ordinary HashMap, B. Goetz used a synchronized version of compute:

public class Memoizer1<A, V> implements Computable<A, V> {
    @GuardedBy("this")
    private final Map<A, V> cache = new HashMap<A, V>();
    private final Computable<A, V> c;

    public Memoizer1(Computable<A, V> c) { this.c = c; }

    public synchronized V compute(final A arg) throws InterruptedException {
        V result = cache.get(arg);
        if (result == null) {
            result = c.compute(arg);
            cache.put(arg, result);
        }
        return result;
    }
}

Synchronized here was needed to prevent simultaneous access to the HashMap by multiple threads. Since the access methods of an ordinary HashMap are not atomic, one thread could potentially overwrite incomplete portions of data written by another thread, blocking the latter from completing its work. For the same reason, a reading method could potentially see partially constructed data.
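A minimal sketch of what that mutual exclusion buys (the overlap-detection counter is my own illustration, not from the book): with synchronized, at most one thread is ever inside compute, so the HashMap's internal state can never be observed mid-update:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class ExclusionDemo {
    private final Map<Integer, Integer> cache = new HashMap<>();
    private final AtomicInteger inside = new AtomicInteger();
    private final AtomicBoolean overlapped = new AtomicBoolean();

    synchronized Integer compute(int arg) {
        // The lock serializes callers, so this can never observe > 1.
        if (inside.incrementAndGet() > 1) overlapped.set(true);
        Integer result = cache.get(arg);
        if (result == null) {
            result = arg * arg;
            cache.put(arg, result);
        }
        inside.decrementAndGet();
        return result;
    }

    static boolean runDemo() throws InterruptedException {
        ExclusionDemo demo = new ExclusionDemo();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) demo.compute(j % 100);
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return demo.overlapped.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("overlapped=" + runDemo()); // overlapped=false
    }
}
```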

As long as synchronized forces the JIT to actually run a store instruction, the value becomes visible to other threads simply by being committed to the local L1d cache, because caches are coherent. (And a memory barrier blocks later loads/stores until that happens.)

Later, B. Goetz replaced the ordinary HashMap with a ConcurrentHashMap, which allowed him to remove the synchronized keyword.

https://www.burnison.ca/articles/the-concurrency-of-concurrenthashmap clearly explains why ConcurrentHashMap.get is the first here:

In comparisons to the previous methods, the get() and containsKey() methods are fairly mundane. Also unlike the previous methods, both are entirely lock-free. First, a Segment is retrieved from the segments array using the applicable high-order bits of the key's hash. The retrieval is performed using Unsafe.getObjectVolatile(). Next, a HashEntry of the segment's table array is retrieved using the key's hash. This retrieval is also performed using Unsafe.getObjectVolatile. From this head node, the linked list of HashEntry objects is traversed until the specified key is found (or not found) and the applicable value is returned.

Due to its volatile-read semantics, ConcurrentHashMap.get cannot be moved down in the code by a compiler, past later operations; moving it up, however, is allowed.

However, a volatile read may still be reordered with preceding plain operations, so those preceding lines have to be plain and their influence on the code below should be clear without deep mental analysis.

ConcurrentHashMap.put has volatile-write semantics, so it cannot be reordered with operations above it, but it can be reordered with operations below it. However, FutureTask.run (ft.run(); here) internally performs a full-fence compareAndSet on its state field (see How FutureTask is asynchronous computation and compareAndSwap a common member ( non-volatile member ) still has memory semantics of volatile read and write), which generates a full memory barrier, so cache.put(arg, ft); cannot be reordered with ft.run();.
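For completeness: none of this closes Memoizer3's remaining check-then-act window, in which two threads can both miss the cache and compute the same value twice. The book's final Memoizer closes that window with putIfAbsent; roughly as follows (launderThrowable again replaced by a plain rethrow, and the Computable interface nested here to keep the sketch self-contained):

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class Memoizer<A, V> {
    public interface Computable<A, V> {
        V compute(A arg) throws InterruptedException;
    }

    private final ConcurrentMap<A, Future<V>> cache = new ConcurrentHashMap<>();
    private final Computable<A, V> c;

    public Memoizer(Computable<A, V> c) { this.c = c; }

    public V compute(final A arg) throws InterruptedException {
        while (true) {
            Future<V> f = cache.get(arg);
            if (f == null) {
                FutureTask<V> ft = new FutureTask<>(() -> c.compute(arg));
                // Atomic check-then-act: only one thread's task wins the slot.
                f = cache.putIfAbsent(arg, ft);
                if (f == null) { f = ft; ft.run(); }
            }
            try {
                return f.get();
            } catch (CancellationException e) {
                cache.remove(arg, f); // retry if the winning task was cancelled
            } catch (ExecutionException e) {
                throw new RuntimeException(e.getCause()); // book uses launderThrowable here
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Memoizer<Integer, Integer> memo = new Memoizer<>(x -> x * x);
        System.out.println(memo.compute(6)); // 36
        System.out.println(memo.compute(6)); // 36, cached
    }
}
```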

The quotation below from another author of "Java Concurrency in Practice" relates to the fact that within-thread as-if-serial semantics is not enough to understand a multithreaded code fragment that uses shared variables.

The within-thread as-if-serial property is helpful only when only one thread at a time is manipulating variables, due to synchronization, structural exclusion, or pure chance. When multiple threads are all running unsynchronized code that reads and writes common fields, then arbitrary interleavings, atomicity failures, race conditions, and visibility failures may result in execution patterns that make the notion of as-if-serial just about meaningless with respect to any given thread.

Even though JLS addresses some particular legal and illegal reorderings that can occur, interactions with these other issues reduce practical guarantees to saying that the results may reflect just about any possible interleaving of just about any possible reordering. So there is no point in trying to reason about the ordering properties of such code.

(C) Doug Lea

Per http://gee.cs.oswego.edu/dl/cpj/jmm.html

bimjhi
  • *chance of that lines of a complete code may be reordered by a compiler due to enabled optimizations.* Note that memory reordering can happen without compile-time instruction reordering. (Especially on non-x86 ISAs like ARM). It's not super useful to focus on compile-time instruction reordering if what you actually care about is controlling the order of memory operations as seen by other threads. [Java instruction reordering and CPU memory reordering](https://stackoverflow.com/q/69568946) – Peter Cordes Oct 15 '21 at 21:15
  • I've read your answer, it looks useful for a start, thanks. But I need a concrete example demonstrating problems in ```Java```. Maybe you saw an article somewhere? Plus, do you mean that `volatile` in Java won't prevent reordering in memory? I failed to find this info. – bimjhi Oct 15 '21 at 21:18
  • Indeed, `volatile` is like C++ `std::atomic` with memory_order_seq_cst, but memory reordering is still allowed with accesses to non-volatile fields, in one direction. [Is this understanding correct for these code about java volatile and reordering?](https://stackoverflow.com/q/69571562) shows an example. Full sequential consistency is only guaranteed for data-race-free programs, which only read/write any non-`volatile` fields after synchronizing with another `volatile` access. – Peter Cordes Oct 15 '21 at 21:35
  • Yes, volatile in Java generates a half fence prohibiting a compiler from moving instructions in one direction; a monitor lock generates a full fence with bi-directional prohibition. Yet I cannot understand if you find my answer useful or not. – bimjhi Oct 15 '21 at 21:39
  • *prohibiting **a compiler** to move instructions in one direction* - my point is that runtime reordering is also possible on some ISAs like AArch64, even if a compiler/JIT makes asm whose order matches source order. There are lots of cases where statically reordering (the asm instruction order) isn't allowed for correctness, but runtime reordering can still happen (e.g. [What are the exact inter-thread reordering constraints on mutex.lock() and .unlock() in c++11 and up?](https://stackoverflow.com/a/66973124)). – Peter Cordes Oct 15 '21 at 21:53
  • Your answer is also promoting another misconception. CPU cache *is* coherent. [Myths Programmers Believe about CPU Caches](https://software.rajivprab.com/2018/04/29/myths-programmers-believe-about-cpu-caches/). The thing you need to prevent is having the JIT compiler keep a value in *registers*. As long as `volatile` or `synchronized` forces it to actually run a store instruction, the value will become visible to other threads just by committing it to the local L1d cache, because it's coherent. (And a memory barrier will block later loads/stores until that happens). – Peter Cordes Oct 15 '21 at 21:56
  • e.g. `while(1) { foo.x = 1; }` might not do the store at all for non-`volatile` `x`, because the (JIT) compiler would sink the store out of the loop. But the loop is infinite so it's never reached. i.e. it would eliminate the dead stores because only one is needed, but the loop runs forever. e.g. like in C: https://godbolt.org/z/1a745P5Wa – Peter Cordes Oct 15 '21 at 22:00
  • Well, I added a remark that the question and my answer do not cover the subject of memory reordering and corrected the thesis of CPU cache. – bimjhi Oct 15 '21 at 22:27
  • Reading more of your answer, no, *Synchronized here was needed to prevent reordering of calls to a HashMap variable as well as thread-interleaving.* - incorrect. It's needed for **mutual exclusion**, not just ordering, because `cache.put` wouldn't be safe if multiple threads were running it, probably even if all their operations happened in some interleaving of program order. `cache.put` isn't a single atomic operation the CPU can do, it's a function call that does multiple loads/stores. (And `cache.get` is similarly probably not safe if another thread might be running `cache.put`.) – Peter Cordes Oct 16 '21 at 00:20
  • Also, `ConcurrentHashMap.get` isn't an "instruction", it's a method call that (even if inlined) will need multiple instructions. See https://preshing.com/20130605/the-worlds-simplest-lock-free-hash-table/ for an example implementation of a lock-free hash table. (What makes it safe to use from multiple writers in parallel is *not* an internal lock, that wouldn't be lock-free. Instead it's careful design using atomic RMW operations where necessary to make sure exactly one thread can succeed at claiming a slot, and careful design so that readers can't see half-written data.) – Peter Cordes Oct 16 '21 at 00:26
  • I put a clarification that synchronized prevents simultaneous access of multiple threads to shared hashmap. Here is a discussion on how Unsafe.getIntVolatile() helps to prevent reordering: https://groups.google.com/g/mechanical-sympathy/c/Z9YO945dwlM – bimjhi Oct 16 '21 at 05:52