15

Java memory visibility documentation says that:

A write to a volatile field happens-before every subsequent read of that same field.

I'm confused about what subsequent means in the context of multithreading. Does this sentence imply some global clock for all processors and cores? So, for example, if I assign a value to a variable in cycle c1 in some thread, is a second thread then able to see this value in the subsequent cycle c1 + 1?

Gray
  • 115,027
  • 24
  • 293
  • 354
Trismegistos
  • 3,821
  • 2
  • 24
  • 41
  • @Ben there is no such thing as clearing the caches. Caches are coherent and write-back on x86. My question was what the meaning of subsequent is. Subsequent implies some order, and the question is what defines that order. The Java specification is not clear about it. Also, I'm interested in how this order maps to contemporary hardware. – Trismegistos Jun 15 '18 at 11:45
  • But I rather deleted the comment as a lot of answers are here already and it's not helping anyone anymore :) – Ben Jun 15 '18 at 11:52

5 Answers

9

It sounds to me like it's saying that it provides lockless acquire/release memory-ordering semantics between threads. See Jeff Preshing's article explaining the concept (mostly for C++, but the main point of the article is language neutral, about the concept of lock-free acquire/release synchronization.)

In fact Java volatile provides sequential consistency, not just acq/rel. There's no actual locking, though. (See Jeff Preshing's article for an explanation of why the naming matches what you'd do with a lock.)


If a reader sees the value you wrote, then it knows that everything in the producer thread before that write has also already happened.

This ordering guarantee is only useful in combination with other guarantees about ordering within a single thread.

e.g.

int data[100];
volatile bool data_ready = false;

Producer:

data[0..99] = stuff;
data_ready = true;       // release store keeps previous ops above this line

Consumer:

while(!data_ready){}     // spin until we see the write
// acquire-load keeps later ops below this line
int tmp = data[99];      // gets the value from the producer

If data_ready was not volatile, reading it wouldn't establish a happens-before relationship between two threads.

You don't have to use a spin loop; you could be reading a sequence number or an array index from a volatile int, and then reading data[i].
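In Java terms, the same handoff might be sketched as follows (the class and field names are invented for illustration; treat this as a sketch of the idea, not canonical Java):

```java
// Java sketch of the release/acquire handoff above.
// Class and field names are illustrative, not from the question.
class Handoff {
    static final int[] data = new int[100];
    static volatile boolean dataReady = false;  // the volatile flag

    static void producer() {
        for (int i = 0; i < data.length; i++) {
            data[i] = i * 2;                    // plain (non-volatile) writes
        }
        dataReady = true;  // volatile write: publishes everything above it
    }

    static int consumer() {
        while (!dataReady) { }  // spin until the volatile read sees true
        return data[99];        // guaranteed to see the producer's value
    }
}
```

Once the consumer's volatile read observes `dataReady == true`, the happens-before edge guarantees that all of the producer's earlier plain writes to `data` are visible.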


I don't know Java well. I think volatile actually gives you sequential-consistency, not just release/acquire. A sequential-release store isn't allowed to reorder with later loads, so on typical hardware it needs an expensive memory barrier to make sure the local core's store buffer is flushed before any later loads are allowed to execute.

Volatile Vs Atomic explains more about the ordering volatile gives you.

Java volatile is just an ordering keyword; it's not equivalent to C11 _Atomic or C++11 std::atomic<T>, which also give you atomic RMW operations. In Java, volatile_var++ is not an atomic increment, it's a separate load and store, like volatile_var = volatile_var + 1. In Java, you need a class like AtomicInteger to get an atomic RMW.
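For example, a minimal sketch contrasting the two (names invented for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: volatile gives visibility/ordering, not an atomic read-modify-write.
class Counters {
    static volatile int plainCounter = 0;                       // volatile field
    static final AtomicInteger atomicCounter = new AtomicInteger(0);

    static void bumpPlain() {
        plainCounter++;  // NOT atomic: separate volatile load, add, volatile store
    }

    static void bumpAtomic() {
        atomicCounter.incrementAndGet();  // one atomic RMW (a lock add on x86)
    }
}
```

Under concurrent callers, `bumpPlain` can lose updates (two threads can load the same value, both add 1, and both store the same result); `bumpAtomic` cannot.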

And note that C/C++ volatile doesn't imply atomicity or ordering at all; it only tells the compiler to assume that the value can be modified asynchronously. This is only a small part of what you need to write lockless code for anything except the simplest cases.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • 5
    This is the right answer - _subsequent_ in this context means "a read that sees the new value written", so as not to invoke a global clock or anything like that. In theory, this lets a write be indefinitely delayed since there is no guarantee that any read will see the value soon or ever - but in practice on all interesting architectures the write generally becomes visible "soon" (often single-digit nanoseconds, typical worst case probably hundreds of nanoseconds). – BeeOnRope Jun 15 '18 at 14:51
  • I have a question actually. Doesn't that come down to the fact that Intel's implementation of the cache coherence protocol serializes the cache-line state transitions? So we can rely on "what happens before which one"? No? – St.Antario Jun 16 '18 at 16:14
  • @St.Antario: Not sure what you're saying. Having a single total order of stores that all threads agree on, *for a single cache line or memory location* doesn't imply release/acquire semantics. That's a property of memory ordering between stores to *different* cache lines, and has to be enforced separately. (And BTW, some weakly-ordered systems [don't have a global store order that all threads can agree on](https://stackoverflow.com/a/50679223) for relaxed stores at all. But x86's TSO memory model requires one, and it has to be some interleaving of program order.) – Peter Cordes Jun 16 '18 at 17:24
  • @PeterCordes Let me try to be more specific. If we consider two threads performing reading and writing to a volatile variable concurrently [like this](https://pastebin.com/YaF41iQB) on jdk8-hotspot, we will have [the following compiled code at runtime](https://pastebin.com/3LedFhuk). When writing to the volatile variable we have `lock addl`. How does this instruction, invoked by some core, affect other cores? How does it guarantee the happens-before ordering? – St.Antario Jun 16 '18 at 18:09
  • 1
    @St.Antario: A core can't begin executing `lock add` until it has the line in E state of MESI, and it keeps the line locked for the duration so no other cores can modify it. My canonical answer on [Can num++ be atomic for 'int num'?](https://stackoverflow.com/q/39393850) has the details. In fact no other cores can even *read* it, and it's a full memory barrier on the local core, which is what gives sequential-consistency and the happens-before. – Peter Cordes Jun 16 '18 at 18:12
  • @PeterCordes If we remove the `lock` prefix from the `add` instruction and put `LFENCE` before `vmovq %xmm0,%rdi ;*getstatic a` do we get the same memory ordering? – St.Antario Jun 16 '18 at 18:26
  • 1
    @St.Antario: oh, I didn't look at the asm you linked before, I assumed you were talking about an atomic increment. You're doing an atomic load, then an increment of a temporary, then an atomic store with seq-cst ordering. (With some really braindead asm, like `movabs $1, %r10` instead of `sub $1, %rdi`, and bouncing through XMM regs for no reason). `lock addl $0, (%rsp)` is being used as a memory barrier because it's probably more efficient than `mfence`. – Peter Cordes Jun 16 '18 at 18:34
  • @St.Antario: Anyway, the entire `inc()` function is not a single atomic operation, probably because you use `a = a+1` which reads the variable in an expression and then assigns it. Does `a+=1;` work in Java, or does it have special RMW functions for `volatile` variables, like C++'s [`std::atomic_fetch_add(&var, 1);`](http://en.cppreference.com/w/cpp/atomic/atomic_fetch_add). (Normally you'd use member functions and overloaded operators in C++, but there are stand-alone functions). I assume this is not what you wanted. – Peter Cordes Jun 16 '18 at 18:41
  • @PeterCordes Yes, simple increment like this is not atomic. Actually I was not concerned about atomicity in this case (we have `AtomicLong::compareAndSet` and friends methods which CASes the value). I was concerned about memory ordering and how it was affected by the `lock addl` (probably I provided sort of crazy example). – St.Antario Jun 16 '18 at 18:44
  • 1
    @St.Antario: The load is already an acquire-load and doesn't need further fencing. The store is a release-store on its own (can't reorder with earlier ops, or later stores because they're also release-stores), but the full barrier makes it a sequential-release (can't reorder with later loads.) `lfence` before doesn't give you that. See http://preshing.com/20120515/memory-reordering-caught-in-the-act/. (mfence is equivalent to `lock addl $0, (%rsp)`). Semi Related: [Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?](https://stackoverflow.com/q/27627969). – Peter Cordes Jun 16 '18 at 18:56
  • Sorry, misclick on the downvote. Just edited your post to be able to remove. – Gray Jun 18 '18 at 12:40
  • This is pretty much correct although sticking to Java would be good as to not confuse. The only language I don't like is the acquire/release which implies locking which obviously doesn't happen with `volatile`. – Gray Jun 18 '18 at 13:55
  • 2
    @Gray: Thanks, I hadn't thought of that possible confusion for beginners. Acquire/Release is the standard terminology for this kind of memory-ordering semantics, but I added some words to make it clear that it's lockless memory-ordering, *not* acquiring a lock. (Jeff Preshing's article already fully explained that, but it's not a bad thing to have it right here in the answer.) – Peter Cordes Jun 18 '18 at 14:13
3

It means that once a certain Thread writes to a volatile field, all other Thread(s) will observe (on the next read) that written value; this does not protect you against races, though.

Threads have their caches, and those caches will be invalidated and updated with that newly written value via cache coherency protocol.

EDIT

Subsequent means anything that happens after the write itself. Since you don't know the exact cycle/timing when that will happen, you usually say that when some other thread observes the write, it will observe all the actions done before that write; thus a volatile write establishes the happens-before guarantees.

Sort of like in an example:

 // Actions done in Thread A
 int a = 2;
 volatile int b = 3;


 // Actions done in Thread B
 if (b == 3) { // observes the volatile write
    // Thread B is guaranteed to see a = 2 here
 }

You could also loop (spin-wait) until you see 3, for example.

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • 3
    "on the next read" - he seems to be confused as to what constitutes a "next" read and how that is determined – Michael Jun 15 '18 at 10:46
  • @Michael exactly. Eugene just changed subsequent to next. That doesn't explain what next/subsequent means for hardware or java. – Trismegistos Jun 15 '18 at 10:53
  • subsequent/next is anything after the cycle that wrote the value to memory basically. No guarantee is given when the cycle writing the value happens. A guarantee is given that after that the value is "set in stone" and will be read as that value, no matter any caching, etc. – Ben Jun 15 '18 at 10:56
  • 1
    @Ben exactly. That is the reason why people usually say - *when Thread B observes* the value written, it will observe everything before that write – Eugene Jun 15 '18 at 11:36
  • 3
    @Michael well, to me, that is the entire point with volatiles - since you don't know the exact cycle when that will happen, you usually say when some *other thread observes* the write, it will observe all the actions done before that write – Eugene Jun 15 '18 at 11:43
  • @Eugene your latest comment seems to explain it the best. – Trismegistos Jun 15 '18 at 12:00
  • @Trismegistos even the upvoted example here https://stackoverflow.com/a/50875012/1059372 does a spin wait, so that another Thread *observes* that write. That is the point with volatiles to begin with - they establish the `happens-before`... – Eugene Jun 15 '18 at 12:12
  • @Eugene your comments are far better than your initial answer. Can you improve your answer? – Trismegistos Jun 15 '18 at 12:15
  • @Trismegistos well, if you like this, take a look at `AtomicInteger#lazySet` and the single writer principle - it's a very nice feature too – Eugene Jun 15 '18 at 12:28
  • @Ben Can you expand a bit on why caching does not matter? What if the core writing some value has the cache line in invalidated state and some other core owns the cache line exclusively? So before the write cycle occurs, the cache line in the other core has to be invalidated first. – St.Antario Jun 16 '18 at 16:34
  • 1
    @Gray I think I've edited the example a long time ago, but forgot to mention you in a comment. – Eugene Aug 13 '18 at 09:26
2

Peter's answer gives the rationale behind the design of the Java memory model.
In this answer I'm attempting to give an explanation using only the concepts defined in the JLS.


In Java every thread is composed of a set of actions.
Some of these actions have the potential to be observable by other threads (e.g. writing a shared variable); these are called synchronization actions.

The order in which the actions of a thread are written in the source code is called the program order.
An order defines what is before and what is after (or better, not before).

Within a thread, each action has a happens-before relationship (denoted by <) with the next (in program order) action. This relationship is important, yet hard to understand, because it's very fundamental: it guarantees that if A < B then the "effects" of A are visible to B.
This is indeed what we expect when writing the code of a function.

Consider

Thread 1           Thread 2

  A0                 A'0
  A1                 A'1
  A2                 A'2
  A3                 A'3

Then by the program order we know A0 < A1 < A2 < A3 and that A'0 < A'1 < A'2 < A'3.
We don't know how to order all the actions.
It could be A0 < A'0 < A'1 < A'2 < A1 < A2 < A3 < A'3 or the sequence with the primes swapped.
However, every such sequence must order the individual actions of each thread according to that thread's program order.

The two program orders are not sufficient to order every action; they are partial orders, as opposed to the total order we are looking for.

The total order that puts the actions in a row according to the measurable time (like a clock) at which they happened is called the execution order.
It is the order in which the actions actually happened (strictly, the actions are only required to appear to have happened in this order, but that's just an optimization detail).

Up until now, the actions are not ordered inter-thread (between two different threads).
The synchronization actions serve this purpose.
Each synchronization action synchronizes-with at least one other synchronization action (they usually come in pairs, like a write and a read of a volatile variable, or a lock and an unlock of a mutex).

The synchronizes-with relationship is the inter-thread happens-before (the former implies the latter). It is exposed as a different concept because 1) it is a slightly different thing, and 2) intra-thread happens-before is enforced naturally by the hardware, while synchronizes-with may require software intervention.

happens-before is derived from the program order, synchronizes-with from the synchronization order (denoted by <<).
The synchronization order is defined in terms of two properties: 1) it is a total order 2) it is consistent with each thread's program order.

Let's add some synchronization action to our threads:

Thread 1           Thread 2

  A0                 A'0
  S1                 A'1
  A1                 S'1
  A2                 S'2
  S2                 A'3

The program orders are trivial.
What is the synchronization order?

We are looking for something that, by 1), includes all of S1, S2, S'1 and S'2, and, by 2), has S1 < S2 and S'1 < S'2.

Possible outcomes:

S1 < S2 < S'1 < S'2
S1 < S'1 < S'2 < S2
S'1 < S1 < S'2 < S2

All of these are synchronization orders: there is not one synchronization order but many, so the question above is wrong; it should be "What are the synchronization orders?".

If S1 and S'1 are such that S1 << S'1, then we are restricting the possible outcomes to the ones where S1 < S'1, so the outcome S'1 < S1 < S'2 < S2 above is now forbidden.

If S2 << S'1, then the only possible outcome is S1 < S2 < S'1 < S'2. When there is only a single outcome, I believe we have sequential consistency (the converse is not true).

Note that if A << B this doesn't mean that there is a mechanism in the code to force an execution order where A < B.
Synchronization actions are affected by the synchronization order; they do not impose any materialization of it.
Some synchronization actions (e.g. locks) impose a particular execution order (and thereby a synchronization order), but some don't (e.g. reads/writes of volatiles).
It is the execution order that creates the synchronization order; this is completely orthogonal to the synchronizes-with relationship.


Long story short, the "subsequent" adjective refers to any synchronization order, that is, any valid order (consistent with each thread's program order) that encompasses all the synchronization actions.


The JLS then continues defining when a data race happens (when two conflicting accesses are not ordered by happens-before) and what it means to be happens-before consistent.
Those are out of scope.

Margaret Bloom
  • 41,768
  • 5
  • 78
  • 124
  • Release/acquire is not specific to x86, and neither is my answer. It's probably one of the least x86-centric answer I've written in a long time. :P But it is answering by analogy to C++ so your answer is definitely useful. – Peter Cordes Jun 15 '18 at 20:51
  • 1
    @PeterCordes Oh, sorry, bad wording :) I'm fixing it. – Margaret Bloom Jun 16 '18 at 06:29
  • I find this answer really difficult to follow and also misleading. You can talk about A < B < C, but the compiler can reorder those statements at will for optimizations as long as it doesn't violate the language definition. That's the whole point of this. Program order may be A B C D but execution order could easily be D C B A or any other combination unless (for example) C depends on A and B, where the reordering would violate the language definition. – Gray Jun 17 '18 at 19:05
  • @Gray, Here's the key to understand: This is about the language definition. This is not about what the compiler will do, A < B is a construct in the JLS, something a compiler must be compliant to. Reordering is irrelevant and it's not the answer to the OP to me. – Margaret Bloom Jun 17 '18 at 21:27
  • The OP is talking about sharing data between threads. That is more about execution order than program order. Your statement "every such sequence must have that the single actions of each thread are ordered according to the thread's program order" is incorrect because once you consider multiple threads, the reordering is critical. It is completely legal for A1 to be reordered so it comes before A0. – Gray Jun 17 '18 at 23:22
  • @Gray I know that reordering is legal but is the ex order that is subject to the prog order of each thread (see JLS 17.4.7, which contradicts your last comment). Whatever the reorder is, it **must** be *equivalent* to the program order, so the former is just an implementation detail and we can reason only in terms of program order (and sync order for MT). All these concepts are necessary to set the bounds of a valid execution. Finally, memory reordering is just one thing, [visibility being the other](https://hadibrais.wordpress.com/2018/05/14/the-significance-of-the-x86-lfence-instruction/). – Margaret Bloom Jun 18 '18 at 15:38
  • @Gray Anyway, I think it's pointless to argue :) I surely don't know all the nuances of the JLS but I still think my interpretation is largely correct, maybe too technical for the OP. – Margaret Bloom Jun 18 '18 at 15:40
  • I don't think it is pointless :-). See 17.4-1 on that page for an example of reordering. 17.4.7 is saying that reordering cannot change the actions of the code but that doesn't mean there is guaranteed order at execution time. The compiler is able to do all sorts of tricks to get more speed out of the code as long as the overall effect of the code is the same. For example, a constructor can finish and return an allocated object _before_ the field initialization is done which is why unsafe publishing of objects is such a problem. The examples are wild. – Gray Jun 18 '18 at 15:55
2

I'm confused what does subsequent means in context of multithreading. Does this sentence implies some global clock for all processors and cores...?

Subsequent means (according to the dictionary) coming after in time. There certainly is a global clock across all CPUs in a computer (think X GHz), and the document is trying to say that if thread-1 did something at clock tick 1 and then thread-2 did something on another CPU at clock tick 2, its actions are considered subsequent.

A write to a volatile field happens-before every subsequent read of that same field.

The key phrase that could be added to this sentence to make it more clear is "in another thread". It might make more sense to understand it as:

A write to a volatile field happens-before every subsequent read of that same field in another thread.

What this is saying is that if a read of a volatile field happens in Thread-2 after (in time) the write in Thread-1, then Thread-2 will be guaranteed to see the updated value. Further up in the documentation you point to is the section (emphasis mine):

... The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular.

Notice the highlighted phrase. The Java compiler is free to reorder instructions in any one thread's execution for optimization purposes as long as the reordering doesn't violate the definition of the language – this is called execution order and is critically different from program order.

Let's look at the following example with variables a and b that are non-volatile ints initialized to 0 with no synchronized clauses. What is shown is program order and the time in which the threads are encountering the lines of code.

Time     Thread-1        Thread-2
1        a = 1;          
2        b = 2;          
3                        x = a;
4                        y = b;
5        c = a + b;      z = x + y;

If Thread-1 adds a + b at Time 5, it is guaranteed to be 3. However, if Thread-2 adds x + y at Time 5, it might get 0, 1, 2, or 3, depending on race conditions. Why? Because the compiler might have reordered the instructions in Thread-1 to set a after b for efficiency reasons. Also, Thread-1 may not have appropriately published the values of a and b, so Thread-2 might get out-of-date values. Even if Thread-1 gets context-switched out or crosses a write memory barrier and a and b are published, Thread-2 needs to cross a read barrier to update any cached values of a and b.

If a and b were marked as volatile then the write to a must happen-before (in terms of visibility guarantees) the subsequent read of a on line 3 and the write to b must happen-before the subsequent read of b on line 4. Both threads would get 3.

We use the volatile and synchronized keywords in Java to ensure happens-before guarantees. A write memory barrier is crossed when assigning a volatile or exiting a synchronized block, and a read barrier is crossed when reading a volatile or entering a synchronized block. The Java compiler cannot reorder write instructions past these memory barriers, so the order of updates is assured. These keywords control instruction reordering and ensure proper memory synchronization.
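A minimal sketch of the synchronized variant of that guarantee (the class and field names are invented for illustration):

```java
// Sketch: monitor exit acts as the write barrier, monitor entry as the read barrier.
class SafePublish {
    private int a = 0, b = 0;       // plain fields, protected by the lock
    private final Object lock = new Object();

    void writer() {
        synchronized (lock) {
            a = 1;
            b = 2;
        }  // exiting the block publishes a and b (write barrier)
    }

    int sum() {
        synchronized (lock) {  // entering the block is the read barrier:
            return a + b;      // everything before the writer's exit is visible
        }
    }
}
```

A reader that acquires the same lock after the writer released it is guaranteed to see both writes, so `sum()` returns 0 (before `writer()`) or 3 (after), never a half-published 1 or 2.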

NOTE: volatile is unnecessary in a single-threaded application because program order assures the reads and writes will be consistent. A single-threaded application might see any value of (non-volatile) a and b at times 3 and 4 but it always sees 3 at Time 5 because of language guarantees. So although use of volatile changes the reordering behavior in a single-threaded application, it is only required when you share data between threads.
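The Thread.start() and Thread.join() relationships mentioned in the quoted documentation can be sketched the same way; here result needs no volatile because join() itself forms the happens-before edge (the class name is invented for illustration):

```java
// join() happens-before: the worker's write to result is visible after join() returns.
class JoinVisibility {
    static int result = 0;  // deliberately NOT volatile

    static int compute() throws InterruptedException {
        Thread worker = new Thread(() -> result = 42);
        worker.start();  // start() happens-before the worker's first action
        worker.join();   // the worker's last action happens-before join() returning
        return result;   // guaranteed to see 42
    }
}
```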

Gray
  • 115,027
  • 24
  • 293
  • 354
  • 1
    "in another thread" is the interesting part, but it's also true within a single thread. Agreed that including that phrase would make the meaning clearer. – Peter Cordes Jun 15 '18 at 21:01
  • Program order assures the single thread that a + b always == 3. To not do so would violate the language rules. But I'll flesh that out a bit @PeterCordes. Thanks. – Gray Jun 15 '18 at 21:02
  • 1
    Yes, of course it's a trivial / obvious guarantee within a single thread, and should go without saying. I just meant that calling it "missing" implies the sentence isn't accurate without it. Just a phrasing issue. – Peter Cordes Jun 15 '18 at 21:04
  • Ok, but what does "subsequent" mean? Note that for the x86 architecture, subsequent according to a total time is not sufficient for the volatile semantics. This doesn't answer the question. – Margaret Bloom Jun 17 '18 at 21:30
  • 1
    Subsequent means happening after from an `execution` order standpoint. If `volatile int a` has been assigned and then later `a` is read by another thread, it is guaranteed to see the appropriate value. I've added more details to my answer. – Gray Jun 17 '18 at 23:33
1

This is more a definition of what will not happen rather than what will happen.

Essentially it is saying that once a write to an atomic variable has happened there cannot be any other thread that, on reading the variable, will read a stale value.

Consider the following situation.

  • Thread A is continuously incrementing an atomic value a.

  • Thread B occasionally reads A.a and exposes that value as a non-atomic b variable.

  • Thread C occasionally reads both A.a and B.b.

Given that a is atomic, it is possible to reason that, from the point of view of C, b may occasionally be less than a but will never be greater than a.

If a were not atomic, no such guarantee could be given. Under certain caching situations it would be quite possible for C to see b progress beyond a at any time.

This is a simplistic demonstration of how the Java memory model allows you to reason about what can and cannot happen in a multi-threaded environment. In real life the potential race conditions between reading and writing to data structures can be much more complex but the reasoning process is the same.
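A minimal sketch of the three-thread setup (all names invented for illustration; note that, per the comments below, b must itself be volatile for the b <= a reasoning to hold — a plain b would be a data race):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the A/B/C scenario: a only ever grows, and b trails it.
class Monotonic {
    static final AtomicInteger a = new AtomicInteger(0);  // Thread A increments
    static volatile int b = 0;                            // Thread B copies a into b

    static void threadAStep() { a.incrementAndGet(); }
    static void threadBStep() { b = a.get(); }  // b gets some past value of a

    // Thread C: read b first, then a. b was copied from a at some earlier
    // point and a never decreases, so this can never observe b > a.
    static boolean threadCCheck() {
        int bSeen = b;
        int aSeen = a.get();
        return bSeen <= aSeen;
    }
}
```

The read order in threadCCheck matters: reading a first and b second could legitimately see a newer b than the a that was sampled.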

OldCurmudgeon
  • 64,482
  • 16
  • 119
  • 213
  • Most hardware can't do that reordering if you're describing asm store and load instructions, rather than Java assignment operations. PowerPC can in practice: a thread may see a store from another thread before it becomes globally visible to *all* threads. (I wrote a hardware answer about it on a C++ question: [Will two atomic writes to different locations in different threads always be seen in the same order by other threads?](https://stackoverflow.com/a/50679223)) – Peter Cordes Jun 15 '18 at 11:38
  • 1
    You didn't say in what order thread `C` reads `a` and `b` but regardless per the Java memory model `C` could certainly see a larger value for `b` than for `a` since there is no happens-before relationship between the write of `b` and its read: you have a data race. – BeeOnRope Jun 15 '18 at 14:54
  • @BeeOnRope - I've adjusted the wording to clarify (I hope). – OldCurmudgeon Jun 15 '18 at 15:13
  • Well it's still not very clear (you don't mention in what order `C` reads `a` and `b` which could be very important) and you use `atomic` which isn't really a keyword in Java (maybe you are thinking of C++ `std::atomic`?), but I'll assume you are talking about `volatile` when you say atomic. Still, the overall claim is wrong as far as I can tell. _Even with_ `volatile a` you can't really reason about any relationship between `a` and `b` since `b` is written after `a` on thread `B` so there is no happens-before chain involving `b`. `b` could have any value ever written, including > `a`. – BeeOnRope Jun 15 '18 at 17:33