Java Memory Model and Concurrency

Question

Given the x86 total store order and the happens-before relationship in the Java Memory Model, we know the compiler doesn't guarantee the order of execution of the instructions. It can reorder as it sees fit, in order to improve performance. Given that, we have:

EAX, EBX are names of registers
[x], [y] are memory locations
r1 and r2 are names of local variables
x, y are shared variables accessible to all threads. All variables are 32-bit integers.
No, it's NOT A HOMEWORK QUESTION.

So I have two problems, and for each one I'm trying to determine the possible outputs:

[x] == [y] == 0 // the initial contents of [x] and [y] are 0.

// Thread 1                         Thread 2
MOV [x] <- 1                        MOV [y] <- 1
MOV EAX <- [y]                      MOV EBX <- [x]

Which are the possible values for the registers EBX and EAX?

int x = 0;
int y = 0;

// Thread 1                         Thread 2
x = 1;                              y = 1; 
r1 = y;                             r2 = x;

What are the possible values for r1 and r2?

IMO, you should separate x86 architecture from Java. Either this is a question about what a JVM is _allowed_ to do, or it is a question about what an x86 processor actually might do. As for what a JVM is _allowed_ to do, thread 1 could assign either 0 or 1 to `r1`, and thread 2 could assign either 0 or 1 to r2. — Solomon Slow, Nov 29 '15 at 20:42

score 4 · Accepted Answer · edited Nov 29 '15 at 21:44

Writing a 32-bit integer is guaranteed to be atomic by the JVM, so this is not an issue.

You have two variables, x and y, shared between threads without synchronization.

Thread1 mutates x and reads y.
Thread2 mutates y and reads x.

Therefore, thread1 could see a stale value of y (so it reads either 1 or 0), and thread2 could see a stale value of x (likewise 1 or 0).

This means you can get all four possible combinations of (eax, ebx): (0,0) (0,1) (1,0) (1,1)
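If you want to poke at this yourself, here is a minimal sketch of the Java version as a repeatable test. The class and method names (`StoreLoadLitmus`, `observe`) are made up for illustration, and starting fresh threads on every iteration is a simplifying assumption that makes the interesting overlaps rare, so a given run may well show only some of the four pairs.

```java
import java.util.Set;
import java.util.TreeSet;

class StoreLoadLitmus {
    // Shared and deliberately unsynchronized, as in the question.
    static int x, y;
    // Results of each thread's read; the main thread reads them only after join().
    static int r1, r2;

    /** Runs the two-thread test repeatedly and returns the distinct "(r1,r2)" pairs seen. */
    static Set<String> observe(int iterations) {
        Set<String> seen = new TreeSet<>();
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { y = 1; r2 = x; });
            t1.start();
            t2.start();
            joinQuietly(t1); // join() makes the threads' writes to r1 and r2 visible here
            joinQuietly(t2);
            seen.add("(" + r1 + "," + r2 + ")");
        }
        return seen;
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Output varies from run to run; any subset of the four pairs is legal.
        System.out.println(observe(20_000));
    }
}
```

Not seeing a particular pair in a run does not mean it is forbidden; it only means the timing did not line up that time.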

answered Nov 29 '15 at 19:21 by Sleiman Jneidi · edited Nov 29 '15 at 21:44 by Voo

The content of your answer is good, but I feel that the use of `code formatting` for ordinary words is unnecessary... – Nayuki Nov 29 '15 at 19:37
@NayukiMinase I am not a SO expert, I use "code formatting" to highlight important words, feel free to edit it :) – Sleiman Jneidi Nov 29 '15 at 19:38
Not an expert with a score like yours? Humble indeed. Anyway since you replied, I'll leave the editing to your judgement. – Nayuki Nov 29 '15 at 19:40
If `atomic` was a Java keyword, it would make sense. Otherwise I think it's weird. I sometimes use **bold** if I want to highlight the important part of a long rambling paragraph I wrote. – Peter Cordes Nov 29 '15 at 19:55
(0,0) is just as possible as the other three options. – Voo Nov 29 '15 at 21:23
@Voo what three options? I said (1 or 0) for x and y, I didn't specify options or combinations – Sleiman Jneidi Nov 29 '15 at 21:36
@Sleiman ah I read that as a tuple of options. If you would make an insubstantial edit so that I can remove my vote? – Voo Nov 29 '15 at 21:38
@Voo I thought about it and didn't find a better way to word it, I would appreciate it if you contribute to it :) and your vote is yours :) – Sleiman Jneidi Nov 29 '15 at 21:42

score 4 · Answer 2 · edited May 23 '17 at 12:04

x86 has a strongly ordered memory model, but does still allow StoreLoad reordering.

Jeff Preshing's blog post "Memory Reordering Caught in the Act" uses exactly that pair of store-then-load sequences as a test case to show that reordering really can be observed on real hardware. He has source code and everything.

Note that each thread has its own architectural state (including all the registers). So thread1's EAX is different from thread2's EAX. Using EBX in thread2 only makes it easier to talk about, not any different from a what-can-happen POV.

Anyway, both registers can end up with 0. This rarely happens, but it can, because each thread's store can be delayed (in a store buffer or whatever) until after the other thread's load has chosen a value. Having this be legal lets the CPU aggressively use prefetched data to satisfy loads, and to buffer stores so they may not become globally visible right away when they retire. ("retire" means the architectural state (including EIP) of the thread running the instruction has moved on to the next instruction, and the effects are committed.)

Whatever the registers end up holding, once the dust settles both globals are always 1. All four combinations of 0 and 1 in the two threads' registers are possible, including both 1, which means each thread saw the other's store. I'm not sure how likely that last case is; it might require one thread being interrupted after its store but before its load. If both threads are running on the same physical core (hyperthreading), it is much more likely.
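To make the store-buffer argument concrete, here is a toy model in Java (not a simulation of any real CPU). It assumes each thread has a private buffer holding its one store, that loads read shared memory directly, and that the buffer can drain at any point after the store. `StoreBufferModel` and its method names are hypothetical. It walks every ordering of the six events (two stores, two loads, two drains) and collects the resulting (EAX,EBX) pairs.

```java
import java.util.Set;
import java.util.TreeSet;

class StoreBufferModel {
    /**
     * pc1/pc2 track program order per thread: 0 = nothing done, 1 = store buffered, 2 = load done.
     * drained1/drained2 say whether that thread's buffered store has reached shared memory.
     */
    static void explore(int pc1, boolean drained1, int eax,
                        int pc2, boolean drained2, int ebx, Set<String> out) {
        if (pc1 == 2 && pc2 == 2 && drained1 && drained2) {
            out.add("(" + eax + "," + ebx + ")");
            return;
        }
        // Shared memory holds 1 only once the owning thread's buffer has drained.
        int memX = drained1 ? 1 : 0;
        int memY = drained2 ? 1 : 0;

        // Thread 1: buffer the store to [x], load [y] from memory, or drain the buffer.
        if (pc1 == 0) explore(1, drained1, eax, pc2, drained2, ebx, out);
        if (pc1 == 1) explore(2, drained1, memY, pc2, drained2, ebx, out);
        if (pc1 >= 1 && !drained1) explore(pc1, true, eax, pc2, drained2, ebx, out);

        // Thread 2: the mirror image, storing to [y] and loading [x].
        if (pc2 == 0) explore(pc1, drained1, eax, 1, drained2, ebx, out);
        if (pc2 == 1) explore(pc1, drained1, eax, 2, drained2, memX, out);
        if (pc2 >= 1 && !drained2) explore(pc1, drained1, eax, pc2, true, ebx, out);
    }

    static Set<String> outcomes() {
        Set<String> out = new TreeSet<>();
        explore(0, false, 0, 0, false, 0, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(outcomes()); // prints [(0,0), (0,1), (1,0), (1,1)]
    }
}
```

In this model (0,0) shows up exactly when both loads run before either buffer has drained, which is the StoreLoad reordering described above.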

Even if the storage for x and y is unaligned and crosses a cache line, 0 and 1 are the only possible values, because the two values differ only in the least significant byte. (C compiler output, and JVMs, will align variables to their natural alignment, making this a non-issue, but you can do anything you want in asm so I thought I'd mention it.)

If you were storing a 32-bit -1 to 4 bytes that span two cache lines, the other thread could load a value of 0x00ffffff or 0xff000000, 0x0000ffff or 0xffff0000, etc. (depending on where the cache-line boundary was), as well as the usual 0 or 0xffffffff (aka -1).
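The torn values are just bit arithmetic, so here is a small Java sketch that computes them. It only illustrates the arithmetic, assuming a little-endian layout (as on x86) and a store that becomes visible in exactly two pieces; it does not reproduce tearing on real hardware. `TornStore` and `tornValues` are made-up names.

```java
class TornStore {
    /**
     * Values a reader could see if a 4-byte store of newValue over oldValue becomes
     * visible in two pieces, where the low `lowBytes` bytes (1 to 3) sit in one cache line.
     * Returns { low piece visible first, high piece visible first }.
     */
    static int[] tornValues(int oldValue, int newValue, int lowBytes) {
        int lowMask = (1 << (8 * lowBytes)) - 1;
        int lowFirst = (newValue & lowMask) | (oldValue & ~lowMask);
        int highFirst = (oldValue & lowMask) | (newValue & ~lowMask);
        return new int[] { lowFirst, highFirst };
    }

    public static void main(String[] args) {
        for (int lowBytes = 1; lowBytes <= 3; lowBytes++) {
            int[] torn = tornValues(0, -1, lowBytes);
            System.out.printf("storing -1, %d low byte(s) in first line: 0x%08x or 0x%08x%n",
                    lowBytes, torn[0], torn[1]);
        }
        // Storing 1 over 0 changes only the lowest byte, so a torn read is still 0 or 1.
        int[] torn = tornValues(0, 1, 2);
        System.out.printf("storing 1: %d or %d%n", torn[0], torn[1]);
    }
}
```

With `lowBytes = 3` this gives 0x00ffffff and 0xff000000, and with `lowBytes = 2` it gives 0x0000ffff and 0xffff0000, matching the values listed above.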

re: Java. I haven't read up on the Java memory model. Other answers are saying it even allows compile-time reordering (like C++11's std::atomic rules). Even if not, without a full memory barrier, StoreLoad reordering can happen. So all four results are possible.

This is true even if your JVM is running on an x86 CPU (rather than weakly-ordered hardware like ARM).
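For the Java version, one way to get the effect of a full barrier between the store and the load is to declare both fields `volatile`. The Java Memory Model puts all volatile accesses into a single order that is consistent with each thread's program order, so (0,0) is ruled out. A minimal sketch, with made-up class and method names:

```java
class VolatileLitmus {
    // volatile: every read and write of x and y takes part in one total order.
    static volatile int x, y;
    static int r1, r2;

    /** Returns true if any iteration ends with r1 == 0 and r2 == 0 (should never happen). */
    static boolean sawBothZero(int iterations) {
        for (int i = 0; i < iterations; i++) {
            x = 0;
            y = 0;
            Thread t1 = new Thread(() -> { x = 1; r1 = y; });
            Thread t2 = new Thread(() -> { y = 1; r2 = x; });
            t1.start();
            t2.start();
            joinQuietly(t1);
            joinQuietly(t2);
            if (r1 == 0 && r2 == 0) {
                return true;
            }
        }
        return false;
    }

    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("saw (0,0): " + sawBothZero(20_000));
    }
}
```

The other three results stay possible; volatile only removes the case where both threads miss each other's store.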

This answer to another question may shed some light on why LFENCE/SFENCE exist on x86, even though they are no-ops in most cases. (i.e. when not using movnt or weakly-ordered memory regions (like USWC video memory)).

Or, just read Jeff Preshing's other blog posts to learn more about memory ordering. I found them really helpful myself.

score 2 · Answer 3 · Nicholas Pipitone · 2015-11-29T19:36:36.777

We can simply label statements as below:

A) [x] <- 1            C) [y] <- 1

B) EAX <- [y]           D) EBX <- [x]

We know that A comes before B, and C comes before D, so just insert C and D into AB in every way that keeps C before D:

CDAB
CADB
CABD
ACDB
ACBD
ABCD

Now consider each ordering. The four in the middle start with either AC or CA, so both stores happen before either load and the result is (EAX,EBX)=(1,1). That leaves the other two: CDAB gives (EAX,EBX)=(1,0), and ABCD gives (EAX,EBX)=(0,1).
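The enumeration above is easy to check mechanically. This Java sketch generates every merge of AB and CD that keeps each thread's order, then runs each schedule one statement at a time. It models only that one-statement-at-a-time assumption, with no store buffering or reordering. `Interleavings`, `merge` and `run` are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

class Interleavings {
    /** Collects every merge of `left` and `right` that preserves each string's own order. */
    static void merge(String prefix, String left, String right, List<String> out) {
        if (left.isEmpty() && right.isEmpty()) {
            out.add(prefix);
            return;
        }
        if (!left.isEmpty()) merge(prefix + left.charAt(0), left.substring(1), right, out);
        if (!right.isEmpty()) merge(prefix + right.charAt(0), left, right.substring(1), out);
    }

    /** Executes one schedule step by step and returns "(EAX,EBX)". */
    static String run(String schedule) {
        int x = 0, y = 0, eax = 0, ebx = 0;
        for (char step : schedule.toCharArray()) {
            switch (step) {
                case 'A': x = 1; break;    // [x] <- 1
                case 'B': eax = y; break;  // EAX <- [y]
                case 'C': y = 1; break;    // [y] <- 1
                case 'D': ebx = x; break;  // EBX <- [x]
                default: throw new IllegalArgumentException("unknown step " + step);
            }
        }
        return "(" + eax + "," + ebx + ")";
    }

    public static void main(String[] args) {
        List<String> schedules = new ArrayList<>();
        merge("", "AB", "CD", schedules);
        for (String s : schedules) {
            System.out.println(s + " -> (EAX,EBX)=" + run(s));
        }
    }
}
```

It produces the same six orderings and the same three results, and (0,0) never appears under this model.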

For the Java version, you state that the compiler does not guarantee the order of the statements executed. In that case, it shouldn't be difficult to order A, B, C, and D to get (0,0), (1,0), (0,1), and (1,1).

answered Nov 29 '15 at 19:12 by Nicholas Pipitone · edited Nov 29 '15 at 19:36

How about the first problem? – cybertextron Nov 29 '15 at 19:15
@philippe The first problem is the same question, just written in assembly. The problem writer I assume used r1 to signify "register 1", or EAX, and r2 for "register 2", or EBX. – Nicholas Pipitone Nov 29 '15 at 19:17
Do we know that A comes before B? I don't think Java makes that guarantee. A and B are independent from the standpoint of thread 1, and in that case it's afaik ok to reorder instructions. – zapl Nov 29 '15 at 19:30
A comes before B is misleading - this does not have to be the case from the perspective of the other threads... – assylias Nov 29 '15 at 19:37
I suppose in Java it can reorder during compilation, but I will assume that the assembler being used will not do such a thing and I have updated my answer accordingly. – Nicholas Pipitone Nov 29 '15 at 19:39
@assylias: In the x86 asm version, B and D load from memory into registers. Register state is local to a thread, and *never* becomes globally visible. Other threads can only ever see the A and C stores. Within a thread, instructions are always seen to execute in program order (regardless of what the out-of-order machinery does under the hood). Keeping track of things so they appear in program order is part of what makes it so complicated, of course. :) – Peter Cordes Nov 29 '15 at 20:01
