Partial answer: how "unsafe republication" works on OpenJDK today.
(This is not the ultimate general answer I would like to get, but at least it shows what to expect on the most popular Java implementation)
In short, it depends on how the object was published initially:
- if initial publication is done through a volatile variable, then "unsafe republication" is most probably safe, i.e. you will most probably never see the object as partially constructed
- if initial publication is done through a synchronized block, then "unsafe republication" is most probably unsafe, i.e. you will most probably be able to see object as partially constructed
Most probably is because I base my answer on the assembly generated by JIT for my test program, and, since I am not an expert in JIT, it would not surprise me if JIT generated totally different machine code on someone else's computer.
For tests I used OpenJDK 64-Bit Server VM (build 11.0.9+11-alpine-r1, mixed mode) on ARMv8.
ARMv8 was chosen because it has a very relaxed memory model, which requires memory barrier instructions in both publisher and reader threads (unlike x86).
1. Initial publication through a volatile variable: most probably safe
Test java program is like in the question (I only added one more thread to see what assembly code is generated for a volatile write):
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(value = 1,
jvmArgsAppend = {"-Xmx512m", "-server", "-XX:+UnlockDiagnosticVMOptions", "-XX:+PrintAssembly",
"-XX:+PrintInterpreter", "-XX:+PrintNMethods", "-XX:+PrintNativeNMethods",
"-XX:+PrintSignatureHandlers", "-XX:+PrintAdapterHandlers", "-XX:+PrintStubCode",
"-XX:+PrintCompilation", "-XX:+PrintInlining", "-XX:+TraceClassLoading",})
@Warmup(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 5, timeUnit = TimeUnit.SECONDS)
@Threads(4)
public class VolTest {
static class Obj1 {
int f1 = 0;
}
@State(Scope.Group)
public static class State1 {
volatile Obj1 v1 = new Obj1();
Obj1 v2 = new Obj1();
}
@Group @Benchmark @CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void runVolT1(State1 s) {
Obj1 o = new Obj1(); /* 43 */
o.f1 = 1; /* 44 */
s.v1 = o; /* 45 */
}
@Group @Benchmark @CompilerControl(CompilerControl.Mode.DONT_INLINE)
public void runVolT2(State1 s) {
s.v2 = s.v1; /* 52 */
}
@Group @Benchmark @CompilerControl(CompilerControl.Mode.DONT_INLINE)
public int runVolT3(State1 s) {
return s.v1.f1; /* 59 */
}
@Group @Benchmark @CompilerControl(CompilerControl.Mode.DONT_INLINE)
public int runVolT4(State1 s) {
return s.v2.f1; /* 66 */
}
}
Here is the assembly generated by JIT for runVolT3
and runVolT4
:
Compiled method (c1) 26806 529 2 org.sample.VolTest::runVolT3 (8 bytes)
...
[Constants]
# {method} {0x0000fff77cbc4f10} 'runVolT3' '(Lorg/sample/VolTest$State1;)I' in 'org/sample/VolTest'
# this: c_rarg1:c_rarg1
= 'org/sample/VolTest'
# parm0: c_rarg2:c_rarg2
= 'org/sample/VolTest$State1'
...
[Verified Entry Point]
...
;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT3@0 (line 59)
0x0000fff781a60938: dmb ish
0x0000fff781a6093c: ldr w0, [x2, #12] ; implicit exception: dispatches to 0x0000fff781a60984
0x0000fff781a60940: dmb ishld ;*getfield v1 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT3@1 (line 59)
0x0000fff781a60944: ldr w0, [x0, #12] ;*getfield f1 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT3@4 (line 59)
; implicit exception: dispatches to 0x0000fff781a60990
0x0000fff781a60948: ldp x29, x30, [sp, #48]
0x0000fff781a6094c: add sp, sp, #0x40
0x0000fff781a60950: ldr x8, [x28, #264]
0x0000fff781a60954: ldr wzr, [x8] ; {poll_return}
0x0000fff781a60958: ret
...
Compiled method (c2) 27005 536 4 org.sample.VolTest::runVolT3 (8 bytes)
...
[Constants]
# {method} {0x0000fff77cbc4f10} 'runVolT3' '(Lorg/sample/VolTest$State1;)I' in 'org/sample/VolTest'
# this: c_rarg1:c_rarg1
= 'org/sample/VolTest'
# parm0: c_rarg2:c_rarg2
= 'org/sample/VolTest$State1'
...
[Verified Entry Point]
...
; - org.sample.VolTest::runVolT3@-1 (line 59)
0x0000fff788f692f4: cbz x2, 0x0000fff788f69318
0x0000fff788f692f8: add x10, x2, #0xc
0x0000fff788f692fc: ldar w11, [x10] ;*getfield v1 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT3@1 (line 59)
0x0000fff788f69300: ldr w0, [x11, #12] ;*getfield f1 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT3@4 (line 59)
; implicit exception: dispatches to 0x0000fff788f69320
0x0000fff788f69304: ldp x29, x30, [sp, #16]
0x0000fff788f69308: add sp, sp, #0x20
0x0000fff788f6930c: ldr x8, [x28, #264]
0x0000fff788f69310: ldr wzr, [x8] ; {poll_return}
0x0000fff788f69314: ret
...
Compiled method (c1) 26670 527 2 org.sample.VolTest::runVolT4 (8 bytes)
...
[Constants]
# {method} {0x0000fff77cbc4ff0} 'runVolT4' '(Lorg/sample/VolTest$State1;)I' in 'org/sample/VolTest'
# this: c_rarg1:c_rarg1
= 'org/sample/VolTest'
# parm0: c_rarg2:c_rarg2
= 'org/sample/VolTest$State1'
...
[Verified Entry Point]
...
;*aload_1 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT4@0 (line 66)
0x0000fff781a604b8: ldr w0, [x2, #16] ;*getfield v2 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT4@1 (line 66)
; implicit exception: dispatches to 0x0000fff781a604fc
0x0000fff781a604bc: ldr w0, [x0, #12] ;*getfield f1 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT4@4 (line 66)
; implicit exception: dispatches to 0x0000fff781a60508
0x0000fff781a604c0: ldp x29, x30, [sp, #48]
0x0000fff781a604c4: add sp, sp, #0x40
0x0000fff781a604c8: ldr x8, [x28, #264]
0x0000fff781a604cc: ldr wzr, [x8] ; {poll_return}
0x0000fff781a604d0: ret
...
Compiled method (c2) 27497 535 4 org.sample.VolTest::runVolT4 (8 bytes)
...
[Constants]
# {method} {0x0000fff77cbc4ff0} 'runVolT4' '(Lorg/sample/VolTest$State1;)I' in 'org/sample/VolTest'
# this: c_rarg1:c_rarg1
= 'org/sample/VolTest'
# parm0: c_rarg2:c_rarg2
= 'org/sample/VolTest$State1'
...
[Verified Entry Point]
...
; - org.sample.VolTest::runVolT4@-1 (line 66)
0x0000fff788f69674: ldr w11, [x2, #16] ;*getfield v2 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT4@1 (line 66)
; implicit exception: dispatches to 0x0000fff788f69690
0x0000fff788f69678: ldr w0, [x11, #12] ;*getfield f1 {reexecute=0 rethrow=0 return_oop=0}
; - org.sample.VolTest::runVolT4@4 (line 66)
; implicit exception: dispatches to 0x0000fff788f69698
0x0000fff788f6967c: ldp x29, x30, [sp, #16]
0x0000fff788f69680: add sp, sp, #0x20
0x0000fff788f69684: ldr x8, [x28, #264]
0x0000fff788f69688: ldr wzr, [x8] ; {poll_return}
0x0000fff788f6968c: ret
Let's note what barrier instructions the generated assembly contains:
runVolT1
(the assembly isn't shown above because it's too long):
c1
version contains 1x dmb ishst
, 2x dmb ish
c2
version contains 1x dmb ishst
, 1x dmb ish
, 1x stlr
runVolT3
(which reads volatile v1
):
c1
version 1x dmb ish
, 1x dmb ishld
c2
version 1x ldar
runVolT4
(which reads nonvolatile v2
): no memory barriers
As you see, runVolT4
(which reads the object after unsafe republication) doesn't contain memory barriers.
Does it mean that the thread can see the object state as semi-initialized?
Turns out no, on ARMv8 it is safe nonetheless.
Why?
Look at return s.v2.f1;
in the code. Here CPU performs 2 memory reads:
- first it reads
s.v2
, which contains the memory address of object o
- then it reads value of
o.f1
from (memory address of o
) + (offset of field f1
within Obj1
)
The memory address for the o.f1
read is computed from the value returned by the s.v2
read — this is so called "address dependency".
On ARMv8 such address dependency prevents reordering of this two reads (see MP+dmb.sy+addr
example in Modelling the ARMv8 architecture, operationally: concurrency and ISA, you can try it yourself in ARM's Memory Model Tool) — so we are guaranteed to see the v2
as fully initialized.
Memory barrier instructions in runVolT3
serve different purpose: they prevent reordering of the volatile read of s.v1
with other actions within the thread (in Java a volatile read is one of synchronization actions, which must be totally ordered).
More than that, it turns out today on all the supported by OpenJDK architectures address dependency prevents reordering of reads (see "Dependent loads can be reordered" in this table in wiki or "Data dependency orders loads?" in table in The JSR-133 Cookbook for Compiler Writers).
As a result, today on OpenJDK if an object is initially published through a volatile field, then it will most likely be visible as fully initialized even after unsafe republication.
2. Initial publication through a synchronized block: most probably unsafe
Situation is different when initial publication is done through a synchronized block:
class Obj1 {
int f1 = 0;
}
Obj1 v1;
Obj1 v2;
Thread 1 | Thread 2 | Thread 3
--------------------------------------------------------
synchronized { | |
var o = new Obj1(); | |
o.f1 = 1; | |
v1 = o; | |
} | |
| synchronized { |
| var r1 = v1; |
| } |
| v2 = r1; |
| | var r2 = v2.f1;
Is (r2 == 0) possible?
Here the generated assembly for Thread 3
is the same as for runVolT4
above: it contains no memory barrier instructions.
As a result, Thread 3
can easily see writes from Thread 1
out of order.
And generally, unsafe republication in such cases is most probably unsafe today on OpenJDK.