Java final fields: is "taint" behavior possible with the current JLS

Question

I'm currently trying to understand this JLS section on final fields.

To understand the text in the JLS better I'm also reading The Java Memory Model by Jeremy Manson (one of creators of the JMM).

The paper contains the example that got me interested: if an object o with final fields is made visible to another thread t twice:

first "improperly" before o's constructor finishes
next "properly" after o's constructor finishes

then t can see semi-constructed o even when it is accessed only via a "properly" published path.

Here is the part from the paper:

Figure 7.3: Example of Simple Final Semantics

f1 is a final field; its default value is 0

| Thread 1 | Thread 2 | Thread 3 |
|---|---|---|
| `o.f1 = 42;` | `r1 = p;` | `r3 = q;` |
| `p = o;` | `i = r1.f1;` | `j = r3.f1;` |
| `freeze o.f1;` | `r2 = q;` | |
| `q = o;` | `if (r2 == r1)` | |
| | `k = r2.f1;` | |

We assume r1, r2 and r3 do not see the value null. i and k can be 0 or 42, and j must be 42.

Consider Figure 7.3. We will not start out with the complications of multiple writes to final fields; a freeze, for the moment, is simply what happens at the end of a constructor. Although r1, r2 and r3 can see the value null, we will not concern ourselves with that; that just leads to a null pointer exception.

...

What about the read of q.f1 in Thread 2? Is that guaranteed to see the correct value for the final field? A compiler could determine that p and q point to the same object, and therefore reuse the same value for both p.f1 and q.f1 for that thread. We want to allow the compiler to remove redundant reads of final fields wherever possible, so we allow k to see the value 0.

One way to conceptualize this is by thinking of an object being “tainted” for a thread if that thread reads an incorrectly published reference to the object. If an object is tainted for a thread, the thread is never guaranteed to see the object’s correctly constructed final fields. More generally, if a thread t reads an incorrectly published reference to an object o, thread t forever sees a tainted version of o without any guarantees of seeing the correct value for the final fields of o.
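For readers who prefer Java to the paper's pseudocode, here is a rough sketch of how Figure 7.3 could look in source form. The class name `TaintSketch` and the method names are made up for illustration; the static fields `p` and `q` play the role of the shared variables in the figure, and the leak of `this` inside the constructor stands in for `p = o` happening before the freeze. This only shows the shape of the program; it will not reliably reproduce a stale read on any particular JVM.

```java
class TaintSketch {
    // Shared, deliberately non-volatile variables, named as in Figure 7.3.
    static TaintSketch p; // written before the freeze (improper publication)
    static TaintSketch q; // written after the freeze (proper publication)

    final int f1;

    TaintSketch() {
        f1 = 42;
        p = this; // `this` escapes: corresponds to `p = o` before `freeze o.f1`
    }             // the freeze of f1 happens when the constructor exits

    // Thread 1 in the figure.
    static void thread1() {
        q = new TaintSketch(); // corresponds to `q = o` after the freeze
    }

    // Thread 2 in the figure: reads both p and q.
    static int thread2() {
        TaintSketch r1 = p;
        TaintSketch r2 = q;
        if (r1 != null && r2 == r1) {
            return r2.f1; // k in the figure: the paper allows 0 or 42 here
        }
        return -1; // p or q was not visible yet
    }

    // Thread 3 in the figure: reads only q.
    static int thread3() {
        TaintSketch r3 = q;
        return (r3 == null) ? -1 : r3.f1; // j in the figure: 42 when r3 is non-null
    }
}
```

When all three methods run on a single thread, program order alone guarantees 42; the interesting outcomes only become possible when `thread2()` runs concurrently with `thread1()`.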

I tried to find in the current JLS anything that explicitly allows or forbids such behavior, but all I found is that:

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

Is such behavior allowed in the current JLS?

I don't get it. you found the quote that _enforces_ a correct behavior, anything else means the rule is not respected. That directly answers your question. — Eugene, Jan 28 '21 at 04:43
I can see a few typos that makes reading this document a bit complicated. shouldn't `i = r.f1;` be really be `i = r1.f1;` and `if (r2 == r)` be if `(r2 == r1)`?. and also `freeze o.f` - what is `f`? it should be `f1`?. I also assume that by "freeze" he means proper memory barriers? Then : "What about the read of `q.f1` in Thread 2", it's not `q.f1`, it is `r2.f1`. — Eugene, Jan 28 '21 at 19:36
@Eugene I fixed the typos. A "freeze action" is [from the JLS](https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.5.1). `q.f1` is fine IMO: it means that we access `o.f1` via a shared variable `q` (just like in [JLS](https://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4) here `r1,r2,r3` are local variables and `i,j,k,p,q` are shared variables). — , Jan 28 '21 at 20:34
I am not saying it is not fine `q.f1`, but _where_ do you actually see it? The real read is `r2.f1` — Eugene, Jan 29 '21 at 03:49
@Eugene _What about the read of q.f1 in Thread 2?_ is a quote from [The Java Memory Model by Jeremy Manson](https://drum.lib.umd.edu/bitstream/handle/1903/1949/umi-umd-1898.pdf). — , Jan 29 '21 at 04:43

aran · Answer 1 · 2021-02-21T23:22:10.557

Yes, it is allowed.

This follows mainly from the sections of the JMM already quoted:

Assuming the object is constructed "correctly", once an object is constructed, the values assigned to the final fields in the constructor will be visible to all other threads without synchronization.

What does it mean for an object to be properly constructed? It simply means that no reference to the object being constructed is allowed to "escape" during construction.

In other words, do not place a reference to the object being constructed anywhere where another thread might be able to see it; do not assign it to a static field, do not register it as a listener with any other object, and so on. These tasks should be done after the constructor completes, not in the constructor.

So yes, it's possible, in the sense that nothing prevents it. The last paragraph is a list of things not to do; whenever someone says to avoid doing X, it is implicit that X can be done.
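As an illustration of "do it after the constructor completes", here is a minimal sketch of the usual factory-method pattern. The names `SafeListener`, `REGISTRY` and `createAndRegister` are hypothetical and not taken from the quoted text; the registry merely stands in for "any other object" a listener might register with.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SafeListener {
    // Hypothetical shared registry that other threads may iterate over.
    static final List<SafeListener> REGISTRY = new CopyOnWriteArrayList<>();

    private final int threshold;

    private SafeListener(int threshold) {
        this.threshold = threshold;
        // Deliberately NOT calling REGISTRY.add(this) here: that would let
        // `this` escape before the final field is frozen.
    }

    // Construct first, publish second.
    static SafeListener createAndRegister(int threshold) {
        SafeListener listener = new SafeListener(threshold);
        REGISTRY.add(listener); // the reference escapes only after the constructor returned
        return listener;
    }

    int threshold() {
        return threshold;
    }
}
```

Because the only way to obtain an instance is through `createAndRegister`, no thread can get hold of the reference before the freeze at the end of the constructor.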

What if... `reflection`

The other answers correctly point out the requirements for the final fields to be correctly seen by other threads, such as the freeze at the end of the constructor, the chain, and so on. These answers offer a deeper understanding of the main issue and should be read first. This one focuses on a possible exception to these rules.

The most repeated rule/phrase may be this one here, copied from Eugene's answer (which shouldn't have any negative vote btw):

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly [assigned/loaded/set] values for that object's final fields.

Note that I replaced the term "initialized" with the equivalent terms assigned, loaded, or set. This is on purpose, as the original terminology could obscure my point here.

Another proper statement is the one from chrylis -cautiouslyoptimistic-:

The "final freeze" happens at the end of the constructor, and from that point on all reads are guaranteed to be accurate.

JLS 17.5 final Field Semantics state that:

A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

But, do you think reflection gives a f*** about this? No, of course not. It didn't even read that paragraph.

Subsequent Modification of final Fields

These statements are not only correct, but also backed by the JLS. I don't intend to refute them, but just add some little extra information regarding an exception to this law: reflection. That mechanism that, among other things, can change a final field's value after being initialized.

A freeze of a final field occurs at the end of the constructor in which the final field is set; that's completely true. But there's another trigger for the freeze operation that hasn't been taken into account: a freeze also occurs when a final field is initialized or modified via reflection (JLS 17.5.3):

Freezes of a final field occur both at the end of the constructor in which the final field is set, and immediately after each modification of a final field via reflection.

Reflective operations on final fields "break" the rule: even after the constructor has properly finished, reads of the final fields are still NOT guaranteed to be accurate. I'll try to explain.

Let's imagine the proper flow has been honored: the constructor has finished and all final fields of an instance are correctly seen by a thread. Now it's time to change those fields via reflection (just imagine this is needed, even if unusual, I know..).

The previous rules are followed and all threads wait until all fields have been updated: just as with the usual constructor scenario, the fields are only accessed after being frozen and after the reflective operation has finished. This is where the law is broken:

If a final field is initialized to a constant expression (§15.28) in the field declaration, changes to the final field may not be observed, since uses of that final field are replaced at compile time with the value of the constant expression.

This is telling: even if all rules were followed, your code won't read the final field's newly assigned value if that field is a primitive or String and you initialized it with a constant expression in the field's declaration. Why? Because to the compiler that field is just a hardcoded value: it never reads the field again, even if your code properly updated the value at runtime.

So, let's test it:

 import java.lang.reflect.Field;

 public class FinalGuarantee 
 {          
      private final int  i = 5;  // initialized as a constant expression
      private final long l;      // assigned in the constructor, so not a constant variable

      public FinalGuarantee() 
      {
         l = 1L;
      }
        
      // Overwrites both final fields via reflection.
      public static void touch(FinalGuarantee f) throws Exception
      {
         Class<FinalGuarantee> rfkClass = FinalGuarantee.class;
         Field field = rfkClass.getDeclaredField("i");
         field.setAccessible(true);
         field.set(f,555);                      // set i to 555
         field = rfkClass.getDeclaredField("l");
         field.setAccessible(true);
         field.set(f,111L);                     // set l to 111                 
      }
      
      public static void main(String[] args) throws Exception 
      {
         FinalGuarantee f = new FinalGuarantee();
         System.out.println(f.i);               // before the update
         System.out.println(f.l);
         touch(f);
         System.out.println("-");
         System.out.println(f.i);               // after the update
         System.out.println(f.l);
      }    
 }

Output:

    5
    1
    -
    5
    111

The final int i was correctly updated at runtime, and to check it, you could debug and inspect the object's fields values:

Both i and l were correctly updated. So what's happening with i, why is still showing 5? Because as stated on the JLS, the field i is replaced directly at compile time with the value of the constant expression, which in this case, is 5.

Every subsequent read of the final field i will then be INCORRECT, even if all previous rules were followed. The compiler will never check that field again: when you write f.i, it won't access any variable of any instance. It will just return 5: the final field is hardcoded at compile time, and if it is updated at runtime, the new value will never be seen by any thread that reads it this way. This breaks the law.

As proof of the correct update of the fields at runtime:

Both 555 and 111L are pushed into the stack and the fields get their newly assigned values. But what happens when manipulating them, such as printing their value?

l was not initialized to a constant expression in the field declaration. As a result, it isn't affected by the rule in 17.5.3. The field is correctly updated and correctly read from other threads.
i, however, was initialized to a constant expression in the field declaration. After the initial freeze, there's no more f.i for the compiler; that field will never be accessed again. Even though the variable is correctly updated to 555 in the example, every attempt to read the field has been replaced by the hardcoded constant 5; regardless of any further change made to the variable, it will always return five.

16: before the update
42: after the update

No field access, just a "yeah that's 5 for sure, return it". This implies that a final field is not ALWAYS guaranteed to be correctly seen from other threads, even if all protocols were followed.
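One way to observe the difference without a debugger or a bytecode dump is to compare a plain source-level read with a reflective read of the same field. The sketch below is an illustration only: the class `ConstantFoldingDemo` and its methods are hypothetical, and it assumes a JDK that still permits reflective writes to non-static final fields after `setAccessible(true)` (newer JDKs may warn about or restrict this).

```java
import java.lang.reflect.Field;

class ConstantFoldingDemo {
    private final int i = 5; // constant expression in the declaration

    // Ordinary read: javac substitutes the constant, no field access is emitted.
    int directRead() {
        return i;
    }

    // Reflective read: looks at the value actually stored in the object.
    int reflectiveRead() {
        try {
            Field field = ConstantFoldingDemo.class.getDeclaredField("i");
            field.setAccessible(true);
            return field.getInt(this);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective read failed", e);
        }
    }

    // Reflective write to the final field.
    void overwrite(int value) {
        try {
            Field field = ConstantFoldingDemo.class.getDeclaredField("i");
            field.setAccessible(true);
            field.setInt(this, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective write failed", e);
        }
    }

    public static void main(String[] args) {
        ConstantFoldingDemo demo = new ConstantFoldingDemo();
        demo.overwrite(555);
        System.out.println(demo.directRead());     // compile-time constant
        System.out.println(demo.reflectiveRead()); // value stored in the object
    }
}
```

The two reads disagree after the overwrite, which is the "constant expression" caveat of 17.5.3 in miniature: the stored value changed, but code compiled against the constant never looks at it.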

This affects primitives and Strings. I know it's an unusual scenario, but it's still a possible one.

Some other problematic scenarios (some also related to the synchronize issue quoted on the comments):

1- If not correctly synchronized with the reflective operation, a thread could fall into a race condition in the following scenario:

    final boolean flag;  // false in constructor
    final int x;         // 1 in constructor

Let's assume the reflection operation will, in this order:

  1- Set flag to true
  2- Set x to 100.

Simplification of the reader thread's code:

    while (!instance.flag)  //flag changes to true
       Thread.sleep(1);
    System.out.println(instance.x); // 1 or 100 ?

As a possible scenario, the reflective operation didn't have enough time to update x, so the final int x field may or may not be correctly read.

2- A thread could hang forever in the following scenario (strictly speaking an endless spin rather than a deadlock):

    final boolean flag;  // false in constructor

Let's assume the reflection operation will:

  1- Set flag to true

Simplification of the reader thread's code:

    while (!instance.flag) { /*deadlocked here*/ } 

    /*flag changes to true, but the thread started to check too early.
     Compiler optimization could assume flag won't ever change
     so this thread won't ever see the updated value. */

I know this is not an issue specific to final fields; I just added it as a possible scenario in which this type of variable is read incorrectly. These last two scenarios would simply be a consequence of incorrect implementations, but I wanted to point them out.
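For comparison, here is a sketch of how both scenarios go away when the changing state is not modeled with final fields plus reflection at all. This swaps in a different technique: a plain field guarded by a `volatile` flag. The class `VolatileFlagDemo` and its method names are assumptions made for this example.

```java
class VolatileFlagDemo {
    private int x = 1;                     // plain field, written before the flag
    private volatile boolean flag = false; // volatile write/read pair orders the accesses

    void update() {
        x = 100;     // ordinary write
        flag = true; // volatile write: publishes everything written before it
    }

    int awaitAndRead() {
        while (!flag) {          // volatile read: cannot be hoisted out of the loop
            Thread.onSpinWait();
        }
        return x; // ordered after the write of 100 via the volatile flag
    }

    // Runs one writer/reader round and returns what the reader saw.
    static int runOnce() {
        VolatileFlagDemo demo = new VolatileFlagDemo();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = demo.awaitAndRead());
        reader.start();
        demo.update();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the reader", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

The volatile write to `flag` happens-before the volatile read that observes `true`, so the reader cannot see the old `x`, and the loop cannot be optimized into an endless spin.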

Thank you. I guess the information at https://www.cs.umd.edu/~pugh/java/memoryModel/ should be valid for at least the 1st version of JMM. And JMM hasn't changed much (or even at all) since the 1st version. So it should be valid for the current JLS. — , Jan 28 '21 at 01:03
@jyoxbffz I believe it too as well; Anyway, If i find some inconsistency within the newst versions, will update this. Hope it was helpful mate — aran, Jan 28 '21 at 01:08
I would like a citation for the claim that "synchronization is still needed". In particular, all of the construction activity _happens-before_ the constructor finishes. — chrylis -cautiouslyoptimistic-, Jan 28 '21 at 01:23
There's not an explicit "synch is needed" quote, but just suggestions when accessing the final elements from outside. *after a thread constructs an immutable object (that is, an object that only contains final fields), you want to ensure that it is seen correctly by all of the other thread, you still typically need to use synchronization. There is no other way to ensure, for example, that the reference to the immutable object will be seen by the second thread* -- Typically still need to use synchronization is somehow different than *is needed* (as I see it, at least) — aran, Jan 28 '21 at 01:27
I believe the suggestions refer to certain specific use cases, such as the registration into a listener, and so on, specially when the constructor didn't finish and/or into the constructor's scope. — aran, Jan 28 '21 at 01:28
_you want to ensure that it is seen correctly by all of the other thread, you still typically need to use synchronization_ that is about the reference, nothing to do with the fields visibility. I am genuinely interested also in : "Above all suggestions, just make sure the constructor finished its job before any final field's manipulation/assignation/read". How do you imagine this happening? — Eugene, Jan 28 '21 at 04:44
@Eugene you may be able to manage the execution flow from code, in where you won't need any synchronization as it's implicit in the creation of the objects, if properly implemented. You can, of course, apply a `synchronization` mechanism that involves sempahores, Monitors, and so on. — aran, Jan 28 '21 at 04:49
@Eugene regarding the first part, I was also talking about the visibility, in which, still, I said yes, the field was allowed to be accessed. The reference just points some recommendations in order not to fall into a read from a final variable not properly initialized and publicly accessible that will be incorrect and won't probably be updated by the reader — aran, Jan 28 '21 at 04:50
@aran Very interesting. You've basically tested in practice Section [17.5.3. Subsequent Modification of final Fields](https://docs.oracle.com/javase/specs/jls/se15/html/jls-17.html#jls-17.5.3) of the JLS and discovered that the "constant expression complication", which is mentioned there, happens in a modern JVM. — , Jan 30 '21 at 22:34
@jyoxbffz I know this doesn't focus as much on your question as the other given answers, but hopes it gives some extra context, or at least to be interesting to read. Thanks for your words : ) — aran, Jan 30 '21 at 23:17
a compile time constant has [other interesting angles](https://stackoverflow.com/questions/65768419/serialization-deserialization-of-the-final-transient-fields/65768875#65768875). And reflectively changing such a constant was never an option, btw. — Eugene, Jan 31 '21 at 03:28
another aspect is that while `static final Integer me = Integer.parseInt("2");` is _not_ a compile time constant according to the `JLS`, but it will be constant folded by `JIT`; and even if it is not `static`, there is a way to make it a constant, by instructing the `JVM`. — Eugene, Jan 31 '21 at 03:30
one more point is this code, for example: `int x = 1; int test(){ int left = this.x; this.setX(2); return this.x - left; }` what are the possible values of calling `test()` (in a single threaded world)? Only `1`. Now add `final` to that `x`, what are the possibilities now? `JLS` allows `1, 0, -1`. — Eugene, Jan 31 '21 at 03:35
this was once exploited (via a flag) by `Shenandoah GC` by not inserting any GC barriers around `final` fields, but this has first been removed (because how many programs obey to the final field semantics?) and second - it has become obsolete when load reference barriers appeared in `Shenandoah 2.0` and I'll probably make this my last comment here, it's too much already. — Eugene, Jan 31 '21 at 03:39

score 5 · Accepted Answer · 2021-01-29T05:02:22.740

Yes, such behavior is allowed.

Turns out that a detailed explanation of this same case is available on the personal page of William Pugh (yet another JMM author): New presentation/description of the semantics of final fields.

Short version:

section 17.5.1. Semantics of final Fields of JLS defines special rules for final fields.
The rules basically let us establish an additional happens-before relation between the initialization of a final field in a constructor and a read of the field in another thread, even if the object is published via a data race.
This additional happens-before relation requires that every path from the field initialization to its read in another thread include a special chain of actions:
```
w  ʰᵇ ► f  ʰᵇ ► a  ᵐᶜ ► r₁  ᵈᶜ ► r₂, where:
```
- w is a write to the final field in a constructor
- f is "freeze action", which happens when constructor exits
- a is a publication of the object (e.g. saving it to a shared variable)
- r₁ is a read of the object's address in a different thread
- r₂ is a read of the final field in the same thread as r₁.
the code in the question has a path from o.f1 = 42 to k = r2.f1; which doesn't include the required freeze o.f1 action:
```
o.f1 = 42  ʰᵇ ► { freeze o.f1 is missing }  ʰᵇ ► p = o  ᵐᶜ ► r1 = p  ᵈᶜ ► k = r2.f1
```
As a result, o.f1 = 42 and k = r2.f1 are not ordered with happens-before ⇒ we have a data race and k = r2.f1 can read 0 or 42.
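To make the "every path must contain the freeze" idea concrete, here is a deliberately simplified toy model. It is not the JLS algorithm: it assumes a single constructing thread whose program order is given as a list of labels, and it treats a read as guaranteed only if every publication the reading thread has observed comes after the freeze. The class `FreezeChainSketch` and all labels are made up for this example.

```java
import java.util.List;
import java.util.Set;

class FreezeChainSketch {
    static final String WRITE = "write f1";
    static final String FREEZE = "freeze f1";

    // Returns true when no freeze-less path exists, i.e. every publication the
    // reader has seen comes after the freeze in the writer's program order.
    static boolean guaranteed(List<String> writerProgramOrder, Set<String> publicationsRead) {
        int write = writerProgramOrder.indexOf(WRITE);
        int freeze = writerProgramOrder.indexOf(FREEZE);
        if (write < 0 || freeze < write) {
            return false; // no write, or no freeze after the write
        }
        for (String publication : publicationsRead) {
            int at = writerProgramOrder.indexOf(publication);
            if (at < 0) {
                throw new IllegalArgumentException("unknown publication: " + publication);
            }
            if (at < freeze) {
                return false; // a path w -> a -> r1 -> r2 exists that skips the freeze
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> thread1 = List.of(WRITE, "p = o", FREEZE, "q = o");
        System.out.println(guaranteed(thread1, Set.of("p = o", "q = o"))); // Thread 2
        System.out.println(guaranteed(thread1, Set.of("q = o")));          // Thread 3
    }
}
```

For the question's figure, Thread 2 has read both `p` and `q`, so one of its paths skips the freeze and the model reports no guarantee; Thread 3 has read only `q`, so every path includes the freeze.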

A quote from New presentation/description of the semantics of final fields:

In order to determine if a read of a final field is guaranteed to see the initialized value of that field, you must determine that there is no way to construct the partial orders ᵐᶜ ► and ᵈᶜ ► without providing the chain w ʰᵇ ► f ʰᵇ ► a ᵐᶜ ► r₁ ᵈᶜ ► r₂ from the write of the field to the read of that field.

...

The write in Thread 1 and read in Thread 2 of p are involved in a memory chain. The write in Thread 1 and read in Thread 2 of q are also involved in a memory chain. Both reads of f see the same variable. There can be a dereference chain from the reads of f to either the read of p or the read of q, because those reads see the same address. If the dereference chain is from the read of p, then there is no guarantee that r5 will see the value 42.

Notice that for Thread 2, the deference chain orders r2 = p ᵈᶜ ► r5 = r4.f, but does not order r4 = q ᵈᶜ ► r5 = r4.f. This reflects the fact that the compiler is allowed to move any read of a final field of an object o to immediately after the the very first read of the address of o within that thread.

score 1 · Answer 3 · answered Jan 28 '21 at 01:29

The behavior is permitted by this clause in 17.5:

compilers are allowed to keep the value of a final field cached in a register and not reload it from memory in situations where a non-final field would have to be reloaded

The "final freeze" happens at the end of the constructor, and from that point on all reads are guaranteed to be accurate. However, if the object is published unsafely, then another thread could (1) read the field o.f1 while it is still uninitialized, and (2) also assume that because f1 is final, it can never change, and so permanently cache that value without re-reading it.
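The caching argument can be pictured as a source-level rewrite of Thread 2 from the question. This is only a sketch of the kind of transformation a compiler is permitted to make, not a claim about what any particular JIT does; `Box`, `asWritten` and `afterReadElimination` are hypothetical names, and `p` and `q` are passed as parameters here instead of being read from shared variables.

```java
class RedundantReadSketch {
    static final class Box {
        final int f1;

        Box(int f1) {
            this.f1 = f1;
        }
    }

    // Thread 2 as written in the question's figure.
    static int[] asWritten(Box p, Box q) {
        Box r1 = p;
        int i = r1.f1;
        Box r2 = q;
        int k = (r2 == r1) ? r2.f1 : -1; // second read of the same final field
        return new int[] {i, k};
    }

    // The same code after the redundant final-field read has been removed.
    static int[] afterReadElimination(Box p, Box q) {
        Box r1 = p;
        int i = r1.f1;
        Box r2 = q;
        int k = (r2 == r1) ? i : -1; // reuses the value read through r1
        return new int[] {i, k};
    }
}
```

In a race-free run both versions return the same values. The difference only shows when `i` was read through an early-published reference: the rewritten version then hands the stale value to `k` as well, which is the "taint" from the paper.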

Thank you. But IMHO the phrase _"in situations where a non-final field would have to be reloaded"_ is pretty vague. In particular, I don't think it can be applied to the "final freeze" action. — , Jan 28 '21 at 02:10

Eugene · Answer 4 · 2021-01-29T05:36:29.537

Stop. Quoting. JMM.

JMM is not for me and you, it's for people that really know what they are doing, like JVM compiler writers. Are you one of them? Am I one of them? I don't think so, thus stay away from it. There, I've said it.

It's rather interesting that you answered this question yourself, via the correct quote in the JLS:

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

That's it. It explicitly says what is correct and what can be an expected result. Everything else, is not documented, thus undefined, thus "welcome to unknown territory. have a nice day". So yes, it is possible simply by excluding what is impossible (or guaranteed by the JLS).

EDIT

Let's go, this is going to be long. We need to look at a certain rule from JLS here:

Given a write w, a freeze f, an action a (that is not a read of a final field), a read r1 of the final field frozen by f, and a read r2 such that hb(w, f), hb(f, a), mc(a, r1), and dereferences(r1, r2), then when determining which values can be seen by r2, we consider hb(w, r2)

It is a lot, but should slowly make sense as we go. I admit that I have not done this exercise ever with final fields.

I'll start with Thread 1 and Thread 3. It should be obvious that all these actions in Thread 1 form a happens-before chain, because of the obvious "program order":

o.f1 = 42;
p = o;
freeze o.f1;
q = o;

so we have :

   (hb)                   (hb)
w ------> freeze, freeze ------> q

If you look at the quote above, we fulfill two conditions : hb(w, f) and hb(f, a), i.e.: we have the write (w) via o.f1 = 42, the freeze via freeze o.f1 and also the second condition is fulfilled (hb(f, a)) via q = o.

What we need to establish next is mc(a, r1). For that we need to involve Thread 3, which does:

r3 = q;
j = r3.f1;

As such, we can say that "action a" (from the same quote) is a write and r1 (from mc(a, r1)) is a read, via r3 = q;. The same chapter says about memory chain:

If r is a read that sees a write w, then it must be the case that mc(w, r).

which perfectly matches our description above. As such, until now we have:

      (hb)                       (hb)
   w ------> freeze --> freeze ------> q --> mc(w, r1).

Now we need to look at that dereferences(r1, r2). We go yet again to the same chapter:

Dereference Chain: If an action a is a read or write of a field or element of an object o by a thread t that did not initialize...

Did Thread 3 initialize q? No (which is good). If you read the second half of this quote (in my understanding at least), we have fulfilled this rule also. Thus:

      (hb)          (hb)     (mc)       (dereferences)
   w ------> freeze -----> a ------> r1 ----------------> r2

As such (according to the same initial quote):

   hb(w, r2).

Which reads as "no data races are possible". So the only thing that Thread 3 can read is 42, because a read sees either the latest write in happens-before order or a write it is not ordered with at all (a data race), and the latter is ruled out here.

If you extrapolate this to Thread 1 and Thread 2, you immediately see that the freeze action is missing - you can't even start to build such a chain. As such: a data race, and so it can read some other write. In practice it can only read 0 or 42, because Java does not allow "out of thin air" values.

_JMM is not for me and you, it's for people that really know what they are doing, like JVM compiler writers. Are you one of them? Am I one of them? I don't think so, thus stay away from it. There, I've said it._. I feel your pain, believe me. JMM is way too complicated and even its authors admit that (C/C++ memory models are much simpler). But since I use java for software development I want to know what to expect even when there are bugs in the code (e.g. when code is incorrectly synchronized). And the most reliable source for that is java specs. — , Jan 28 '21 at 05:13
regarding the quote — it doesn't answer my question. The quote is about the guarantees the JMM gives us when the object is published after its constructor finishes. But my question was about the specific example in which one "incorrect" publication before constructor finishes causes all later "correct" publications to see the object as semi-constructed. — , Jan 28 '21 at 05:26
@Eugene While I agree with you in principle, having worked extensively with legacy code (as I think most of us have) and seen the horrors and misunderstandings I actually find it a good thing to be reasonably knowledgeable about JLS and JMM. — Erik, Jan 28 '21 at 09:04
@Erik agree. It's not just "reasonable"; the more I know and read about `JLS` and `JMM` the harder it gets to sleep with the horrors that I know exist in our code base. At this point, I am fairly convinced that the vast majority of devs should only operate against known best practices put up by reputable engineers. — Eugene, Jan 29 '21 at 03:25
@jyoxbffz after re-reading your question and those papers a couple of times, I stand corrected. This is a fabulous question that I did not pay attention enough, for which I feel bad. My apology comes as an edit of the answer that I gave. — Eugene, Jan 29 '21 at 05:31
@Eugene Your progress is visible. _"If you extrapolate this to Thread 1 and Thread 2, you immediately see that freeze action is missing - you can't even start to build such a chain."_ as I understand from [New presentation/description of the semantics of final fields](http://www.cs.umd.edu/%7Epugh/java/memoryModel/may-12.pdf) you actually can build the right chain. The problem is that you can also build a "invalid" chain without "freeze" — this is what prevents happens-before ordering between w and r2. — , Jan 29 '21 at 06:34
@jyoxbffz agree. you need to find at least one path that proves such a possibility. — Eugene, Jan 29 '21 at 14:03
