26

I'm re-reading Java Concurrency In Practice, and I'm not sure I fully understand the chapter about immutability and safe publication.

What the book says is:

Immutable objects can be used safely by any thread without additional synchronization, even when synchronization is not used to publish them.

What I don't understand is, why would anyone (interested in making his code correct) publish some reference unsafely?

If the object is immutable, and it's published unsafely, I understand that any other thread obtaining a reference to the object would see its correct state, because of the guarantees offered by proper immutability (with final fields, etc.).

But if the publication is unsafe, another thread might still see null or the previous reference after the publication, instead of the reference to the immutable object, which seems to me like something no-one would like.

And if safe publication is used to make sure the new reference is seen by all the threads, then even if the object is just effectively immutable (no final fields, but no way to mutate them), everything is safe again. As the book says:

Safely published effectively immutable objects can be used safely by any thread without additional synchronization.

So, why is immutability (vs. effective immutability) so important? In what case would an unsafe publication be wanted?

Srini
JB Nizet
  • 2
    I doubt they would, *on purpose*. They might by accident, or they might consider the risk vanishingly small and not care. – Dave Newton Oct 25 '11 at 08:31
  • 3
    I have a hard time understanding your question. *why would anyone (interested in making his code correct) publish some reference unsafely?* -- No one says this is desirable, only that synchronized publication is not *required* for immutable objects. – aioobe Oct 25 '11 at 08:31
  • 4
    @aioobe : It's still required if you want the reference to the object to be visible to all the other threads, isn't it? If unsafe publication is never desirable, then safe publication is always necessary, and thus proper immutability isn't necessary anymore. Hence my question. – JB Nizet Oct 25 '11 at 08:36
  • No. "Unsafe publication" is safe in this particular case! – curiousguy Oct 25 '11 at 15:37
  • "_But if the publication is unsafe, another thread might still see null or the previous reference after the publication,_" what do you mean by "after the publication"? **After** the publication, the new value is seen, by definition. – curiousguy Oct 25 '11 at 23:54
  • @curiousguy : no, it isn't necessarily, because due to cache issues, another thread might still see another value than the one which has been written. There needs to be a safe publication to avoid that, implying a happens-before relationship. See http://download.oracle.com/javase/6/docs/api/java/util/concurrent/package-summary.html#MemoryVisibility – JB Nizet Oct 26 '11 at 06:02
  • @JBNizet "_another thread might still see another value than the one which has been written_" It means that the other thread doesn't run **after** the publication, **by definition**. – curiousguy Oct 26 '11 at 06:05
  • No. Read the link I gave you. On multicore machines, each thread may see a different value of a single variable, due to cache issues. – JB Nizet Oct 26 '11 at 06:10
  • @JBNizet No. The link you gave isn't the issue. How can you define that some code executes after some other code? – curiousguy Oct 26 '11 at 16:11
  • @curiousguy : Execute the program at http://pastebin.com/AD1KuBqF. On my machine (dual-core), it loops endlessly and never terminates. That's because, although ImmutableHappening is immutable, it's not safely published and the waiter thread doesn't see the new value of the happening attribute. If I declare the happening attribute as volatile, to ensure proper publication (happens-before), then the program stops after 1 second as expected. – JB Nizet Oct 26 '11 at 17:12
  • This is an old one, but the code at pastebin loops not b/c the publication is unsafe but b/c the compiler doesn't generate a load for `happening` on each iteration: it loads the field once, keeps it in a CPU register, and then loops forever. On x86 it's difficult to achieve the `not seen` part provided the compiler doesn't optimize. – bestsss Jun 27 '12 at 15:48
  • @bestsss: I don't follow you. Isn't the point of proper publication to make sure that the published object is visible to other threads, by means of synchronization (and I include volatile as a synchronization mechanism)? What does the compiler have to do with this? In my tests with javap, the code is exactly the same except for the volatile field declaration. – JB Nizet Jun 27 '12 at 16:51
  • I mean the generated machine code - the compiler is not javac, but the JIT... In my head I do not consider javac compiler. for example look at the generated assembler: http://stackoverflow.com/a/4635571/554431 The unsafe publication happens only on the system where writes can be reordered, x86 never reorders the writes. – bestsss Jun 27 '12 at 17:02
  • @bestsss: That I can understand. But safe publication is precisely what you must do to make sure that the jitted code writes and then reads from main memory rather than to/from registers when the field is published. The volatile field is what ensures safe publication, hence visibility of the modified value. – JB Nizet Jun 27 '12 at 17:07
  • 2
    Unsafe publication means that the reader may not see all fields properly initialized, those fields cannot be taken from a register as they have never been loaded in the 1st place. Under weak memory model the reference might be made visible, before the fields - that's unsafe publication. My point is that in your test case the endless loop happens b/c the JIT generates a single load only, not that the value is published unsafely (since the field is final it will never happen) - those are 2 different issues. – bestsss Jun 27 '12 at 17:12
  • 2
    @bestsss: Thank you! You made me re-read the chapter with new eyes, and I think I've finally understood: safe publication is not about making a reference to an object visible, but about making sure that if it's visible, then its state also is. And my example is thus not an example of unsafe publication, but an example of (potentially) no publication at all, because the field isn't volatile. But if the memory is flushed some time later, the thread will finally see the reference, and the state of the object, because it's immutable. I understood all the mechanisms, but not the terminology. – JB Nizet Jun 27 '12 at 17:47
  • 1
    *safe publication is not about making a reference to an object visible, but about making sure that if it's visible, then its state also is* -- precisely. – bestsss Jun 27 '12 at 21:51
  • If an important field of an object cannot be declared `final`, but it's known that no reference to the object will be made visible to other threads until construction is complete, what is the lowest-overhead way for a constructor to ensure reliable publication? I think an empty synchronized block would do it, but that seems a bit overkill. Is there any faster alternative? – supercat Feb 09 '14 at 16:49
  • It’s interesting that no-one mentioned the strongest reason for the introduction of that guarantee: the entire security concept relies on the immutability of the `String` class, as permissions, codebases, principals, etc. are expressed in the form of `String`s. Hence, it’s important that this immutability is guaranteed, even if an attacker deliberately publishes `String` instances in a thread-unsafe way. – Holger Jul 17 '15 at 19:38

4 Answers

9

It is desirable to design objects that don't need synchronization for two reasons:

  1. The users of your objects can forget to synchronize.
  2. Even though the overhead is very little, synchronization is not free, especially if your objects are not used often and by many different threads.

Because the above reasons are very important, it is better to learn the sometimes difficult rules and as a writer, make safe objects that don't require synchronization rather than hoping all the users of your code will remember to use it correctly.

Also remember that the author is not saying the object is unsafely published; it is safely published without synchronization.

As for your second question, I just checked, and the book does not promise that another thread will always see the reference to the updated object, just that if it does, it will see a complete object. But I can imagine that if it is published through the constructor of another (Runnable?) object, it will be fine. That doesn't explain all cases, though.
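One publication route that needs no explicit synchronization is handing the object to a thread before starting it, because `Thread.start()` happens-before every action in the started thread. The following is only a minimal sketch of that idea; the `Settings` and `StartPublication` classes are hypothetical names invented for illustration and do not come from the book.

```java
// Hypothetical, effectively immutable class: no final fields, but never
// mutated after construction.
final class Settings {
    private int retries;

    Settings(int retries) {
        this.retries = retries;
    }

    int retries() {
        return retries;
    }
}

final class StartPublication {
    // Builds a Settings object, hands it to a new thread, and returns the
    // value that thread observed.
    static int readInWorker(int retries) {
        Settings settings = new Settings(retries);
        int[] seen = new int[1];

        // The lambda captures the reference before the thread is started.
        Thread worker = new Thread(() -> seen[0] = settings.retries());
        worker.start(); // start() happens-before every action in the worker

        try {
            worker.join(); // the worker's actions happen-before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen[0];
    }
}
```

Here the happens-before edges come from `start()` and `join()`, so the worker is guaranteed to see the constructed state even though `Settings` has no final fields.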

EDIT:

effectively immutable and immutable

The difference between effectively immutable and immutable is that in the first case you still need to publish the objects in a safe way. For the truly immutable objects this isn't needed. So truly immutable objects are preferred because they are easier to publish for the reasons I stated above.
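To make the distinction concrete, here is a rough sketch. The class names `ImmutablePoint`, `EffectivelyImmutablePoint` and `PointPublisher` are made up for this illustration; the `volatile` field is just one of several ways to publish safely.

```java
// Truly immutable: all fields are final, so a thread that sees a reference to
// an ImmutablePoint also sees x and y as the constructor set them, however
// the reference was published.
final class ImmutablePoint {
    private final int x;
    private final int y;

    ImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }
}

// Effectively immutable: nothing changes the fields after construction, but
// they are not final, so readers are only guaranteed to see them initialized
// if the reference is published safely.
final class EffectivelyImmutablePoint {
    private int x;
    private int y;

    EffectivelyImmutablePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int x() { return x; }
    int y() { return y; }
}

final class PointPublisher {
    // Plain field: a racy write is tolerable for ImmutablePoint, because a
    // reader sees either null, an older point, or a fully constructed one.
    static ImmutablePoint lastImmutable;

    // volatile provides the safe publication EffectivelyImmutablePoint needs.
    static volatile EffectivelyImmutablePoint lastEffectivelyImmutable;
}
```

Note that the plain field still gives no promise about when other threads notice a new reference; it only rules out seeing a half-initialized `ImmutablePoint`.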

Srini
Thirler
  • I agree with that. But the question is: why is *proper* immutability (with final fields) preferable over *effective* immutability (no final fields, but no possible modification)? Indeed, safe publication of effectively immutable objects is sufficient. It looks like belt and suspenders to me. Not that it isn't a good thing to have belt and suspenders, but I want to make sure I haven't missed anything. – JB Nizet Oct 25 '11 at 10:45
  • I updated the answer to help with this question. Of course, in many cases both will achieve the same. I think many of the effects you mention are consequences of other design choices rather than choices in their own right, so being given two options is not necessarily by design. – Thirler Oct 25 '11 at 11:35
  • Thanks for your answer. I think we agree on everything in fact. I just can't imagine a use-case where unsynchronized publication might be acceptable. When would we choose such a publication? Is there some use-case where it would not cause a bug? – JB Nizet Oct 25 '11 at 11:46
  • 1
    Good point. At least it is usable when you are replacing an old value and you don't mind that some threads will still see the old value. Another case is when you use the object inside other immutable objects, or before the threads are constructed. I think in general these unsafe publications will rely on another effect to make sure the other threads see it. – Thirler Oct 25 '11 at 12:12
5

So, why is immutability (vs. effective immutability) so important?

I think the main point is that truly immutable objects are harder to break later on. If you've declared a field final, then it's final, period. You would have to remove the final in order to change that field, and that should ring an alarm. But if you've initially left the final out, someone could carelessly just add some code that changes the field, and boom - you're screwed - with only some added code (possibly in a subclass), no modification to existing code.

I would also assume that explicit immutability enables the (JIT) compiler to do some optimizations that would otherwise be hard or impossible to justify. For example, when using volatile fields, the runtime must guarantee a happens-before relation with writing and reading threads. In practice this may require memory barriers, disabling out-of-order execution optimizations, etc. - that is, a performance hit. But if the object is (deeply) immutable (contains only final references to other immutable objects), the requirement can be relaxed without breaking anything: the happens-before relation needs to be guaranteed only with writing and reading the one single reference, not the whole object graph.
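As an illustration of keeping the happens-before requirement on one single reference, consider the sketch below. `Route` and `RouteRegistry` are hypothetical names, and this shows only the design idea, not a claim about what any particular JIT does with it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Deeply immutable: final fields that refer only to immutable data.
final class Route {
    private final String name;
    private final List<String> stops;

    Route(String name, List<String> stops) {
        this.name = name;
        // Defensive copy, so later changes to the caller's list cannot leak in.
        this.stops = Collections.unmodifiableList(new ArrayList<>(stops));
    }

    String name() { return name; }
    List<String> stops() { return stops; }
}

final class RouteRegistry {
    // The only mutable, shared state is this one reference.
    private volatile Route current;

    // Replaces the whole snapshot instead of mutating it in place.
    void replace(Route next) {
        current = next;
    }

    Route current() {
        return current;
    }
}
```

Readers call `current()` once and then work on a snapshot that can no longer change, so no further synchronization is needed while traversing it.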

So, explicit immutability makes the program simpler, so that it's both easier for humans to reason about and maintain and easier for the computer to execute optimally. These benefits grow exponentially as the object graph grows, i.e. objects contain objects that contain objects - it's all simple if everything is immutable. When mutability is needed, localizing it to strictly defined places and keeping everything else immutable still gives lots of these benefits.

Joonas Pulakka
  • 1
    I think this is in fact the answer. Publishing to a volatile field is a safe publication, though, because it establishes a happens-before relation with other threads. – JB Nizet Oct 25 '11 at 12:16
2

"Unsafe publication" is often appropriate in cases where having other threads see the latest value written to a field would be desirable, but having threads see an earlier value would be relatively harmless. A prime example is the cached hash value for String. The first time hashCode() is called on a String, it will compute a value and cache it. If another thread which calls hashCode() on the same string can see the value computed by the first thread, it won't have to recompute the hash value (thus saving time), but nothing bad will happen if the second thread doesn't see the hash value. It will simply end up performing a redundant-but-harmless computation which could have been avoided. Having hashCode() publish the hash value safely would have been possible, but the occasional redundant hash computations are much cheaper than the synchronization required for safe publication. Indeed, except on rather long strings, synchronization costs would probably negate any benefit from caching.
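The caching pattern described above can be sketched roughly as follows. `FullName` is a hypothetical class written for this illustration; it mirrors the idea only and is not the actual `String` source.

```java
final class FullName {
    private final String first;
    private final String last;

    // Deliberately neither final nor volatile; 0 means "not computed yet".
    private int hash;

    FullName(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof FullName)) {
            return false;
        }
        FullName that = (FullName) other;
        return first.equals(that.first) && last.equals(that.last);
    }

    @Override
    public int hashCode() {
        int h = hash; // read the shared field exactly once
        if (h == 0) {
            // Derived only from final fields, so every thread computes the
            // same value; a lost or unseen write just costs a recomputation.
            h = 31 * first.hashCode() + last.hashCode();
            hash = h; // racy write, no synchronization
        }
        return h;
    }
}
```

Reading `hash` into a local variable once matters: reading the field a second time could, under a race, return a different value than the one that was just tested. If the computed hash happens to be 0, it is simply recomputed on every call, which is harmless.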

Unfortunately, I don't think the creators of Java imagined situations where code would write to a field and prefer that it should be visible to other threads, but not mind too much if it isn't, and where the reference stored to the field would in turn identify another object with a similar field. This leads to situations where writing semantically-correct code is much more cumbersome and likely slower than code which would be likely to work but whose semantics would not be guaranteed. I don't know any really good remedy for that in some cases other than using some gratuitous final fields to ensure that things get properly "published".

supercat
2

I had the exact same question as the original poster when I finished reading chapters 1-3. I think the authors could have done a better job elaborating on this.

I think the difference is that the internal state of effectively immutable objects can be observed to be inconsistent when they are not safely published, whereas the internal state of immutable objects can never be observed to be inconsistent.

However, I do think the reference to an immutable object can be observed to be out of date / stale if the reference is not safely published.
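A reader written with that in mind might look like the sketch below. `Quote` and `QuoteBoard` are hypothetical names; the point is only that the reader copes with a missing or stale reference while relying on the final fields for consistent state.

```java
// Immutable value: final fields only.
final class Quote {
    final String symbol;
    final long cents;

    Quote(String symbol, long cents) {
        this.symbol = symbol;
        this.cents = cents;
    }
}

final class QuoteBoard {
    // Plain field: written and read without synchronization.
    private Quote latest;

    void publish(Quote quote) {
        latest = quote;
    }

    String describe() {
        Quote q = latest; // read once; under a race this may be null or stale
        if (q == null) {
            return "no quote yet";
        }
        // Whichever Quote was seen, its final fields are fully initialized.
        return q.symbol + "@" + q.cents;
    }
}
```

Whether a stale `Quote` is acceptable depends on the application; if it is not, the `latest` field needs `volatile` or another safe publication mechanism.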

N.N.
user698226
  • I don't know anything about safe publication of a reference. I think the point is that the reference itself is mutable, and one thread may hold an outdated reference to an immutable object and it probably is fine for many scenarios. – Adrian Liu Feb 05 '17 at 06:20