The main problem with concurrent updates of mutable data is that threads may perceive variable values stemming from different versions, i.e. a mixture of old and new values in the case of a single update. Together these form an inconsistent state that violates the invariants of these variables.
Consider, for example, Java’s ArrayList. It has an int field holding the current size and a reference to an array whose elements are references to the contained objects. The values of these variables have to fulfill certain invariants, e.g. if the size is non-zero, the array reference is never null and the array length is always greater than or equal to the size. When a thread sees values from different updates for these variables, these invariants no longer hold, so it may see list contents which never existed in this form or fail with spurious exceptions, reporting an illegal state that should be impossible (like NullPointerException or ArrayIndexOutOfBoundsException).
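To make the invariant problem concrete, here is a minimal sketch of an array-backed list. SimpleList is a hypothetical, heavily simplified class written for this illustration, not the actual java.util.ArrayList source; it only shows that one logical update consists of several separate writes which another thread could observe partially.

```java
import java.util.Arrays;

// Hypothetical simplified list; NOT the real java.util.ArrayList implementation.
class SimpleList {
    private Object[] elements = new Object[2];
    private int size;

    // One logical update, but up to three separate writes. Without
    // synchronization, another thread may observe any mixture of them.
    void add(Object o) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2); // write: new array reference
        }
        elements[size] = o; // write: array slot
        size++;             // write: size field
    }

    // A concurrent reader may pair a new size with an old, shorter array
    // (spurious ArrayIndexOutOfBoundsException) or see a slot not yet filled.
    Object get(int index) {
        if (index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elements[index];
    }

    int size() {
        return size;
    }
}
```

Used from a single thread, the invariant "array length >= size" always holds between calls. The trouble only starts when a second thread reads while add is halfway through.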
Note that thread-safe or concurrent data structures only solve the problem regarding the internals of the data structure, so operations no longer fail with spurious exceptions (this concerns the collection’s state; we have not talked about the contained elements’ state yet). Operations that iterate over these collections, or look at more than one contained element in any form, may still observe an inconsistent state regarding the contained elements. This also applies to the check-then-act anti-pattern, where an application first checks for a condition (e.g. using contains) before acting upon it (like fetching, adding or removing an element), although the condition might change in between.
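The check-then-act problem can be sketched as follows. The class and method names (CheckThenAct, registerRacy, registerAtomic) are made up for this example; the set comes from ConcurrentHashMap.newKeySet(), which is thread-safe internally, yet the first method is still racy because the check and the action are two separate operations.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class CheckThenAct {
    // Racy even on a concurrent set: another thread may add the same id
    // between contains() and add(), so two callers can both get "true".
    static boolean registerRacy(Set<String> seen, String id) {
        if (!seen.contains(id)) { // check
            seen.add(id);         // act; the checked condition may be stale by now
            return true;
        }
        return false;
    }

    // Single atomic operation: add() itself reports whether the set changed.
    static boolean registerAtomic(Set<String> seen, String id) {
        return seen.add(id);
    }

    public static void main(String[] args) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        System.out.println(registerAtomic(seen, "a")); // prints true
        System.out.println(registerAtomic(seen, "a")); // prints false
    }
}
```

The fix is not a lock around the collection but folding the check into the action, so there is no window in which the condition can change.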
In contrast, a thread working on an immutable data structure may work on an outdated version of it, but all variables belonging to that structure are consistent with each other, reflecting the same version. When performing an update, you don’t need to think about excluding other threads; it is simply not necessary, as the new data structures are not seen by other threads until they are published. The entire task of publishing a new version reduces to publishing the root reference to the new version of your data structure. If you can’t stop the other threads from processing the old version, the worst thing that can happen is that you have to repeat the operation using the new data afterwards; in other words, it is just a performance issue.
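One possible way to publish the root reference is an AtomicReference with a compare-and-set retry loop. This is only a sketch under assumptions: Range and RangeHolder are hypothetical names, and the invariant low <= high stands in for whatever invariants your real structure has.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable value: both fields always belong to the same version.
final class Range {
    final int low;
    final int high;

    Range(int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("low > high");
        }
        this.low = low;
        this.high = high;
    }
}

class RangeHolder {
    // The single mutable thing: the reference to the current version.
    private final AtomicReference<Range> current =
            new AtomicReference<>(new Range(0, 0));

    // Readers get one consistent (possibly outdated) version.
    Range snapshot() {
        return current.get();
    }

    // Build the new version privately, then try to publish it. If another
    // thread published first, repeat the operation on the newer data.
    void shift(int delta) {
        Range old;
        Range updated;
        do {
            old = current.get();
            updated = new Range(old.low + delta, old.high + delta);
        } while (!current.compareAndSet(old, updated));
    }
}
```

A reader that calls snapshot() once and then uses low and high from that object can never see a low from one update paired with a high from another. The retry loop is the "repeat the operation" case: it costs time under contention, but never correctness.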
This works smoothly in programming languages with garbage collection, as these allow the new data structure to refer to old objects, replacing only the changed objects (and their parents), without needing to worry about which objects are still in use and which are not.
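Replacing only the changed objects and their parents can be illustrated with an immutable binary search tree. TreeNode is a hypothetical class for this sketch (unbalanced, int keys only); the point is that insert copies just the path from the root to the new node and reuses every other subtree of the old version.

```java
// Hypothetical immutable binary search tree node (no balancing).
final class TreeNode {
    final int key;
    final TreeNode left;
    final TreeNode right;

    TreeNode(int key, TreeNode left, TreeNode right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    // Returns the root of a NEW version; the old version stays intact.
    // Only nodes on the path to the insertion point are copied.
    static TreeNode insert(TreeNode root, int key) {
        if (root == null) {
            return new TreeNode(key, null, null);
        }
        if (key < root.key) {
            return new TreeNode(root.key, insert(root.left, key), root.right);
        }
        if (key > root.key) {
            return new TreeNode(root.key, root.left, insert(root.right, key));
        }
        return root; // key already present, nothing to replace
    }

    static boolean contains(TreeNode root, int key) {
        while (root != null) {
            if (key == root.key) {
                return true;
            }
            root = key < root.key ? root.left : root.right;
        }
        return false;
    }
}
```

For example, inserting 9 into a tree built from 5, 3, 8 creates new nodes for 5, 8 and 9, while the node holding 3 is shared by both versions. Once no thread holds the old root anymore, the garbage collector reclaims the old 5 and 8 nodes without any bookkeeping in the application.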