Do we need to lock the immutable list in Kotlin?

Question

    var list = listOf("one", "two", "three")
    
    fun One() {
      list.forEach { result ->
        // Does something here
      }
    }
    
    fun Two() {
      list = listOf("four", "five", "six")
    }

Can functions One() and Two() run simultaneously? Do they need to be protected by locks?

Jens Baitinger · Answer 1 · 2021-03-24T17:46:27.560

0

No, you don't need to lock the variable. Even if One() is still running when you change the variable, forEach keeps iterating over the first list. The assignment in Two() might happen before forEach is called, but forEach will loop over either one list or the other; it will not switch part-way because of the assignment.

If you had a println(result) in your forEach, your program would output either

one
two
three

or

four
five
six

depending on whether the assignment happens first or the forEach call starts first.

What will NOT happen is something like

one
two
five
six
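
A minimal single-threaded sketch of the point above: `list.forEach` reads the `list` variable once and then iterates over that one object, so reassigning the variable while the loop is running does not change what the loop visits. The names `iterateAndReassign` and `seen` are made up for this illustration, and the sketch says nothing about visibility between threads.

```kotlin
var list = listOf("one", "two", "three")

// Iterates over `list` and reassigns the variable in the middle of the loop.
// Returns the elements the loop actually visited.
fun iterateAndReassign(): List<String> {
    val seen = mutableListOf<String>()
    list.forEach { result ->
        seen += result
        // Rebinding the variable does not affect the list object being iterated.
        list = listOf("four", "five", "six")
    }
    return seen
}

fun main() {
    println(iterateAndReassign()) // [one, two, three]
    println(list)                 // [four, five, six]
}
```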

edited Mar 24 '21 at 17:46

answered Mar 24 '21 at 17:17

Jens Baitinger


I don't think it's as simple as that. The `list` _reference_ may be atomic, but that doesn't guarantee visibility of the object it's referring to. So if both functions run simultaneously on different threads, it's not impossible that the thread running `One()` could get the updated value of `list` but not yet be able to see the contents of the new list, and behave strangely. – gidds Mar 24 '21 at 17:36
Thank you. I think it is okay if One() doesn't see the updated values from Two() for an invocation; – Aadhavan S Mar 24 '21 at 17:55
In general it's not the best idea to have a global variable changed by 2 threads. – Jens Baitinger Mar 24 '21 at 18:05
@JensBaitinger Your answer's clearer, but I'm afraid I still think it's wrong. `Two` could flush the `list` reference out to main memory while the new list object is still in local cache; then when `One` tries to iterate through that object it would follow the reference and see whatever garbage is in the main memory locations that the list object _will_ be written to. That could cause errors or corrupt data. (This'd be rare, of course, and system-dependent; many threading errors only strike when the system's heavily loaded.) – gidds Mar 24 '21 at 18:28
Using a lock or other synchronisation would ensure a ‘happens-before’ relationship that would force it to be flushed to main memory first, avoiding the problem. — See https://en.wikipedia.org/wiki/Java_memory_model for details. – gidds Mar 24 '21 at 18:28
@gidds, your assumptions are wrong. Please note that `list.forEach` is a function accepting a lambda, meaning you call that function on an object (that might be the first or the second list, but it will always be one of these objects). – Jens Baitinger Mar 24 '21 at 18:38
What I think gidds is saying is that the first list instance could have been made available to the GC by what the Two function did in another thread, so the first list instance content could become corrupt while One is iterating it. Even though One is grabbing a reference to it, it may be grabbing the local cache value after the instance is already subject to release to the GC due to the two functions running near-simultaneously. I'm not familiar enough with this subject to be able to say whether or not this is possible. I would expect `@Volatile` to be sufficient to prevent it. – Tenfour04 Mar 24 '21 at 20:26
(Please see my answer.) – gidds Mar 25 '21 at 22:47

score 0 · Answer 2 · answered Mar 25 '21 at 22:46

Can functions One() and Two() run simultaneously?

There are two ways that that could happen:

1. One of those functions could call the other. This could happen directly (where the code represented by // Does something here in One()⁽¹⁾ explicitly calls Two()), or indirectly (it could call something else which ends up calling Two(), or maybe the list property has a custom setter which does something that calls One()).
2. One thread could be running One() while a different thread is running Two(). This could happen if your program launches a new thread directly, or a library or framework could do so. For example, GUI frameworks tend to have one thread for dispatching events, and others for doing work that could take time; and web server frameworks tend to use different threads for servicing different requests.

If neither of those could apply, then there would be no opportunity for the functions to run simultaneously.

Do they need to be protected by locks?

If there's any possibility of them being run on multiple threads, then yes, they need to be protected somehow.
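
As one illustration of what "protected somehow" could look like, here is a minimal sketch using an explicit `ReentrantLock` with Kotlin's `withLock`. The lower-case function names, the `visit` parameter and the `lock` property are assumptions made for this example, not part of the question's code.

```kotlin
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.thread
import kotlin.concurrent.withLock

val lock = ReentrantLock()
var list = listOf("one", "two", "three")

// Reads the current list reference under the lock, then iterates outside it.
// The list itself is immutable, so only the read of the variable needs guarding.
fun one(visit: (String) -> Unit) {
    val snapshot = lock.withLock { list }
    snapshot.forEach(visit)
}

// Replaces the list under the same lock.
fun two() {
    lock.withLock { list = listOf("four", "five", "six") }
}

fun main() {
    val seen = mutableListOf<String>()
    val reader = thread { one { seen += it } }
    val writer = thread { two() }
    reader.join()
    writer.join()
    println(seen) // all of the old list or all of the new one, depending on timing
}
```

Because both the write and the read go through the same lock, the unlock in `two()` happens-before a later lock in `one()`, so the reader sees a fully constructed list.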

99.999% of the time, the code would do exactly what you'd expect; you'd either see the old list or the new one. However, there's a tiny but non-zero chance that it would behave strangely — anything from giving slightly wrong results to crashing. (The risk depends on things like the OS, CPU/cache topology, and how heavily loaded the system is.)

Explaining exactly why is hard, though, because at a low level the Java Virtual Machine⁽²⁾ does an awful lot of stuff that you don't see. In particular, to improve performance it can re-order operations within certain limits, as long as the end result is the same — as seen from that thread. Things may look very different from other threads — which can make it really hard to reason about multi-threaded code!

Let me try to describe one possible scenario…

Suppose Thread A is running One() on one CPU core, and Thread B is running Two() on another core, and that each core has its own cache memory.⁽³⁾

Thread B will create a List instance (holding references to strings from the constant pool), and assign it to the list property; both the object and the property are likely to be written to its cache first. Those cache lines will then get flushed back to main memory — but there's no guarantee about when, nor about the order in which that happens. Suppose the list reference gets flushed first; at that point, main memory will have the new list reference pointing to a fresh area of memory where the new object will go — but since the new object itself hasn't been flushed yet, who knows what's there now?

So if Thread A starts running One() at that precise moment, it will get the new list reference⁽⁴⁾, but when it tries to iterate through the list, it won't see the new strings. It might see the initial (empty) state of the list object before it was constructed, or part-way through construction⁽⁵⁾. (I don't know whether it's possible for it to see any of the values that were in those memory locations before the list was created; if so, those might represent an entirely different type of object, or even not a valid object at all, which would be likely to cause an exception or error of some kind.)

In any case, if multiple threads are involved, it's possible for one to see list holding neither the original list nor the new one.

So, if you want your code to be robust and not fail occasionally⁽⁶⁾, then you have to protect against such concurrency issues.

Using @Synchronized and @Volatile is traditional, as is using explicit locks. (In this particular case, I think that making list volatile would fix the problem.)
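
A minimal sketch of the volatile variant, assuming Kotlin/JVM and assuming the only shared state is the `list` property itself. The lower-case function names and the return value of `one()` are choices made for this example.

```kotlin
import kotlin.concurrent.thread

// @Volatile makes each write to `list` visible to later reads on other threads,
// together with everything the writing thread did before that write.
@Volatile
var list = listOf("one", "two", "three")

// Iterates over whichever list is current when the loop starts.
fun one(): List<String> {
    val seen = mutableListOf<String>()
    list.forEach { result -> seen += result }
    return seen
}

fun two() {
    list = listOf("four", "five", "six")
}

fun main() {
    val writer = thread { two() }
    val seen = one()
    writer.join()
    println(seen) // all of the old list or all of the new one, depending on timing
}
```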

But those low-level constructs are fiddly and hard to use well; luckily, in many situations there are better options. The example in this question has been simplified too much to judge what might work well (that's the down-side of minimal examples!), but work queues, actors, executors, latches, semaphores, and of course Kotlin's coroutines are all useful abstractions for handling concurrency more safely.
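
To make one of those abstractions concrete, here is a rough sketch that confines every access to `list` to a single-threaded executor, so no two accesses ever overlap. The `worker` name, the daemon thread and the blocking `get()` calls are assumptions made for this illustration; whether this shape suits the real program depends on details the question leaves out.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// All reads and writes of `list` run on this one worker thread.
// The thread is a daemon so that the sketch exits without an explicit shutdown.
val worker = Executors.newSingleThreadExecutor { task ->
    Thread(task, "list-owner").apply { isDaemon = true }
}

var list = listOf("one", "two", "three")

// Asks the worker for the current list and waits for the answer.
fun one(): List<String> = worker.submit(Callable { list }).get()

// Asks the worker to replace the list and waits until it has done so.
fun two() {
    worker.submit(Runnable { list = listOf("four", "five", "six") }).get()
}

fun main() {
    println(one()) // [one, two, three]
    two()
    println(one()) // [four, five, six]
}
```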

Ultimately, concurrency is a hard topic, with a lot of gotchas and things that don't behave as you'd expect.

There are many sources of further information, such as:

- These other questions cover some of the issues.
- Chapter 17: Threads And Locks from the Java Language Specification is the ultimate reference on how the JVM behaves. In particular, it describes what's needed to ensure a happens-before relationship that will ensure full visibility.
- Oracle has a tutorial on concurrency in Java; much of this applies to Kotlin too.
- The java.util.concurrent package has many useful classes, and its summary discusses some of these issues.
- Concurrent Programming In Java: Design Principles And Patterns by Doug Lea was at one time the best guide to handling concurrency, and these excerpts discuss the Java memory model.
- Wikipedia also covers the Java memory model.

⁽¹⁾ According to Kotlin coding conventions, function names should start with a lower-case letter; that makes them easier to distinguish from class/object names.

⁽²⁾ In this answer I'm assuming Kotlin/JVM. Similar risks are likely to apply to other platforms too, though the details differ.

⁽³⁾ This is of course a simplification; there may be multiple levels of caching, some of which may be shared between cores/processors; and some systems have hardware which tries to ensure that the caches are consistent…

⁽⁴⁾ References themselves are atomic, so a thread will either see the old reference or the new one; it can't see a bit-pattern comprising parts of the old and new ones, pointing somewhere completely random. So that's one problem we don't have!

⁽⁵⁾ Although the reference is immutable, the object gets mutated during construction, so it might be in an inconsistent state.

⁽⁶⁾ And the more heavily loaded your system is, the more likely it is for concurrency issues to occur, which means that things will probably fail at the worst possible time!
