111

Are there any guidelines in Scala on when to use val with a mutable collection versus using var with an immutable collection? Or should you really aim for val with an immutable collection?

The fact that there are both types of collection gives me a lot of choice, and often I don't know how to make that choice.

James McCabe

5 Answers

107

Pretty common question, this one. The hard thing is finding the duplicates.

You should strive for referential transparency. That means that, if I have an expression "e", I can write val x = e and replace every occurrence of e with x without changing what the program does. This is the property that mutability breaks. Whenever you need to make a design decision, maximize referential transparency.
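A minimal sketch of that substitution test, assuming a hypothetical helper `grow` that mutates a buffer as a side effect (none of these names come from the answer). Swapping the pure expression for a val leaves the result unchanged; doing the same with the mutating expression does not:

```scala
import scala.collection.mutable.ArrayBuffer

object ReferentialTransparencyDemo {
  // Pure case: `xs.sum` can be replaced by a val holding its value.
  def pureInline(xs: List[Int]): List[Int] = List(xs.sum, xs.sum)
  def pureViaVal(xs: List[Int]): List[Int] = { val x = xs.sum; List(x, x) }

  // Hypothetical impure expression: every evaluation mutates `buf`.
  private def grow(buf: ArrayBuffer[Int]): Int = { buf += 0; buf.size }

  // Evaluating the expression twice observes two different buffer sizes.
  def impureInline(): List[Int] = {
    val buf = ArrayBuffer.empty[Int]
    List(grow(buf), grow(buf))
  }

  // Evaluating it once and reusing the val gives a different answer.
  def impureViaVal(): List[Int] = {
    val buf = ArrayBuffer.empty[Int]
    val x = grow(buf)
    List(x, x)
  }
}
```

Here `pureInline` and `pureViaVal` always agree, while `impureInline()` yields `List(1, 2)` and `impureViaVal()` yields `List(1, 1)`.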

As a practical matter, a method-local var is the safest var that exists, since it doesn't escape the method. If the method is short, even better. If it isn't, try to reduce it by extracting other methods.
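As an illustration only (the method name `sumOfSquares` is made up for this sketch), a short method can use a local var internally and still look like a pure function from the outside:

```scala
object LocalVarDemo {
  // The var exists only inside this method body; callers see just the Int result.
  def sumOfSquares(xs: Seq[Int]): Int = {
    var total = 0
    for (x <- xs) total += x * x
    total
  }
}
```

Calling `LocalVarDemo.sumOfSquares(Seq(1, 2, 3))` gives `14` every time, so the local mutation is not observable by the caller.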

On the other hand, a mutable collection has the potential to escape, even if it currently doesn't. When the code changes, you might want to pass it to other methods or return it. That's the kind of thing that breaks referential transparency.

On an object (a field), pretty much the same thing happens, but with more dire consequences. Either way the object will have state and, therefore, break referential transparency. But having a mutable collection means even the object itself might lose control of who's changing it.
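A rough sketch of the field case, using two hypothetical registry classes invented for this example. Both objects have state, but only the one exposing a mutable collection can be changed behind its back:

```scala
import scala.collection.mutable.ArrayBuffer

object FieldEscapeDemo {
  // Hypothetical class: a val field holding a mutable collection.
  class MutableRegistry {
    val names: ArrayBuffer[String] = ArrayBuffer("alice")
  }

  // Hypothetical class: a private var field holding an immutable collection.
  class ImmutableRegistry {
    private var current: Vector[String] = Vector("alice")
    def names: Vector[String] = current
    def register(name: String): Unit = { current = current :+ name }
  }

  // Outside code appends directly to the exposed buffer.
  def mutableSizeAfterOutsideEdit(): Int = {
    val registry = new MutableRegistry
    registry.names += "mallory" // changes the registry's own state
    registry.names.size
  }

  // Outside code can only build a new Vector; the registry is unaffected.
  def immutableSizeAfterOutsideEdit(): Int = {
    val registry = new ImmutableRegistry
    val edited = registry.names :+ "mallory" // a separate Vector
    assert(edited.size == 2)
    registry.names.size
  }
}
```

In the second class, the only way to change the contents is through `register`, so the object keeps control over who updates it.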

Daniel C. Sobral
  • 39
    Nice, new big picture in my mind: Prefer `immutable val` over `immutable var` over `mutable val` over `mutable var`. Especially `immutable var` over `mutable val`! – Peter Schmitz Jul 09 '12 at 06:34
  • 2
    Keep in mind that you could still close (as in leak a side-effecting "function" that can change it) over a local mutable `var`. Another nice feature of using immutable collections is that you can efficiently keep old copies around, even if the `var` mutates. – Mysterious Dan Jan 14 '14 at 23:52
  • 1
    tl;dr: prefer `var x: Set[Int]` **over** `val x: mutable.Set[Int]`, since if you pass `x` to some other function, in the former case you can be sure that the function cannot mutate `x` for you. – pathikrit Oct 03 '15 at 18:30
18

If you work with immutable collections and you need to "modify" them, for example, add elements to them in a loop, then you have to use vars because you need to store the resulting collection somewhere. If you only read from immutable collections, then use vals.
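For example, a loop that builds up an immutable List might look like the sketch below (the method name `evens` is an assumption for illustration). The var is needed because each step produces a new list that has to be stored somewhere:

```scala
object AccumulateDemo {
  // Collects the even numbers from 1 to `limit` into an immutable List.
  def evens(limit: Int): List[Int] = {
    var acc: List[Int] = Nil          // var: the reference is repointed on each step
    for (i <- 1 to limit if i % 2 == 0) {
      acc = i :: acc                  // prepend creates a new List; the old one is unchanged
    }
    acc.reverse                       // restore ascending order
  }
}
```

The result of `AccumulateDemo.evens(6)` is `List(2, 4, 6)`; code that only reads that result can hold it in a val.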

In general, make sure that you don't confuse references and objects. vals are immutable references (like constant pointers in C). That is, when you use val x = new MutableFoo(), you'll be able to change the object that x points to, but you won't be able to change which object x points to. The opposite holds if you use var x = new ImmutableFoo(). Picking up my initial advice: if you don't need to change which object a reference points to, use vals.
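The distinction can be sketched with standard library sets standing in for MutableFoo and ImmutableFoo (the object and method names here are invented for the example):

```scala
import scala.collection.mutable

object ReferenceVsObjectDemo {
  // val + mutable object: the reference is fixed, the object can change.
  def valMutable(): mutable.Set[Int] = {
    val s = mutable.Set(1)
    s += 2                   // allowed: mutates the object s points to
    // s = mutable.Set(3)    // would not compile: reassignment to val
    s
  }

  // var + immutable object: the object is fixed, the reference can change.
  def varImmutable(): (Set[Int], Set[Int]) = {
    var s = Set(1)
    val before = s           // second reference to the original object
    s = s + 2                // builds a new Set and repoints s
    (before, s)              // `before` still refers to the untouched original
  }
}
```

In `varImmutable`, the first element of the returned pair is still `Set(1)` even though `s` now refers to `Set(1, 2)`.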

Malte Schwerhoff
  • 1
    `var immutable = something(); immutable = immutable.update(x)` defeats the purpose of using an immutable collection. You've already given up referential transparency and you can usually get the same effect from a mutable collection with better time complexity. Of the four possibilities (`val` and `var`, mutable and immutable), this one makes the least sense. I do often use `val mutable`. – Jim Pivarski Sep 11 '13 at 01:14
  • 3
    @JimPivarski I disagree, as do others, see Daniel's answer and the comment by Peter. If you need to update a data structure, then using an immutable var instead of a mutable val has the advantage that you can leak references to the structure w/o risking that it is modified by others in a way that breaks your local assumptions. The disadvantage for these "others" is, that they might read stale data. – Malte Schwerhoff Dec 02 '13 at 13:10
  • I've changed my mind and I agree with you (I'm leaving my original comment for history). I've since used this, especially in `var list: List[X] = Nil; list = item :: list; ...` and I'd forgotten that I once wrote differently. – Jim Pivarski Dec 02 '13 at 18:11
  • @MalteSchwerhoff: "stale data" is actually desirable, depending on how you've designed your program, if consistency is crucial; this is for example one of the main underlying principles in how concurrency works in Clojure. – Erik Kaplun Mar 11 '14 at 17:00
  • @ErikAllik I would not say that stale data is desirable per se, but I agree in that it can be perfectly fine, depending on the guarantees you want/need to give to your clients. Or do you have an example where the sole fact of reading stale data is actually an advantage? I don't mean consequences of accepting stale data, which could be better performance or a simpler API. – Malte Schwerhoff Mar 19 '14 at 07:23
  • "stale data" is better in a highly concurrent application with a database with temporal aspects: imagine you have an entity A with which you want to make some computations, and associate the result with it; if something now changes the globally visible state of A, the association will be inconsistent; whereas if the computation always sees and works with the "stale" snapshot of A, everything will be consistent and the next snapshot of A will have its own computation results. Basically you're just ignoring the passage of time for short periods for the sake of consistency. – Erik Kaplun Mar 19 '14 at 16:12
  • @MalteSchwerhoff: or just skip what I wrote and read about the underlying concepts and maths behind for example [Clojure](https://en.wikipedia.org/wiki/Clojure). – Erik Kaplun Mar 19 '14 at 16:14
10

The best way to answer this is with an example. Suppose we have some process simply collecting numbers for some reason. We wish to log these numbers, and will send the collection to another process to do this.

Of course, we are still collecting numbers after we send the collection to the logger. And let's say there is some overhead in the logging process that delays the actual logging. Hopefully you can see where this is going.

If we store this collection in a mutable val, (mutable because we are continuously adding to it), this means that the process doing the logging will be looking at the same object that's still being updated by our collection process. That collection may be updated at any time, and so when it's time to log we may not actually be logging the collection we sent.

If we use an immutable var, we send an immutable data structure to the logger. When we add more numbers to our collection, we will be replacing our var with a new immutable data structure. This doesn't mean the collection sent to the logger is replaced! The logger is still referencing the collection it was sent. So our logger will indeed log the collection it received.
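The scenario can be approximated without real concurrency. In this sketch the delayed logger is modeled as a hypothetical function `deferredLog` that renders the collection only when it is invoked later; it is a stand-in for the separate logging process, not an actual logging API:

```scala
import scala.collection.mutable.ArrayBuffer

object LoggerDemo {
  // Hypothetical delayed logger: captures the collection now, renders it later.
  def deferredLog(numbers: Iterable[Int]): () => String =
    () => numbers.mkString(",")

  // Mutable val: the logger shares the buffer that is still being filled.
  def withMutableVal(): String = {
    val numbers = ArrayBuffer(1, 2, 3)
    val pending = deferredLog(numbers)
    numbers += 4                 // collection continues after the hand-off
    pending()                    // logs what the buffer holds now
  }

  // Immutable var: the logger keeps the snapshot it was given.
  def withImmutableVar(): String = {
    var numbers = Vector(1, 2, 3)
    val pending = deferredLog(numbers)
    numbers = numbers :+ 4       // repoints the var to a new Vector
    pending()                    // logs the original snapshot
  }
}
```

`withMutableVal()` returns `"1,2,3,4"` even though `1,2,3` was sent, while `withImmutableVar()` returns `"1,2,3"`.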

jmazin
2

I think the examples in this blog post will shed more light, as the question of which combo to use becomes even more important in concurrency scenarios: importance of immutability for concurrency. And while we're at it, note the preferred use of synchronised vs @volatile vs something like AtomicReference: three tools

-2

var immutable vs. val mutable

In addition to the many excellent answers to this question, here is a simple example that illustrates the potential dangers of val mutable:

Mutable objects can be modified inside methods that take them as parameters, even though reassigning the parameter is not allowed.

import scala.collection.mutable.ArrayBuffer

object MyObject {
    def main(args: Array[String]): Unit = {

        val a = ArrayBuffer(1,2,3,4)
        silly(a)
        println(a) // a has been modified here
    }

    def silly(a: ArrayBuffer[Int]): Unit = {
        a += 10
        println(s"length: ${a.length}")
    }
}

Result:

length: 5
ArrayBuffer(1, 2, 3, 4, 10)

Something like this cannot happen with var immutable: the method has no way to modify the collection in place, and reassigning the parameter is not allowed:

object MyObject {
    def main(args: Array[String]): Unit = {
        var v = Vector(1,2,3,4)
        silly(v)
        println(v)
    }

    def silly(v: Vector[Int]): Unit = {
        v = v :+ 10 // This line is not valid
        println(s"length of v: ${v.length}")
    }
}

Results in:

error: reassignment to val

Since function parameters are treated as val, this reassignment is not allowed.
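For contrast, here is a variant that does compile, with a hypothetical method `extended` in place of `silly`. The callee can only return a new Vector, and the caller's var changes only if the caller reassigns it:

```scala
object CalleeCannotMutateDemo {
  // The callee builds and returns a new Vector; it cannot alter the argument.
  def extended(v: Vector[Int]): Vector[Int] = v :+ 10

  // Returns the caller's vector after the call, then after an explicit reassignment.
  def run(): (Vector[Int], Vector[Int]) = {
    var v = Vector(1, 2, 3, 4)
    val result = extended(v)
    val afterCall = v            // unchanged by the call
    v = result                   // only the owner of the var can repoint it
    (afterCall, v)
  }
}
```

`run()` returns `(Vector(1, 2, 3, 4), Vector(1, 2, 3, 4, 10))`.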

Akavall
  • This is incorrect. The reason you got that error is that you used Vector in your second example, which is immutable by default. If you use an ArrayBuffer you'll see it compiles fine and does the same thing, where it just adds the new element and prints out the mutated buffer. https://pastebin.com/vfq7ytaD – EdgeCaseBerg Sep 27 '17 at 17:25
  • @EdgeCaseBerg, I am intentionally using a Vector in my second example, because I am trying to show that the behavior of the first example `mutable val` is not possible with the `immutable var`. What here is incorrect? – Akavall Sep 27 '17 at 17:38
  • You're comparing apples to oranges here. Vector has no `+=` method like ArrayBuffer does. Your answer implies that `+=` is the same as `x = x + y`, which it is not. Your statement that function params are treated as vals is correct, and you do get the error you mention, but only because you used `=`. You can get the same error with an ArrayBuffer, so the collection's mutability isn't really relevant here. So it's not a good answer, because it's not getting at what the OP is talking about. Though it is a good example of the dangers of passing a mutable collection around if you didn't intend to. – EdgeCaseBerg Sep 27 '17 at 18:40
  • @EdgeCaseBerg But you can't replicate the behavior I get with `ArrayBuffer` by using `Vector`. The OP's question is broad, but they were looking for suggestions on when to use which, so I believe my answer is useful because it illustrates the dangers of passing around a mutable collection (the fact that it is a `val` does not help); `immutable var` is safer than `mutable val`. – Akavall Sep 28 '17 at 21:59