A grasp of immutable datastructures

Question

I am learning scala and as a good student I try to obey all rules I found.

One rule is: IMMUTABILITY!!!

So I have tried to code everything with immutable data structures and vals, and sometimes this is really hard.

But today I thought to myself: the only important thing is that the object/class should have no mutable state. I am not forced to code all methods in an immutable style, because these methods don't affect each other.

My Question: Am I correct or are there any problems/disadvantages I dont see?

EDIT:

Code example for aishwarya:

def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
  val stateSequence = (0 to order).toList.padTo(sequence.length,order)
  val seqPos = sequence.zipWithIndex

  def probOfSymbAtPos(symb: T, pos: Int) : Double = {
    val state = states(stateSequence(pos))
    M.log(state( seqPos.map( _._1 ).slice(0, pos).takeRight(order), symb))
  }

  val probs = seqPos.map( i => probOfSymbAtPos(i._1,i._2) )

  probs.sum
}

Explanation: It is a method to calculate the log-likelihood of a homogeneous Markov model of variable order. The apply method of state takes all previous symbols and the coming symbol and returns the probability of doing so.

As you may see: the whole method is just multiplying some probabilities which would be much easier using vars.

The chosen answer is wrong. See my comments under the answer. Referentially transparency is not even commutative. — Shelby Moore III, Feb 15 '12 at 23:53
Let me just remark that last comment from Shelby Moore III is under heavy discussion. — Blaisorblade, Feb 17 '12 at 15:45

Apocalisp · Accepted Answer · 2011-12-05T13:06:16.550

50

The rule is not really immutability, but referential transparency. It's perfectly OK to use locally declared mutable variables and arrays, because none of the effects are observable to any other parts of the overall program.

The principle of referential transparency (RT) is this:

An expression e is referentially transparent if for all programs p every occurrence of e in p can be replaced with the result of evaluating e, without affecting the observable result of p.

Note that if e creates and mutates some local state, it doesn't violate RT since nobody can observe this happening.

That said, I very much doubt that your implementation is any more straightforward with vars.

edited Dec 05 '11 at 13:06

answered Dec 01 '11 at 18:59

Apocalisp

34,834
8
106
155

3

Yes. What you're looking for is some principle that, when followed, gives you some benefit. You had "immutability", which is pretty close, but too restrictive. "Referential transparency" is the more general principle, and I believe it's the principle you're looking for. – Apocalisp Dec 01 '11 at 19:04
2

Also note that the principle you're looking for is not *moderation*. I.e. the idea that "a little" mutable state is OK. – Apocalisp Dec 01 '11 at 19:11
2

@Apocalisp I agree with the principle of what you're saying, but I don't feel it's entirely accurate. Using internal mutations still means that you're violating referential transparency *within the local implementation*. That can be a problem for all the same reasons if the local scope is large and complex enough (but it usually shouldn't be anyway). It's just that you get most of the benefits by programming with referentially transparent **interfaces**. You seem to be implying that referential transparency is a property that can only be applied to interfaces, which isn't true. – Ben Dec 02 '11 at 00:33
@Ben: I get what you're saying, but RT is really a property of each expression individually. An expression can be RT even if some of its subexpressions aren't. There's even a way to guarantee in the types that a mutation is confined to a specific subexpression. – Apocalisp Dec 02 '11 at 03:16
2

@Apocalisp Ah, then we agree. I read your post as suggesting that if you went for "referential transparency everywhere" rather than "immutability everywhere", that you would naturally end up with pure interfaces and possibly impure local implementation. Although I believe it is possible for a function that is referentially transparent at its interface but uses mutability internally to fail when you have concurrency. If that's correct, then not all ways of using "local mutability" are truly unobservable. – Ben Dec 02 '11 at 04:36
3

@Ben: If concurrency can affect the result, then the expression is most emphatically *not* referentially transparent. – Apocalisp Dec 02 '11 at 13:49
@Apocalisp: Yes, that was my point (perhaps poorly phrased). It is possible to use mutability hidden behind an interface in such a way that the interface appears referentially transparent if you only consider all possible single threaded programs, but is not when concurrency is involved. So there really is a difference to thinking about referential transparency only at the interface, but you're right in that there is not much benefit to be gained by going "all the way" (in a language like Scala, which doesn't enforce purity, at least). – Ben Dec 02 '11 at 22:29
@Ben: I never said that there is not much benefit in "going all the way" with RT. In fact there's a huge benefit. I don't seen any difference between expressions that are RT with respect to a single thread vs multiple threads. An expression is either RT or not, period. – Apocalisp Dec 03 '11 at 21:57
@Apocalisp: By "going all the way with RT" I meant making **every** expression in your program referentially transparent, i.e. implemented with no side effects, or "pure". Your answer advises defining functions that have pure **interfaces**, but may not have pure implementations. We're saying the same thing. I only commented because your answer reads as if aiming for referential transparency **means** using pure interfaces that may or may not have pure implementations. It does not. As you say, RT is a property of each expression individually, **including** ones in the implementations. – Ben Dec 05 '11 at 02:04
@Ben: We agree on everything except "not all ways of using 'local mutability' are truly unobservable". If an expression closes over a mutable variable and then proceeds to mutate it, then that is not RT, even if you're careful to mutate it back to its original state. In other words, such a mutation is **not local**. So it remains true that even if an expression introduces some local mutable variables and then mutates them, it can remain RT. But if you mutate variables that are defined outside the expression, it can't. So while you may not *feel* that what I said was "entirely accurate", it is. – Apocalisp Dec 05 '11 at 12:42
1

@Apocalisp: Fair enough then. Had I written this answer I would have made it clear that creating and mutating local state *is* violating referential transparency locally, just in a way that allows you to keep most of the advantages to programming in a referentially transparent way. I still feel that readers unfamiliar with the concept might come away from this answer with misleading impressions of it. But this isn't supposed to be a textbook, so that probably doesn't matter. Thanks for discussing it. – Ben Dec 05 '11 at 22:15
@Apocalisp local orders in a pure function (PF) can be externally observable. For example, if a PF returns a list of ordered and/or duplicate actions which a non-PF caller writes to persistent state (in order), some of the expressions within the PF aren't [commutative and idempotent](http://awelonblue.wordpress.com/2012/01/12/defining-declarative/). Irrelevant if some of the sub-expressions in the PF may be referentially transparent (RT). Assignment to local mutable state is a non-RT expression, i.e. if all expressions inside the PF are RT, there isn't any local mutable state (nor ordering). – Shelby Moore III Feb 15 '12 at 22:31
1

I will call your attention to the fact that Shelby Moore III talks a lot of nonsense. – Apocalisp Feb 15 '12 at 23:01
@Apocalisp A terse definition is that expressions are referentially transparent (RT) when every duplicate can be replaced with the result of evaluating the expression once (and the program's behavior is unaffected by this substitution). RT expressions are idempotent but not commutative because the definition of RT says nothing about the relative ordering of non-duplicate expressions. – Shelby Moore III Feb 15 '12 at 23:23
@Apocalisp I just saw your comment alleging that I am speaking nonsense. Check your own definition of RT above and observe it says nothing about relative ordering, i.e. the commutative property. RT expressions within a pure function cannot prevent ordering in the return value, e.g. a returned collection. The non-pure function caller is free to write that order into persistent state, causing those RT expressions in the pure function to be non-commutative. – Shelby Moore III Feb 15 '12 at 23:42
It says nothing about relative ordering because it has nothing to say about it. And I wasn't specifically commenting on this nonsense, but on the seer volume of nonsense that you emit. – Apocalisp Feb 17 '12 at 00:08
@Apocalisp Since my prior comment, I come to realize that RT does not provide idempotence, because RT doesn't say we can eliminate duplicate expressions. RT only says we can replace duplicates by their evaluated value. – Shelby Moore III Feb 17 '12 at 14:59
@Apocalisp Thanks for agreeing that RT does not provide commutativity. Commutativity and idempotence are essential for semantically stable composition (i.e. behavior is not changed by ordering or duplication due to nested composition of functions). Immutable data structures don't provide commutativity nor idempotence either. Generally speaking only stateless declarations do. There is no panacea from a programming rule. Pure functions, which work best with immutable data structures, localize reasoning about state to the inputs and outputs, but they still can leak imperative ordering. – Shelby Moore III Feb 17 '12 at 15:13
@ShelbyMooreIII: I'm puzzled by your comments, here's how I understand them. Consider the expression `E` defined as `(computeA(), computeB())` where `computeA` and `computeB` are pure. Now `E` is equivalent to `{val b = computeB(); val a = computeA(); (a, b)}`: this is the "commutativity" provided by RT (i.e. evaluation order for pure subexpressions does not matter). You discuss replacing `(a, b)` with `(b, a)`, which of course produces a different result, but this kind of "commutativity" seems irrelevant to the discussion, and you seem to confuse the two properties. Could you clarify? Thanks! – Blaisorblade Feb 17 '12 at 15:42
@Blaisorblade: Thanks for raising the issue. RT expressions are not commutative w.r.t. to the composition of their values. RT expressions are commutative w.r.t. to their evaluation order, but RT expressions are useless (i.e. they make no declaration) if their values are not composed. Also, Evaluation order doesn't directly impact program behavior, although it does indirectly, e.g. CPU cycles, memory. My [***DETAILED EXPLANATION***](http://copute.com/docs/Event.html#Theory) (note: updated link) covers this and the original question on this page. – Shelby Moore III Feb 21 '12 at 03:41
2

@ShelbyMooreIII: the fact that tuples are ordered (while, for instance, `Set`s are not) has nothing to do with RT. Your updated explanation is still too vague - you do not even define commutative (not correctly). I think that RT also implies the only (sensible) idempotent definition I see (evaluating an expression once or twice doesn't affect the semantics). Moreover, your explanation is written too abstractly: it's bad writing http://amzn.to/zF3K6v, you don't even quote your definition of 'declarative' from awelonblue, and you were clearly confused. In the end, I agree with @Apocalisp. – Blaisorblade Feb 21 '12 at 09:05
2

@ShelbyMooreIII: Note: The original post (http://awelonblue.wordpress.com/2012/01/12/defining-declarative/) is somewhat clearer - but it also makes clear that the author is talking about declarations and that even pure functional programming is not declarative and does not have the properties of commutativity and idempotence. I buy that for declarations and for the examples he makes, but those concepts can't make sense for functional programming itself. The mistake is once again yours. – Blaisorblade Feb 21 '12 at 09:13
@Blaisorblade: you said, " that tuples are ordered has nothing to do with RT". [Commutative property](http://en.wikipedia.org/wiki/Commutative_property) requires that swapping the order doesn't change the result. From the RT definition, ***RT expressions do nothing if we don't store their result***, e.g. in a tuple. If we swap RT expressions where we store their results, then clearly the result isn't always commutative. An [idempotent](http://en.wikipedia.org/wiki/Idempotence) expression can be repeated w/o changing the result, thus idempotence requires that duplicates can be deleted. – Shelby Moore III Feb 21 '12 at 20:09
@Blaisorblade: my explanation is quite precise-- declarative is "actual semantics == intended semantics", i.e all loss of commutativity and idempotence is in the intended semantics. Some concepts take a while for the "Ah ha" understanding. It seems you are conflating properties and topics. Purity is orthogonal to the declarative property. Commutative and idempotence are orthogonal to the declarative property. Good luck and thanks for raising the issue, so I could clarify. (fwiw, at age 46, note I do have roughly 20 years more experience than you apparently) – Shelby Moore III Feb 21 '12 at 20:20

Neil Essy · Answer 2 · 2011-12-01T19:04:26.597

The case for functional programming is one of being concise in your code and bringing in a more mathematical approach. It can reduce the possibility of bugs and make your code smaller and more readable. As for being easier or not, it does require that you think about your problems differently. But once you get use to thinking with functional patterns it's likely that functional will become easier that the more imperative style.

It is really hard to be perfectly functional and have zero mutable state but very beneficial to have minimal mutable state. The thing to remember is that everything needs to done in balance and not to the extreme. By reducing the amount of mutable state you end up making it harder to write code with unintended consequences. A common pattern is to have a mutable variable whose value is immutable. This way identity ( the named variable ) and value ( an immutable object the variable can be assigned ) are seperate.

var acc: List[Int] = Nil
// lots of complex stuff that adds values
acc ::= 1
acc ::= 2
acc ::= 3
// do loop current list
acc foreach { i => /* do stuff that mutates acc */ acc ::= i * 10 }
println( acc ) // List( 1, 2, 3, 10, 20, 30 )

The foreach is looping over the value of acc at the time we started the foreach. Any mutations to acc do not affect the loop. This is much safer than the typical iterators in java where the list can change mid iteration.

There is also a concurrency concern. Immutable objects are useful because of the JSR-133 memory model specification which asserts that the initialization of an objects final members will occur before any thread can have visibility to those members, period! If they are not final then they are "mutable" and there is no guarantee of proper initialization.

Actors are the perfect place to put mutable state. Objects that represent data should be immutable. Take the following example.

object MyActor extends Actor {
  var acc: List[Int] = Nil
  def act() {
    loop {
      react {
        case i: Int => acc ::= i
        case "what is your current value" => reply( acc )
        case _ => // ignore all other messages
      }
    }
  }
}

In this case we can send the value of acc ( which is a List ) and not worry about synchronization because List is immutable aka all of the members of the List object are final. Also because of the immutability we know that no other actor can change the underlying data structure that was sent and thus no other actor can change the mutable state of this actor.

score 3 · Answer 3 · edited May 23 '17 at 12:26

Since Apocalisp has already mentioned the stuff I was going to quote him on, I'll discuss the code. You say it is just multiplying stuff, but I don't see that -- it makes reference to at least three important methods defined outside: order, states and M.log. I can infer that order is an Int, and that states return a function that takes a List[T] and a T and returns Double.

There's also some weird stuff going on...

def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList

sequence is never used except to define seqPos, so why do that?

  val stateSequence = (0 to order).toList.padTo(sequence.length,order)
  val seqPos = sequence.zipWithIndex

  def probOfSymbAtPos(symb: T, pos: Int) : Double = {
    val state = states(stateSequence(pos))
    M.log(state( seqPos.map( _._1 ).slice(0, pos).takeRight(order), symb))

Actually, you could use sequence here instead of seqPos.map( _._1 ), since all that does is undo the zipWithIndex. Also, slice(0, pos) is just take(pos).

  }

  val probs = seqPos.map( i => probOfSymbAtPos(i._1,i._2) )

  probs.sum
}

Now, given the missing methods, it is difficult to assert how this should really be written in functional style. Keeping the mystery methods would yield:

def logLikelihood(seq: Iterator[T]): Double = {
  import scala.collection.immutable.Queue
  case class State(index: Int, order: Int, slice: Queue[T], result: Double)

  seq.foldLeft(State(0, 0, Queue.empty, 0.0)) {
    case (State(index, ord, slice, result), symb) =>
      val state = states(order)
      val partial = M.log(state(slice, symb))
      val newSlice = slice enqueue symb
      State(index + 1, 
            if (ord == order) ord else ord + 1, 
            if (queue.size > order) newSlice.dequeue._2 else newSlice,
            result + partial)
  }.result
}

Only I suspect the state/M.log stuff could be made part of State as well. I notice other optimizations now that I have written it like this. The sliding window you are using reminds me, of course, of sliding:

seq.sliding(order).zipWithIndex.map { 
  case (slice, index) => M.log(states(index + order)(slice.init, slice.last))
}.sum

That will only start at the orderth element, so some adaptation would be in order. Not too difficult, though. So let's rewrite it again:

def logLikelihood(seq: Iterator[T]): Double = {
  val sequence = seq.toList
  val slices = (1 until order).map(sequence take) ::: sequence.sliding(order)
  slices.zipWithIndex.map { 
    case (slice, index) => M.log(states(index)(slice.init, slice.last))
  }.sum
}

I wish I could see M.log and states... I bet I could turn that map into a foldLeft and do away with these two methods. And I suspect the method returned by states could take the whole slice instead of two parameters.

Still... not bad, is it?

A grasp of immutable datastructures

3 Answers3

Linked