Functional programming - is immutability expensive?

Question

The question is in two parts. The first is conceptual. The next looks at the same question more concretely in Scala.

Does using only immutable data structures in a programming language make implementing certain algorithms/logic inherently more computationally expensive in practice? This draws to the fact that immutability is a core tenet of purely functional languages. Are there other factors that impact this?
Let's take a more concrete example. Quicksort is generally taught and implemented using mutable operations on an in-memory data structure. How does one implement such a thing in a PURE functional way with a comparable computational and storage overhead to the mutable version. Specifically in Scala. I have included some crude benchmarks below.

More Details:

I come from an imperative programming background (C++, Java). I have been exploring functional programming, specifically Scala.

Some of the primary principles of pure functional programming:

Functions are first class citizens.
Functions do not have side effects and hence objects/data structures are immutable.

Even though modern JVMs are extremely efficient with object creation and garbage collection is very inexpensive for short lived objects, it's probably still better to minimize object creation right? At least in a single-threaded application where concurrency and locking is not an issue. Since Scala is a hybrid paradigm, one can choose to write imperative code with mutable objects if necessary. But, as someone who has spent a lot of years trying to reuse objects and minimize allocation. I would like a good understanding of the school of thought that would not even allow that.

As a specific case, I was a little surprised by this code snippet in this tutorial 6 . It has a Java version of Quicksort followed by a neat looking Scala implementation of the same.

Here is my attempt to benchmark the implementations. I haven't done detailed profiling. But, my guess is that the Scala version is slower because the number of objects allocated is linear (one per recursion call). Is there any way chance that tail call optimizations can come into play? If I am right, Scala supports tail call optimizations for self-recursive calls. So, it should only be helping it. I am using Scala 2.8.

Java version

public class QuickSortJ {

    public static void sort(int[] xs) {
      sort(xs, 0, xs.length -1 );
    }

    static void sort(int[] xs, int l, int r) {
      if (r >= l) return;
      int pivot = xs[l];
      int a = l; int b = r;
      while (a <= b){
        while (xs[a] <= pivot) a++;
        while (xs[b] > pivot) b--;
        if (a < b) swap(xs, a, b);
      }
      sort(xs, l, b);
      sort(xs, a, r);
    }

    static void swap(int[] arr, int i, int j) {
      int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
    }
}

Scala version

object QuickSortS {

  def sort(xs: Array[Int]): Array[Int] =
    if (xs.length <= 1) xs
    else {
      val pivot = xs(xs.length / 2)
      Array.concat(
        sort(xs filter (pivot >)),
        xs filter (pivot ==),
        sort(xs filter (pivot <)))
    }
}

Scala Code to compare implementations

import java.util.Date
import scala.testing.Benchmark

class BenchSort(sortfn: (Array[Int]) => Unit, name:String) extends Benchmark {

  val ints = new Array[Int](100000);

  override def prefix = name
  override def setUp = {
    val ran = new java.util.Random(5);
    for (i <- 0 to ints.length - 1)
      ints(i) = ran.nextInt();
  }
  override def run = sortfn(ints)
}

val benchImmut = new BenchSort( QuickSortS.sort , "Immutable/Functional/Scala" )
val benchMut   = new BenchSort( QuickSortJ.sort , "Mutable/Imperative/Java   " )

benchImmut.main( Array("5"))
benchMut.main( Array("5"))

Results

Time in milliseconds for five consecutive runs

Immutable/Functional/Scala    467    178    184    187    183
Mutable/Imperative/Java        51     14     12     12     12

It is expensive when implemented naively or with methods grown for imperative languages. A smart compiler (e.g. GHC, a Haskell compiler - and Haskell has *only* immutable values) *can* take advantage of immutability and achive performance that can rival code using mutability. Needless to say, the naive implementation of quicksort still is dog slow because it uses, among other costy things, heavy recursion and `O(n)` list concat. It's shorter than the pseudocode version though ;) — , Nov 04 '10 at 22:30
This question is two part. 1) Comparing the paradigms involving functional and imperative programming mutability/immutability 2) Implementation specifics of Java/Scala. Maybe it should have been two different questions. — smartnut007, Nov 04 '10 at 22:39
@delnan I see. In that case, it raises a good quetion. Is scala really worth it if it cant compare to natively implemented functional languages. But, the general word out there seems to be that Scala is fast. — smartnut007, Nov 04 '10 at 22:42
A great, related blog article is here: http://blogs.sun.com/jrose/entry/larval_objects_in_the_vm encourage him on, because this would benefit Java as well as functional VM languages greatly — andersoj, Nov 04 '10 at 22:44
this SO thread has a lot of detailed discussions regarding the efficiency of functional programming. http://stackoverflow.com/questions/1990464/efficiency-of-purely-functional-programming . Answers a great deal of what I wanted to know. — smartnut007, Nov 05 '10 at 04:28
Here is similar example scala-lang.org/docu/files/ScalaByExample.pdf — oluies, Nov 05 '10 at 07:43
The most naive thing here is your benchmark. You cannot benchmark anything with code like that! You should seriously read some articles on doing benchmark on the JVM before taking any conclusions... did you know that the JVM might not yet have JITted your code before you run it? Did you set your heap intial and max size appropriately (so that you won't consider the time the JVM processes asks for more memory?)? Are you aware of what methods are being compiled or recompiled? Are you aware of GC? The results you obtain from this code mean absolutely nothing! — Bruno Reis, Nov 06 '10 at 01:32
>> Ted Neward( seems like a prominent name in the Scala community ) << Why would you think that? Where can you find his name in the Scala community - http://www.scala-lang.org/node/1707 — igouy, Nov 08 '10 at 18:02
"consults, mentors, teaches, and presents" Where can you find Rex Kerr's name in the Scala community? — igouy, Nov 09 '10 at 17:15
@igouy: Rex Kerr? For example on the mailinglists pretty often. — user unknown, Aug 18 '11 at 14:59
Am I the only one, who is confused by this dichotomy of `functional` and `imperative` style? Isn't functional style, at least as done like in scala (but in Scheme, for example as well - I don't know very much functional languages, so please provide counterexamples) imperative too? `list.filter (x => x > 4).sort.take(10)` - what could be more imperative? — user unknown, Aug 18 '11 at 15:03
@userunknown No, that's declarative. Imperative programming "changes state with commands" whereas functional programming "is a declarative programming paradigm" that "avoids changing state" ([Wikipedia](https://en.wikipedia.org/wiki/Functional_programming)). So, yes, functional and imperative are two completely different things, and the code you wrote is *not* imperative. — Brian McCutchon, Sep 12 '16 at 19:12

Konrad Rudolph · Accepted Answer · 2010-11-05T13:49:51.687

Since there are a few misconceptions flying around here, I’d like to clarify some points.

The “in-place” quicksort isn’t really in-place (and quicksort is not by definition in-place). It requires additional storage in the form of stack space for the recursive step, which is in the order of O(log n) in the best case, but O(n) in the worst case.
Implementing a functional variant of quicksort that operates on arrays defeats the purpose. Arrays are never immutable.
The “proper” functional implementation of quicksort uses immutable lists. It is of course not in-place but it’s got the same worst-case asymptotic runtime (O(n^2)) and space complexity (O(n)) as the procedural in-place version.

On average, its running time is still on par with that of the in-place variant (O(n log n)). Its space complexity, however, is still O(n).

There are two obvious disadvantages of a functional quicksort implementation. In the following, let’s consider this reference implementation in Haskell (I don’t know Scala …) from the Haskell introduction:

qsort []     = []
qsort (x:xs) = qsort lesser ++ [x] ++ qsort greater
    where lesser  = (filter (< x) xs)
          greater = (filter (>= x) xs)

The first disadvantage is the choice of the pivot element, which is very inflexible. The strength of modern quicksort implementations relies heavily on a smart choice of the pivot (compare “Engineering a sort function” by Bentley et al.). The above algorithm is poor in that regard, which degrades average performance considerably.
Secondly, this algorithm uses list concatenation (instead of list construction) which is an O(n) operation. This doesn’t impact on the asymptotic complexity but it’s a measurable factor.

A third disadvantage is somewhat hidden: unlike the “in-place” variant, this implementation continually requests memory from the heap for the cons cells of the list and potentially scatters memory all over the place. As a result, this algorithm has a very poor cache locality. I don’t know whether smart allocators in modern functional programming languages can mitigate this – but on modern machines, cache misses have become a major performance killer.

What’s the conclusion? Unlike others, I wouldn’t say that quicksort is inherently imperative and that’s why it performs badly in an FP environment. Quite on the contrary, I would argue that quicksort is a perfect example of a functional algorithm: it translates seamlessly into an immutable environment, its asymptotic running time and space complexity are on par with the procedural implementation, and even its procedural implementation employs recursion.

But this algorithm still performs worse when constrained to an immutable domain. The reason for this is that the algorithm has the peculiar property of benefitting from a lot of (sometimes low-level) fine-tuning that can only be efficiently performed on arrays. A naive description of the quicksort misses all these intricacies (both in the functional and in the procedural variant).

After reading “Engineering a sort function” I can no longer consider quicksort an elegant algorithm. Implemented efficiently, it is a clunky mess, an engineer’s piece of work, not an artist’s (not to devalue engineering! this has its own aesthetic).

But I would also like to point out that this point is particular to quicksort. Not every algorithm is amenable to the same sort of low-level tweaking. A lot of algorithms and data structures really can be expressed without performance loss in an immutable environment.

And immutability can even decrease performance costs by removing the need of costly copies or cross-thread synchronizations.

So, to answer the original question, “is immutability expensive?” – In the particular case of quicksort, there is a cost that is indeed a result of immutability. But in general, no.

+1 - great answer! Although I would personally have ended with _sometimes_ rather than _no_. Still, that's just personality--you've explained the issues very well. — Rex Kerr, Nov 07 '10 at 16:05
You should add that a proper implementation using immutable values is immediately parallelisable as opposed to imperative versions. In modern technological context, this becomes more and more important. — Raphael, Mar 15 '11 at 18:18
How much would using `qsort lesser ++ (x : qsort greater)` help? — Solomon Ucko, Feb 21 '20 at 00:58

score 42 · Answer 2 · edited May 26 '16 at 23:18

There are a bunch of things wrong with this as a benchmark of functional programming. Highlights include:

You're using primitives, which may have to be boxed/unboxed. You're not trying to test the overhead of wrapping primitive objects, you're trying to test immutability.
You've chosen an algorithm where in-place operation is unusually effective (and provably so). If you want to show that there exist algorithms that are faster when implemented mutably, then this is a good choice; otherwise, this is likely to be misleading.
You are using the wrong timing function. Use System.nanoTime.
The benchmark is too short for you to be confident that JIT compilation won't be a significant part of the measured time.
The array isn't split in an efficient manner.
Arrays are mutable, so using them with FP is a strange comparison anyway.

So, this comparison is a great illustration that you must understand your language (and algorithm) in detail in order to write high-performance code. But it's not a very good comparison of FP vs. non-FP. If you want that, check out Haskell vs. C++ at the Computer Languages Benchmark Game. The take-home message there is that the penalty is typically not more than a factor of 2 or 3 or so, but it really depends. (No promises that the Haskell folks have written the fastest algorithms possible either, but at least some of them probably tried! Then again, some of the Haskell calls C libraries....)

Now, suppose you do want a more reasonable benchmark of Quicksort, recognizing that this is probably one of the worst cases for FP vs. mutable algorithms, and ignoring the data-structure issue (i.e. pretending that we can have an immutable Array):

object QSortExample {
  // Imperative mutable quicksort
  def swap(xs: Array[String])(a: Int, b: Int) {
    val t = xs(a); xs(a) = xs(b); xs(b) = t
  }
  def muQSort(xs: Array[String])(l: Int = 0, r: Int = xs.length-1) {
    val pivot = xs((l+r)/2)
    var a = l
    var b = r
    while (a <= b) {
      while (xs(a) < pivot) a += 1
      while (xs(b) > pivot) b -= 1
      if (a <= b) {
        swap(xs)(a,b)
        a += 1
        b -= 1
      }
    }
    if (l<b) muQSort(xs)(l, b)
    if (b<r) muQSort(xs)(a, r)
  }

  // Functional quicksort
  def fpSort(xs: Array[String]): Array[String] = {
    if (xs.length <= 1) xs
    else {
      val pivot = xs(xs.length/2)
      val (small,big) = xs.partition(_ < pivot)
      if (small.length == 0) {
        val (bigger,same) = big.partition(_ > pivot)
        same ++ fpSort(bigger)
      }
      else fpSort(small) ++ fpSort(big)
    }
  }

  // Utility function to repeat something n times
  def repeat[A](n: Int, f: => A): A = {
    if (n <= 1) f else { f; repeat(n-1,f) }
  }

  // This runs the benchmark
  def bench(n: Int, xs: Array[String], silent: Boolean = false) {
    // Utility to report how long something took
    def ptime[A](f: => A) = {
      val t0 = System.nanoTime
      val ans = f
      if (!silent) printf("elapsed: %.3f sec\n",(System.nanoTime-t0)*1e-9)
      ans
    }

    if (!silent) print("Scala builtin ")
    ptime { repeat(n, {
      val ys = xs.clone
      ys.sorted
    }) }
    if (!silent) print("Mutable ")
    ptime { repeat(n, {
      val ys = xs.clone
      muQSort(ys)()
      ys
    }) }
    if (!silent) print("Immutable ")
    ptime { repeat(n, {
      fpSort(xs)
    }) }
  }

  def main(args: Array[String]) {
    val letters = (1 to 500000).map(_ => scala.util.Random.nextPrintableChar)
    val unsorted = letters.grouped(5).map(_.mkString).toList.toArray

    repeat(3,bench(1,unsorted,silent=true))  // Warmup
    repeat(3,bench(10,unsorted))     // Actual benchmark
  }
}

Note the modification to the functional Quicksort so it only goes through the data once if at all possible, and the comparison to the built-in sort. When we run it we get something like:

Scala builtin elapsed: 0.349 sec
Mutable elapsed: 0.445 sec
Immutable elapsed: 1.373 sec
Scala builtin elapsed: 0.343 sec
Mutable elapsed: 0.441 sec
Immutable elapsed: 1.374 sec
Scala builtin elapsed: 0.343 sec
Mutable elapsed: 0.442 sec
Immutable elapsed: 1.383 sec

So, aside from learning that trying to write your own sort is a bad idea, we find that there is a ~3x penalty for an immutable quicksort if the latter is implemented somewhat carefully. (You could also write a trisect method that returns three arrays: those less than, those equal, and those greater than the pivot. This might speed things up slightly more.)

Just regarding the boxing/unboxing. If anything this should be a penalty on the java side right ? Isnt Int the preferred numeral type for Scala( vs Integer). So, there is no boxing happing on the scala side. Boxing is only an issue on the java side because autoboxing form scala Int to the java.lang.Integer/int. here is a link that talks about this subject in detail http://www.ansorg-it.com/en/scalanews-001.html — smartnut007, Nov 05 '10 at 04:49
Yes, I am playing the devils advocate here. Mutability is an intergral part of quicksorts design. Thats why I was very curious about the pure functional approach to the problem. Sigh, I have said this statement the 10th time on the thread :-). Will look at the rest of your post when I wake up and get back. thanks. — smartnut007, Nov 05 '10 at 04:52
@smartnut007 - Scala collections are generic. Generics require boxed types for the most part (though there is an effort underway to specialize them for certain primitive types). So you can't use all the nifty collections methods and assume that there will be no penalty when you pass collections of primitive types through them. It's quite likely that the primitive type will have to be boxed on the way in and unboxed on the way out. — Rex Kerr, Nov 05 '10 at 07:29
I don't like the fact that the top flaw you have stated is just a speculation :-) — smartnut007, Nov 06 '10 at 00:48
@smartnut007 - It's a top flaw because it's hard to check, and if true really screws up the results. If you are certain that there isn't boxing, then I agree that the flaw isn't valid. The flaw isn't that there _is_ boxing, it's that you _don't know_ whether there's boxing (and I'm not sure either--specialization has made this tricky to figure out). On the Java side (or the Scala mutable implementation) there is no boxing because you use only primitives. Anyway, an immutable version works through n log n space, so you really end up comparing the cost of compare/swap with memory allocation. — Rex Kerr, Nov 07 '10 at 16:00

score 10 · Answer 3 · edited May 23 '17 at 11:33

10

I don't think the Scala version is actually tail recursive, since you are using Array.concat.

Also, just because this is idiomatic Scala code, this doesn't mean it is the best way to do it.

The best way to do this would be to use one of Scala's built-in sorting functions. That way you get the immutability guarantee and know you have a speedy algorithm.

See Stack Overflow question How do I sort an array in Scala? for an example.

edited May 23 '17 at 11:33

Community

1
1

answered Nov 04 '10 at 22:39

TreyE

2,649
22
24

4

also, i do not think there is a tail recursive quick sort possible, as you have to make two recursive calls – Alex Lo Nov 04 '10 at 22:42
1

It is possible, you just have to use continuation closures to lift your would-be-stack-frames onto the heap. – Brian Nov 04 '10 at 22:52
1

inbuilt scala.util.Sorting.quickSort(array) mutates the array. It does as fast as java, not surprisingly. I am interested in an efficient pure functional solution. If not, why. Is it a limitation of Scala or functional paradigm in general. that sorta thing. – smartnut007 Nov 04 '10 at 23:32
@smartnut007: What version of Scala are you using? In Scala 2.8, you can do `array.sorted` which returns a new sorted array, doesn't mutate the original one. – missingfaktor Nov 05 '10 at 04:13
@AlexLo - there is a tail recursive quick sort possible. Something like: `TAIL-RECURSIVE-QUICKSORT(Array A, int lo, int hi): while p < r: q = PARTITION(A, lo, hi); TAIL-RECURSIVE-QUICKSORT(A, lo, q - 1); p = q + 1; ` – Jakeway Aug 17 '16 at 01:16

Daniel C. Sobral · Answer 4 · 2010-11-07T18:06:20.930

Immutability is not expensive. It sure can be expensive if you measure a small subset of the tasks a program have to do, and pick a solution based on mutability to boot -- such as measuring quicksort.

To put it simply, you don't quicksort when using purely functional languages.

Let's consider this from another angle. Let's consider these two functions:

// Version using mutable data structures
def tailFrom[T : ClassManifest](arr: Array[T], p: T => Boolean): Array[T] = {
  def posIndex(i: Int): Int = {
    if (i < arr.length) {
      if (p(arr(i)))
        i
      else
        posIndex(i + 1)
    } else {
      -1
    }
  }

  var index = posIndex(0)

  if (index < 0) Array.empty
  else {
    var result = new Array[T](arr.length - index)
    Array.copy(arr, index, result, 0, arr.length - index)
    result
  }
}

// Immutable data structure:

def tailFrom[T](list: List[T], p: T => Boolean): List[T] = {
  def recurse(sublist: List[T]): List[T] = {
    if (sublist.isEmpty) sublist
    else if (p(sublist.head)) sublist
    else recurse(sublist.tail)
  }
  recurse(list)
}

Benchmark THAT, and you'll find that the code using mutable data structures has much worse performance, because it needs to copy the array, while the immutable code need not concern itself with that.

When you program with immutable data structures, you structure your code to take advantage of its strengths. It is not simply the data type, or even individual algorithms. The program will be designed in a different manner.

Which is why benchmarking is usually meaningless. Either you choose algorithms that are natural to one style or another, and that style wins, or you benchmark the whole application, which is often impractical.

score 7 · Answer 5 · answered Nov 04 '10 at 23:09

Sorting an array is, like, the most imperative task in the universe. It is not surprising that many elegant 'immutable' strategies/implementations fail poorly on a 'sort an array' microbenchmark. This does not imply that immutability is expensive "in general", though. There are many tasks where immutable implementations will perform comparably to mutable ones, but array sorting often is not one of them.

score 7 · Answer 6 · answered Nov 04 '10 at 23:29

7

If you're simply rewriting your imperative algorithms and data structures into functional language it indeed will be expensive and useless. To make the things shine, you should use the features available only in functional programming: data stuctures persistence, lazy evaluations etc.

answered Nov 04 '10 at 23:29

Vasil Remeniuk

20,519
6
71
81

could you be kind enough to provide a implementation in Scala. – smartnut007 Nov 04 '10 at 23:38
3

http://www.powells.com/biblio/17-0521631246-0 (Purely Functional Data Structures by Chris Okasaki) - just look through this book. It has a strong story to tell on leveraging functional programming benefits, when implementing effective algorithms and data structures. – Vasil Remeniuk Nov 04 '10 at 23:43
1

http://code.google.com/p/pfds/ some data structures implemented in Scala by Debashish Ghosh – Vasil Remeniuk Nov 04 '10 at 23:46
Can you please explain why you think that Scala isn't imperative? `list.filter (foo).sort (bar).take (10)` - what could be more imperative? – user unknown Aug 18 '11 at 15:16

Kevin Wright · Answer 7 · 2010-11-08T10:20:45.077

7

QuickSort is known to be faster when done in-place, so this is hardly a fair comparison!

Having said that... Array.concat? If nothing else, you're showing how a collection type optimised for imperative programming is particularly slow when you try and use it in a functional algorithm; almost any other choice would be faster!

Another very important point to consider, perhaps the most important issue when comparing the two approaches is: "how well does this scale out to multiple nodes/cores?"

Chances are, if you're looking for an immutable quicksort then you're doing so because you actually want a parallel quicksort. Wikipedia has some citations on this subject: http://en.wikipedia.org/wiki/Quicksort#Parallelizations

The scala version can simply fork before the function recurses, allowing it to very quickly sort a list containing billions of entries if you have enough cores available.

Right now, the GPU in my system has 128 cores available to me if I could just run the Scala code on it, and this is on a simple desktop system two years behind the current generation.

How would that stack up against the single-threaded imperative approach I wonder...

Perhaps the more important question is therefore:

"Given that individual cores aren't going to get any faster, and synchronisation/locking presents a real challenge for parallelisation, is mutability expensive?"

edited Nov 08 '10 at 10:20

answered Nov 04 '10 at 23:29

Kevin Wright

49,540
9
105
155

1

No arguments there. Quicksort is by definition an in memory sort. I am sure most people remember that from college. But, how do you quicksort in a pure functional way. i.e without side effects. – smartnut007 Nov 04 '10 at 23:35
Its important cause, there are claims that functional paradigm can be just as fast as functions with side effects. – smartnut007 Nov 04 '10 at 23:37
List version cuts down the time by half. Still, not any where near to the speed of the java version. – smartnut007 Nov 04 '10 at 23:44
Can you please explain why you think that Scala isn't imperative? `list.filter (foo).sort (bar).take (10)` - what could be more imperative? Thanks. – user unknown Aug 18 '11 at 15:19
1

@user unknown: Perhaps you could clarify what you think "imperative" means, because your stated example looks nicely functional to me. Scala itself is neither imperative nor declarative, the language supports both styles and these terms are best used to describe specific examples. – Kevin Wright Aug 18 '11 at 16:53

score 7 · Answer 8 · answered Nov 05 '10 at 03:12

The cost of immutability in Scala

Here is a version that is nearly as fast than the Java one. ;)

object QuickSortS {
  def sort(xs: Array[Int]): Array[Int] = {
    val res = new Array[Int](xs.size)
    xs.copyToArray(res)
    (new QuickSortJ).sort(res)
    res
  }
}

This version makes a copy of the array, sorts it in place using the Java version and returns the copy. Scala does not force you to use immutable structure internally.

So the benefit of Scala is that you can leverage mutability and immutability as you see fit. The disadvantage is that if you do that wrong you don't really get the benefits of immutability.

While this isn't a precise answer to the question, I think it is part of any good answer: Quicksort is faster when using a mutable structure. But the chief advantages of immutability is the interface, and in Scala at least you can have both. Mutability is faster for quicksort, but that doesn't stand in your way of writing performant, mostly immutable code. — Paul Draper, Dec 08 '15 at 20:20

score 2 · Answer 9 · answered Nov 11 '10 at 01:14

2

It's been said that OO programming uses abstraction to hide complexity, and functional programming uses immutability to remove complexity. In the hybrid world of Scala we can use OO to hide the imperative code leaving application code none the wiser. Indeed the collections libraries use plenty of imperative code but that doesn't mean we shouldn't use them. As others have said, used with care, you really get the best of both worlds here.

answered Nov 11 '10 at 01:14

Ben Hardy

1,739
14
16

Can you please explain why you think that Scala isn't imperative? `list.filter (foo).sort (bar).take (10)` - what could be more imperative? Thanks. – user unknown Aug 18 '11 at 15:20
1

I don't see where he said Scala is not imperative. – Janx Aug 18 '11 at 18:20