2

So, while answering some other question I stumbled upon the necessity of computing the median of 5. Now, there's a similar question in another language, but I want a Scala algorithm for it, and I'm not sure I'm happy with mine.

Community
  • 1
  • 1
Daniel C. Sobral
  • 295,120
  • 86
  • 501
  • 681
  • Why don't you post yours for a start? (I was initially confused by the quest resp the linked answer because I only knew median-of-three.) – Raphael Jan 21 '11 at 17:46
  • @Raphael Done. I also changed the wording a bit, because I'm interesting in something that will compute the median of one to five elements. – Daniel C. Sobral Jan 21 '11 at 20:17

3 Answers3

5

Here's an immutable Scala version that has the minimum number of compares (6) and doesn't look too ugly:

def med5(five: (Int,Int,Int,Int,Int)) = {

  // Return a sorted tuple (one compare)
  def order(a: Int, b: Int) = if (a<b) (a,b) else (b,a)

  // Given two self-sorted pairs, pick the 2nd of 4 (two compares)
  def pairs(p: (Int,Int), q: (Int,Int)) = {
    (if (p._1 < q._1) order(p._2,q._1) else order(q._2,p._1))._1
  }

  // Strategy is to throw away smallest or second smallest, leaving two self-sorted pairs
  val ltwo = order(five._1,five._2)
  val rtwo = order(five._4,five._5)
  if (ltwo._1 < rtwo._1) pairs(rtwo,order(ltwo._2,five._3))
  else pairs(ltwo,order(rtwo._2,five._3))
}

Edit: As requested by Daniel, here's a modification to work with all sizes, and in arrays so it should be efficient. I can't make it pretty, so efficiency is the next best thing. (>200M medians/sec with a pre-allocated array of 5, which is slightly more than 100x faster than Daniel's version, and 8x faster than my immutable version above (for lengths of 5)).

def med5b(five: Array[Int]): Int = {

  def order2(a: Array[Int], i: Int, j: Int) = {
    if (a(i)>a(j)) { val t = a(i); a(i) = a(j); a(j) = t }
  }

  def pairs(a: Array[Int], i: Int, j: Int, k: Int, l: Int) = {
    if (a(i)<a(k)) { order2(a,j,k); a(j) }
    else { order2(a,i,l); a(i) }
  }

  if (five.length < 2) return five(0)
  order2(five,0,1)
  if (five.length < 4) return (
    if (five.length==2 || five(2) < five(0)) five(0)
    else if (five(2) > five(1)) five(1)
    else five(2)
  )
  order2(five,2,3)
  if (five.length < 5) pairs(five,0,1,2,3)
  else if (five(0) < five(2)) { order2(five,1,4); pairs(five,1,4,2,3) }
  else { order2(five,3,4); pairs(five,0,1,3,4) }
}
Rex Kerr
  • 166,841
  • 26
  • 322
  • 407
  • Beautiful, but it lacks 4 cases. ;-) – Daniel C. Sobral Jan 21 '11 at 20:16
  • Fair enough, but what don't you like about your solution? Looks fine to me. Any solution that is both optimal and involves all cases is going to look ugly. – Rex Kerr Jan 21 '11 at 21:08
  • @Rex I expect magic, of course! :-) Well... `arr.sorted.apply((arr - 1) / 2)` works, but I mean aside from that. :-) – Daniel C. Sobral Jan 22 '11 at 05:39
  • @Daniel - Well, the latest update is as much magic as I have in me. – Rex Kerr Jan 22 '11 at 19:32
  • @Rex Actually, I liked it a lot. Have you tested if with Scalacheck or something? I ought to try it out with my median algorithm, see how much it speeds it up. I'm shocked by the 100x figure. – Daniel C. Sobral Jan 22 '11 at 22:36
  • @Rex Well, it doesn't work for arrays of size 2, because of `five(2)` for `five.length < 4`. It's missing the case for `five.length == 2`. – Daniel C. Sobral Jan 22 '11 at 23:50
  • Of course I only tested it from three to five. Whoops! Fixed now. – Rex Kerr Jan 23 '11 at 15:49
  • @Daniel - Oh, and extensive pattern matching is expensive. I generally count on about 10x slowdown using immutable stuff instead of primitives, and about another 10x using nontrivial pattern matching instead of carefully-crafted conditionals. – Rex Kerr Jan 23 '11 at 15:51
  • @Rex I assumed array extraction wouldn't be much of a penalty. – Daniel C. Sobral Jan 23 '11 at 22:41
  • @Rex So, I tested it and my own tests only showed my version being 75% slower than yours. Since this code _is_ critical, it is a welcome 75%, but nowhere near 1000%. :-) – Daniel C. Sobral Jan 23 '11 at 22:55
  • @Daniel - You are probably spending a lot of time doing other things (like building the array, for instance--though if you have an existing array, it's better to write the above non-destructively and index into the array in the middle so you don't have any object allocations). I was testing the ideal case where the array was already there and was only having a single element updated from run to run. – Rex Kerr Jan 24 '11 at 00:01
  • @Rex It is possible, but I discounted the time taken by these other activities (measured by an algorithm that used random selection instead of median of 5). Ok, there was still a groupBy that generated the 5-arrays, but I'd be surprised if that amounted to such a huge difference. – Daniel C. Sobral Jan 24 '11 at 02:30
  • @Daniel - I didn't even create arrays--I just updated one element per iteration. If it was a `groupBy` then it's probably essential, but if it's a `grouped` you could do it a different way. – Rex Kerr Jan 24 '11 at 12:15
  • @Rex I started doing in a different way, but since I had to recurse on the results, it started getting... complicated. I just went to the simple solution. – Daniel C. Sobral Jan 24 '11 at 19:30
2

Jeez, way to over-think it, guys.

def med5(a : Int, b: Int, c : Int, d : Int, e : Int) = 
   List(a, b, c, d, e).sort(_ > _)(2)
Michael Lorton
  • 43,060
  • 26
  • 103
  • 144
  • 1
    The question would never be asked in the first place, unless the "obvious" solution had been ruled out as sub-optimal. Right @Daniel? – Kevin Wright Jan 22 '11 at 10:05
  • Sort is suboptimal for the task. Also, you could use `.sorted` instead of `.sort(_ > _)`. – Daniel C. Sobral Jan 22 '11 at 15:14
  • Suboptimal by what standard? It's *much* more clear and will run, literally, in a microsecond. @Daniel -- I'm very new to Scala, so I'm still groping around the library, thanks. – Michael Lorton Jan 22 '11 at 17:03
  • Suboptimal in time and space complexity. This can be done with 0 space and 6 comparisons, while the above require linear space and, according to some tests here, about 12,75 comparisons. A single pass of median of medians on a 5000-element set will call this 1250 times. To complete a median search, it will do about 10 such passes, so this gets called 12500 times, destroying cache locality and doing more than 75000 unnecessary comparisons. Note that I do not know if 6 comparisons is optimal, but sorting definitely isn't. – Daniel C. Sobral Jan 22 '11 at 22:32
0

As suggested, here's my own algorithm:

def medianUpTo5(arr: Array[Double]): Double = {
    def oneAndOrderedPair(a: Double, smaller: Double, bigger: Double): Double =
        if (bigger < a) bigger
        else if (a < smaller) smaller else a

    def partialOrder(a: Double, b: Double, c: Double, d: Double) = {
        val (s1, b1) = if (a < b) (a, b) else (b, a)
        val (s2, b2) = if (c < d) (c, d) else (d, c)
        (s1, b1, s2, b2)
    }

    def medianOf4(a: Double, b: Double, c: Double, d: Double): Double = {
        val (s1, b1, s2, b2) = partialOrder(a, b, c, d)
        if (b1 < b2) oneAndOrderedPair(s2, s1, b1)
        else oneAndOrderedPair(s1, s2, b2)
    }

    arr match {
        case Array(a)       => a
        case Array(a, b)    => a min b
        case Array(a, b, c) =>
            if (a < b) oneAndOrderedPair(c, a, b) 
            else oneAndOrderedPair(c, b, a)
        case Array(a, b, c, d) => medianOf4(a, b, c, d)
        case Array(a, b, c, d, e) =>
            val (s1, b1, s2, b2) = partialOrder(a, b, c, d)
            if (s1 < s2) medianOf4(e, b1, s2, b2)
            else medianOf4(e, b2, s1, b1)
    }
}
Daniel C. Sobral
  • 295,120
  • 86
  • 501
  • 681