2

I'm learning Scala by working the exercises from the book "Scala for the Impatient". Please see the following question and my answer and code. I'd like to know if my answer is correct. Also the code doesn't work (all frequencies are 1). Where's the bug?

Q10: Harry Hacker reads a file into a string and wants to use a parallel collection to update the letter frequencies concurrently on portions of the string. He uses the following code:

val frequencies = new scala.collection.mutable.HashMap[Char, Int]
for (c <- str.par) frequencies(c) = frequencies.getOrElse(c, 0) + 1

Why is this a terrible idea? How can he really parallelize the computation?

My answer: It is not a good idea because if 2 threads are concurrently updating the same frequency, the result is undefined.

My code:

def parFrequency(str: String) = {
  str.par.aggregate(Map[Char, Int]())((m, c) => { m + (c -> (m.getOrElse(c, 0) + 1)) }, _ ++ _)
}

Unit test:

"Method parFrequency" should "return the frequency of each character in a string" in {
  val freq = parFrequency("harry hacker")

  freq should have size 8

  freq('h') should be(2) // fails
  freq('a') should be(2)
  freq('r') should be(3)
  freq('y') should be(1)
  freq(' ') should be(1)
  freq('c') should be(1)
  freq('k') should be(1)
  freq('e') should be(1)
}

Edit: After reading this thread, I updated the code. Now the test works if ran alone, but fails if ran as a suite.

def parFrequency(str: String) = {
  val freq = ImmutableHashMap[Char, Int]()
  str.par.aggregate(freq)((_, c) => ImmutableHashMap(c -> 1), (m1, m2) => m1.merged(m2)({
      case ((k, v1), (_, v2)) => (k, v1 + v2)
  }))
}

Edit 2: See my solution below.

Community
  • 1
  • 1
Abhijit Sarkar
  • 21,927
  • 20
  • 110
  • 219

3 Answers3

0

++ does not combine the values of identical keys. So when you merge the maps, you get (for shared keys) one of the values (which in this case is always 1), not the sum of the values.

This works:

def parFrequency(str: String) = {
  str.par.aggregate(Map[Char, Int]())((m, c) => { m + (c -> (m.getOrElse(c, 0) + 1)) },
  (a,b) => b.foldLeft(a){case (acc, (k,v))=> acc updated (k, acc.getOrElse(k,0) + v) })
} 

val freq = parFrequency("harry hacker")
//> Map(e -> 1, y -> 1, a -> 2,   -> 1, c -> 1, h -> 2, r -> 3, k -> 1)

The foldLeft iterates over one of the maps, updating the other map with the key/values found.

The Archetypal Paul
  • 41,321
  • 20
  • 104
  • 134
  • I'd like to use the `merge` method because I think it is the closest thing in the API that serves my need. I will change the _seqop_ operation like in your example and see if that works. – Abhijit Sarkar Jun 04 '15 at 18:52
0

You trouble in first case as you detected by yourself was in ++ operator which just concatenating, dropping second occurence of same key.

Now in the second case you have the (_, c) => ImmutableHashMap(c -> 1) which just drops all of chars found my the map in seqop stage.

My suggestion is to extend the Map type with special compination operation, working like merged in HashMap and preserve collecting from first example at seqop stage:

implicit class MapUnionOps[K, V](m1: Map[K, V]) {
  def unionWith[V1 >: V](m2: Map[K, V1])(f: (V1, V1) => V1): Map[K, V1] = {
    val kv1 = m1.filterKeys(!m2.contains(_))
    val kv2 = m2.filterKeys(!m1.contains(_))
    val common = (m1.keySet & m2.keySet).toSeq map (k => (k, f(m1(k), m2(k))))
    (common ++ kv1 ++ kv2).toMap
  }
}

def parFrequency(str: String) = {
  str.par.aggregate(Map[Char, Int]())((m, c) => {m + (c -> (m.getOrElse(c, 0) + 1))}, (m1, m2) =>  (m1 unionWith m2)(_ + _))
}

Or you can use fold solution from Paul's answer, but for better performance for each merge choose lesser map to traverse:

implicit class MapUnionOps[K, V](m1: Map[K, V]) {
  def unionWith(m2: Map[K, V])(f: (V, V) => V): Map[K, V] =
    if (m2.size > m1.size) m2.unionWith(m1)(f)
    else m2.foldLeft(m1) {
      case (acc, (k, v)) => acc + (k -> acc.get(k).fold(v)(f(v, _)))
    }
}
Odomontois
  • 15,918
  • 2
  • 36
  • 71
  • I think a `Map` subclass is overkill. – Abhijit Sarkar Jun 04 '15 at 18:53
  • @AbhijitSarkar it's not a subclass, it's just a temporary, most-probably zero-cost wrapper, works like C#'s extension method. You could think that just of function of two maps with more convenient syntax. – Odomontois Jun 04 '15 at 20:04
0

This seems to work. I like it better than the other solutions proposed here because:

  1. It's lot less code than an implicit class and slightly less code than using getOrElse with foldLeft.
  2. It uses the merged function from the API which's intended to do what I want.
  3. It's my own solution :)

    def parFrequency(str: String) = {
      val freq = ImmutableHashMap[Char, Int]()
      str.par.aggregate(freq)((_, c) => ImmutableHashMap(c -> 1), _.merged(_) {
        case ((k, v1), (_, v2)) => (k, v1 + v2)
      })
    }
    

Thanks for taking the time to help me out.

Abhijit Sarkar
  • 21,927
  • 20
  • 110
  • 219
  • Now this fails with the same error as the solution in my edit above. The resulting map randomly drops an entry. Is this a bug in the Scala `merge` function? – Abhijit Sarkar Aug 26 '15 at 06:52