I'm learning Scala by working the exercises from the book "Scala for the Impatient". Please see the following question and my answer and code. I'd like to know if my answer is correct. Also the code doesn't work (all frequencies are 1). Where's the bug?
Q10: Harry Hacker reads a file into a string and wants to use a parallel collection to update the letter frequencies concurrently on portions of the string. He uses the following code:
val frequencies = new scala.collection.mutable.HashMap[Char, Int] for (c <- str.par) frequencies(c) = frequencies.getOrElse(c, 0) + 1
Why is this a terrible idea? How can he really parallelize the computation?
My answer: It is not a good idea because if 2 threads are concurrently updating the same frequency, the result is undefined.
My code:
def parFrequency(str: String) = {
str.par.aggregate(Map[Char, Int]())((m, c) => { m + (c -> (m.getOrElse(c, 0) + 1)) }, _ ++ _)
}
Unit test:
"Method parFrequency" should "return the frequency of each character in a string" in {
val freq = parFrequency("harry hacker")
freq should have size 8
freq('h') should be(2) // fails
freq('a') should be(2)
freq('r') should be(3)
freq('y') should be(1)
freq(' ') should be(1)
freq('c') should be(1)
freq('k') should be(1)
freq('e') should be(1)
}
Edit: After reading this thread, I updated the code. Now the test works if ran alone, but fails if ran as a suite.
def parFrequency(str: String) = {
val freq = ImmutableHashMap[Char, Int]()
str.par.aggregate(freq)((_, c) => ImmutableHashMap(c -> 1), (m1, m2) => m1.merged(m2)({
case ((k, v1), (_, v2)) => (k, v1 + v2)
}))
}
Edit 2: See my solution below.