Improvements for a Map merge function

Question

I'm writing a function to merge two Map's together. This is what I have so far:

def merge[K, V1, V2, V3](left: Map[K, V1], right: Map[K, V2])
    (fn: (Option[V1], Option[V2]) => V3): Map[K, V3] = {
  val r = (left.keySet ++ right.keySet) map {
    key =>
      (key -> fn(left.get(key), right.get(key)))
  }
  r.toMap
}

The function itself works. You use the function as so:

val m1 = Map(1 -> "one", 3 -> "three", 5 -> "five")
val m2 = Map(1 -> "I", 5 -> "V", 10 -> "X")
merge(m1, m2) { (_, _) } 
// returns: 
// Map(1 -> (Some(one),Some(I)), 
//     3 -> (Some(three),None), 
//     5 -> (Some(five),Some(V)), 
//     10 -> (None,Some(X)))

I have two questions:

I'm worried about the ~~performance~~ computational complexity of the .get and .toMap calls. Can anyone improve the implementation?
I'd like the default function to make a pair of the values ({ (_, _) }). I can't quite get the syntax to do so properly.

Edit: While I originally said performance, I meant computational complexity. ~~My guess is that this function performs in O(n•ln(n)) time.~~ Looks like my function performs roughly in O(n). Can it be done in O(ln(n))?

In response to question 1: you might have a look at Scalaz's monoid typeclass for maps, as [illustrated](http://stackoverflow.com/questions/7142514/in-scala-how-can-i-do-the-equivalent-of-an-sql-sum-and-group-by/7168846#7168846) by Eric Torreborre. Your problem is a little different, but it might give you some ideas. — Kipton Barros, Sep 22 '11 at 02:50
@Kipton The map monoid is impressive. I'll have to give that some thought to see if that can apply. — leedm777, Sep 22 '11 at 14:22

huynhjl · Accepted Answer · 2011-09-23T02:39:47.073

3

For the default function literal use:

(fn: (Option[V1], Option[V2]) => V3 = 
  (x: Option[V1], y: Option[V2]) => Tuple2(x,y))

You'll have to use merge like this: merge(m1,m2)()

I would say don't be worried about performance until you perform some measurements on actual data.

Edit: about performance, by providing a view instead of constructing a map you can get quick "construction" at the expense of lookup - assuming we are dealing with immutable maps. So depending on actual data and use case, you can get better performance for certain operations, but it has a trade-off.

class MergedView[K, V1, V2, V3](
    left: Map[K, V1], right: Map[K, V2]
  )(fn: (Option[V1], Option[V2]) => V3 = (x: Option[V1], y: Option[V2]) => Tuple2(x,y)
  ) extends collection.DefaultMap[K, V3] {
  def get(key: K): Option[V3] = (left.get(key), right.get(key)) match {
    case (None, None) => None
    case t => Some(fn(t._1, t._2))
  }
  lazy val tuples = (left.keys ++ right.keys).map(key => key -> get(key).get)
  def iterator: Iterator[(K, V3)] = tuples.iterator
} 

val r1 = new MergedView(m1, m2)() // use parens here for second param list.

edited Sep 23 '11 at 02:39

answered Sep 22 '11 at 01:46

huynhjl

41,520
14
105
158

Never thought about creating a view object. Very interesting! It won't help my use case, but I'll keep that in mind in the future. – leedm777 Sep 23 '11 at 14:26
huynhjl this is awesome. @dave I'm confused by your reply because isn't this a generic solution. I can turn the view back into a map with... r1.toMap – Core May 23 '12 at 19:11
@huynhjl, how do I introduce my own key ordering into this? Thanks. – Core May 23 '12 at 19:17
@Core - I've forgotten why it didn't work in my particular situation. – leedm777 May 24 '12 at 13:50

score 1 · Answer 2 · edited May 23 '17 at 11:47

1

You shouldn't worry about get -- yes, it will create a wrapper object, but doing anything else will be awkward enough that you shouldn't try unless a profiler shows that to be a problem.

As for toMap, yes, that might well slow you down. You may try using breakOut.

Regarding complexity of get and toMap, lookup and add are effective constant time for immutable HashMap, which is the default Map. See Performance Characteristics of Scala Collections.

edited May 23 '17 at 11:47

Community

1
1

answered Sep 22 '11 at 02:45

Daniel C. Sobral

295,120
86
501
681

I'm less worried about the performance of creating the wrapper objects (which I'll need anyways) than the performance of calling a O(ln(n)) function n times. – leedm777 Sep 22 '11 at 14:17
1

@dave Looking up (ie, `get`) is effective constant time (subject to quality of hash) for the default map (HashMap). See [performance characteristics](http://www.scala-lang.org/docu/files/collections-api/collections_40.html) of Scala Collections. – Daniel C. Sobral Sep 22 '11 at 21:08
Excellent link! I had been looking for something like that. – leedm777 Sep 23 '11 at 14:19

Improvements for a Map merge function

2 Answers2

Linked