28

Say I have two maps:

val a = Map(1 -> "one", 2 -> "two", 3 -> "three")
val b = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

I want to merge these maps by key, applying some function to collect the values (in this particular case I want to collect them into a seq, giving:

val c = Map(1 -> Seq("one", "un"), 2 -> Seq("two", "deux"), 3 -> Seq("three", "trois"))

It feels like there should be a nice, idiomatic way of doing this.

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
Submonoid
  • 2,809
  • 2
  • 20
  • 25
  • 5
    You should include the information, how to handle elements which happen to exist only in one Map, preferably in the example data for easy testing, to avoid ambiguity. – user unknown Oct 14 '11 at 10:21

8 Answers8

25

scala.collection.immutable.IntMap has an intersectionWith method that does precisely what you want (I believe):

import scala.collection.immutable.IntMap

val a = IntMap(1 -> "one", 2 -> "two", 3 -> "three", 4 -> "four")
val b = IntMap(1 -> "un", 2 -> "deux", 3 -> "trois")

val merged = a.intersectionWith(b, (_, av, bv: String) => Seq(av, bv))

This gives you IntMap(1 -> List(one, un), 2 -> List(two, deux), 3 -> List(three, trois)). Note that it correctly ignores the key that only occurs in a.

As a side note: I've often found myself wanting the unionWith, intersectionWith, etc. functions from Haskell's Data.Map in Scala. I don't think there's any principled reason that they should only be available on IntMap, instead of in the base collection.Map trait.

Travis Brown
  • 138,631
  • 12
  • 375
  • 680
  • 4
    unionWith, interesectionWith etc look exactly like what I'm looking for. Just a shame they're in the wrong language! – Submonoid Oct 14 '11 at 08:44
  • 1
    I just tested the scalaz functionality `intersectionWith` and found out, that keys which occur in b are ignored as well as in a. – OliverKK Sep 15 '15 at 08:33
  • what if keys are of type String? or any other type? – mpr Feb 10 '17 at 15:12
  • @mpr Then you'll need to do something like map over the values with `List(_)` and sum with the monoid instance for maps in Scalaz or Cats (or of course just write your own `intersectionWith` from scratch). – Travis Brown Feb 10 '17 at 15:17
23
val a = Map(1 -> "one", 2 -> "two", 3 -> "three")
val b = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

val c = a.toList ++ b.toList
val d = c.groupBy(_._1).map{case(k, v) => k -> v.map(_._2).toSeq}
//res0: scala.collection.immutable.Map[Int,Seq[java.lang.String]] =
        //Map((2,List(two, deux)), (1,List(one, un), (3,List(three, trois)))
Infinity
  • 3,431
  • 3
  • 25
  • 46
  • 1
    Would you mind explaining that _._1 for a complete scala newbie ? – Cristiano Fontes Oct 13 '11 at 14:22
  • 5
    A map is collection of Tuples2. For example: val tuple: Tuple3[Int, Int, String] = (100, 10, "one") , if you want get a string "one" you can use tuple._3 . Tuples are useful e.g. if you want return more than one value – Infinity Oct 13 '11 at 14:31
  • 3
    And the first part of `_._1` (underscore before dot) is an anonymous name of argument. For example: `List(1,2,3,4).map(_.toDouble)` will cast all of the list members to Double. It is like `i` in `for(i <- List(1,2,3,4)) ...` – om-nom-nom Oct 13 '11 at 14:47
  • 1
    + 1 but you can simplify by leaving off the final `.toSeq` as it doesn't do anything useful – Luigi Plinge Oct 13 '11 at 16:43
  • 3
    This doesn't correctly handle cases where a key is in one map but not the other, and rebuilding the map also makes it more expensive than `intersectionWith`, which is linear with the total number of elements. – Travis Brown Oct 13 '11 at 19:53
  • "This doesn't correctly handle cases where a key is in one map but not the other." Doesn't it? I can't see how that fails. Agree with the second point though. – stackexchanger Apr 12 '19 at 21:55
15

Scalaz adds a method |+| for any type A for which a Semigroup[A] is available.

If you mapped your Maps so that each value was a single-element sequence, then you could use this quite simply:

scala> a.mapValues(Seq(_)) |+| b.mapValues(Seq(_))
res3: scala.collection.immutable.Map[Int,Seq[java.lang.String]] = Map(1 -> List(one, un), 2 -> List(two, deux), 3 -> List(three, trois))
Ben James
  • 121,135
  • 26
  • 193
  • 155
  • Actually, my values in the real case are Sequences, but I want to combine them by building into another sequence, rather than by appending one to the other. – Submonoid Oct 13 '11 at 14:15
  • I'm not sure if I understand you, sorry - do you want the values to be nested sequences or not? – Ben James Oct 13 '11 at 14:34
  • Yes, I would want nested sequences, which I could do by wrapping my existing sequences in a Seq, but this feels somewhat like cheating - and in other cases I might want to use a completely different combiner that wouldn't fit into the semigroup structure - giving the size of the intersection of the value sequences, for example. – Submonoid Oct 13 '11 at 14:53
  • I see, admittedly this is just a cheat that I would consider using in this particular situation, not a general solution. – Ben James Oct 13 '11 at 15:31
4

Starting Scala 2.13, you can use groupMap which (as its name suggests) is an equivalent of a groupBy followed by map on values:

// val map1 = Map(1 -> "one", 2 -> "two",  3 -> "three")
// val map2 = Map(1 -> "un",  2 -> "deux", 3 -> "trois")
(map1.toSeq ++ map2).groupMap(_._1)(_._2)
// Map(1 -> List("one", "un"), 2 -> List("two", "deux"), 3 -> List("three", "trois"))

This:

  • Concatenates the two maps as a sequence of tuples (List((1, "one"), (2, "two"), (3, "three"))). For conciseness, map2 is implicitly converted to Seq to align with map1.toSeq's type - but you could choose to make it explicit by using map2.toSeq.

  • groups elements based on their first tuple part (_._1) (group part of groupMap)

  • maps grouped values to their second tuple part (_._2) (map part of groupMap)

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
3
val fr = Map(1 -> "one", 2 -> "two", 3 -> "three")
val en = Map(1 -> "un", 2 -> "deux", 3 -> "trois")

def innerJoin[K, A, B](m1: Map[K, A], m2: Map[K, B]): Map[K, (A, B)] = {
  m1.flatMap{ case (k, a) => 
    m2.get(k).map(b => Map((k, (a, b)))).getOrElse(Map.empty[K, (A, B)])
  }
}

innerJoin(fr, en) // Map(1 -> ("one", "un"), 2 -> ("two", "deux"), 3 -> ("three", "trois")): Map[Int, (String, String)]
Guillaume Massé
  • 8,004
  • 8
  • 44
  • 57
2

Here is my first approach before looking for the other solutions:

for (x <- a) yield 
  x._1 -> Seq (a.get (x._1), b.get (x._1)).flatten

To avoid elements which happen to exist only in a or b, a filter is handy:

(for (x <- a) yield 
  x._1 -> Seq (a.get (x._1), b.get (x._1)).flatten).filter (_._2.size == 2)

Flatten is needed, because b.get (x._1) returns an Option. To make flatten work, the first element has to be an option too, so we can't just use x._2 here.

For sequences, it works too:

scala> val b = Map (1 -> Seq(1, 11, 111), 2 -> Seq(2, 22), 3 -> Seq(33, 333), 5 -> Seq(55, 5, 5555))
b: scala.collection.immutable.Map[Int,Seq[Int]] = Map(1 -> List(1, 11, 111), 2 -> List(2, 22), 3 -> List(33, 333), 5 -> List(55, 5, 5555))

scala> val a = Map (1 -> Seq(1, 101), 2 -> Seq(2, 212, 222), 3 -> Seq (3, 3443), 4 -> (44, 4, 41214))
a: scala.collection.immutable.Map[Int,ScalaObject with Equals] = Map(1 -> List(1, 101), 2 -> List(2, 212, 222), 3 -> List(3, 3443), 4 -> (44,4,41214))

scala> (for (x <- a) yield x._1 -> Seq (a.get (x._1), b.get (x._1)).flatten).filter (_._2.size == 2) 
res85: scala.collection.immutable.Map[Int,Seq[ScalaObject with Equals]] = Map(1 -> List(List(1, 101), List(1, 11, 111)), 2 -> List(List(2, 212, 222), List(2, 22)), 3 -> List(List(3, 3443), List(33, 333)))
user unknown
  • 35,537
  • 11
  • 75
  • 121
1

So I wasn't quite happy with either solution (I want to build a new type, so semigroup doesn't really feel appropriate, and Infinity's solution seemed quite complex), so I've gone with this for the moment. I'd be happy to see it improved:

def merge[A,B,C](a : Map[A,B], b : Map[A,B])(c : (B,B) => C) = {
  for (
    key <- (a.keySet ++ b.keySet);
    aval <- a.get(key); bval <- b.get(key)
  ) yield c(aval, bval)
}
merge(a,b){Seq(_,_)}

I wanted the behaviour of returning nothing when a key wasn't present in either map (which differs from other solutions), but a way of specifying this would be nice.

Submonoid
  • 2,809
  • 2
  • 20
  • 25
  • 1
    You've implemented a hash join. You could write different methods for each type of join, like left outer, right outer, outer, and inner that would give you the behavior you needed in each circumstance. – Joshua Hartman Oct 13 '11 at 15:45
  • Note that `IntMap`'s `intersectionWith` handles the case of a key only occurring in one map as you specify here. – Travis Brown Oct 13 '11 at 17:57
  • A shorter more efficent alternative that does the same would be: `def merge[A,B,C](a : Map[A,B], b : Map[A,B])(c : (B,B) => C) = { for((k,v1) <-a; v2 <- b.get(k)) yield (k, c(v1, v2)) } merge(a,b){Seq(_,_)}` – Markus Knecht Dec 25 '15 at 16:06
0
def merge[A,B,C,D](b : Map[A,B], c : Map[A,C])(d : (Option[B],Option[C]) => D): Map[A,D] = {
  (b.keySet ++ c.keySet).map(k => k -> d(b.get(k), c.get(k))).toMap
}

def optionSeqBiFunctionK[A]:(Option[A], Option[A]) => Seq[A] = _.toSeq ++ _.toSeq

merge(a,b)(optionSeqBiFunctionK)

  • 1
    Please put your answer always in context instead of just pasting code. See [here](https://stackoverflow.com/help/how-to-answer) for more details. – gehbiszumeis Nov 27 '19 at 12:04