3

I have an iterable val pairs: Iterable[Pair[Key, Value]], that has some key=>value pairs.

Now, I want to create a Map[Key, Iterable[Value]], that has for each key an Iterable of all values of given key in pairs. (I don't actually need a Seq, any Iterable is fine).

I can do it using mutable Map and/or using mutable ListBuffers.

However, everyone tells me that the "right" scala is without using mutable collections. So, is it possible to do this only with immutable collections? (for example, with using map, foldLeft, etc.)

Karel Bílek
  • 36,467
  • 31
  • 94
  • 149

5 Answers5

5

I have found out a really simple way to do this

pairs.groupBy{_._1}.mapValues{_.map{_._2}}

And that's it.

Karel Bílek
  • 36,467
  • 31
  • 94
  • 149
4

Anything that you can do with a non-cyclic mutable data structure you can also do with an immutable data structure. The trick is pretty simple:

loop -> recursion or fold
mutating operation -> new-copy-with-change-made operation

So, for example, in your case you're probably looping through the Iterable and adding a value each time. If we apply our handy trick, we

def mkMap[K,V](data: Iterable[(K,V)]): Map[K, Iterable[V]] = {
  @annotation.tailrec def mkMapInner(
    data: Iterator[(K,V)],
    map: Map[K,Vector[V]] = Map.empty[K,Vector[V]]
  ): Map[K,Vector[V]] = {
    if (data.hasNext) {
      val (k,v) = data.next
      mkMapInner(data, map + (k -> map.get(k).map(_ :+ v).getOrElse(Vector(v))))
    }
    else map
  }
  mkMapInner(data.iterator)
}

Here I've chosen to implement the loop-replacement by declaring a recursive inner method (with @annotation.tailrec to check that the recursion is optimized to a while loop so it won't break the stack)

Let's test it out:

val pairs = Iterable((1,"flounder"),(2,"salmon"),(1,"halibut"))
scala> mkMap(pairs)
res2: Map[Int,Iterable[java.lang.String]] = 
      Map(1 -> Vector(flounder, halibut), 2 -> Vector(salmon))

Now, it turns out that Scala's collection libraries also contain something useful for this:

scala> pairs.groupBy(_._1).mapValues{ _.map{_._2 } }

with the groupBy being the key method, and the rest cleaning up what it produces into the form you want.

Rex Kerr
  • 166,841
  • 26
  • 322
  • 407
  • Thanks for answer; I have found out identical thing myself, consuting scaladoc for a while :) – Karel Bílek Jul 02 '12 at 21:48
  • So you have! I wanted to give the full "what to do about mutable patterns" part of the answer, though, because you can't _always_ find the desired methods in the Scala library, but the loop-to-recursion thing is a general strategy. – Rex Kerr Jul 02 '12 at 21:56
  • Can you describe more, what is going on in line 8 (first line beginning with mkMapInner)? I am kind of lost in the syntax. – Karel Bílek Jul 02 '12 at 22:00
  • `mkMapInner` is just a method (inside the other method!). Pretend `@annotation.tailrec` doesn't exist, and that the method has arguments `data: Iterator[(K,V)]` and `map: Map[K,Vector[V]]`. The `Map.empty` thing is a default argument, and I wrote the method definition on four lines instead of one just because it got too long. After I'm done with the inner function, all the outer function does is call it (with an `Iterator`, since that's what the inner one wants). – Rex Kerr Jul 02 '12 at 22:19
3

For the record, you can write this pretty cleanly with a fold. I'm going to assume that your Pair is the one in the standard library (aka Tuple2):

pairs.foldLeft(Map.empty[Key, Seq[Value]]) {
  case (m, (k, v)) => m.updated(k, m.getOrElse(k, Seq.empty) :+ v)
}

Although of course in this case the groupBy approach is more convenient.

Travis Brown
  • 138,631
  • 12
  • 375
  • 680
1
val ps = collection.mutable.ListBuffer(1 -> 2, 3 -> 4, 1 -> 5)

ps.groupBy(_._1).mapValues(_ map (_._2))
  // = Map(1 -> ListBuffer(2, 5), 3 -> ListBuffer(4))

This gives a mutable ListBuffer in the output map. If you want your output to be immutable (not sure if this is quite what you're asking), use collection.breakOut:

ps.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
   // = Map(1 -> Vector(2, 5), 3 -> Vector(4))

It seems like Vector is the default for breakOut, but to be sure, you can specify the return type on the left hand side: val myMap: Map[Int,Vector[Int]] = ....

More info on breakOut here.

As a method:

def immutableGroup[A,B](xs: Traversable[(A,B)]): Map[A,Vector[B]] =
  xs.groupBy(_._1).mapValues(_.map(_._2)(collection.breakOut))
Community
  • 1
  • 1
Luigi Plinge
  • 50,650
  • 20
  • 113
  • 180
0

I perform this function so often that I have an implicit written called groupByKey that does precisely this:

class EnrichedWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) {
  def groupByKey[T, U, That](implicit ev: A <:< (T, U), bf: CanBuildFrom[Repr, U, That]): Map[T, That] =
    self.groupBy(_._1).map { case (k, vs) => k -> (bf(self.asInstanceOf[Repr]) ++= vs.map(_._2)).result }
}
implicit def enrichWithGroupByKey[A, Repr <: Traversable[A]](self: TraversableLike[A, Repr]) = new EnrichedWithGroupByKey[A, Repr](self)

And you use it like this:

scala> List(("a", 1), ("b", 2), ("b", 3), ("a", 4)).groupByKey
res0: Map[java.lang.String,List[Int]] = Map(a -> List(1, 4), b -> List(2, 3))

Note that I use .map { case (k, vs) => k -> ... } instead of mapValues because mapValues creates a view, instead of just performing the map immediately. If you plan on accessing those values many times, you'll want to avoid the view approach because it will mean recomputing the .map(_._2) every time.

dhg
  • 52,383
  • 8
  • 123
  • 144