
What is the best way to apply a function to each element of a Map and at the end return the same Map, unchanged, so that it can be used in further operations?

I'd like to avoid:

myMap.map(el => {
  effectfullFn(el)
  el
})

to achieve syntax like this:

myMap
  .mapEffectOnKV(effectfullFn)
  .foreach(println)

map is not what I'm looking for, because I have to specify what comes out of the map (as in the first code snippet), and I don't want to do that.

I want a special operation that knows/assumes that the map elements should be returned without change after the side-effect function has been executed.

In fact, this would be so useful to me, I'd like to have it for Map, Array, List, Seq, Iterable... The general idea is to peek at the elements to do something, then automatically return these elements.

The real case I'm working on looks like this:

 calculateStatistics(trainingData, indexMapLoaders)
   .superMap { (featureShardId, shardStats) =>
      val outputDir = summarizationOutputDir + "/" + featureShardId
      val indexMap = indexMapLoaders(featureShardId).indexMapForDriver()
      IOUtils.writeBasicStatistics(sc, shardStats, outputDir, indexMap)
    }

Once I have calculated the statistics for each shard, I want to append the side effect of saving them to disk, and then just return those statistics, without having to create a val and having that val's name be the last statement in the function, e.g.:

val stats = calculateStatistics(trainingData, indexMapLoaders)
stats.foreach { case (featureShardId, shardStats) =>
  val outputDir = summarizationOutputDir + "/" + featureShardId
  val indexMap = indexMapLoaders(featureShardId).indexMapForDriver()
  IOUtils.writeBasicStatistics(sc, shardStats, outputDir, indexMap)
}
stats

It's probably not very hard to implement, but I was wondering if there was something in Scala already for that.

Frank
  • Possible duplicate of [How to iterate scala map?](http://stackoverflow.com/questions/6364468/how-to-iterate-scala-map) – Yawar Mar 01 '17 at 04:46
  • No - I think it's different. I'm trying to append something that has side effects in a chain of operations on the map. – Frank Mar 01 '17 at 04:55
  • I would advise against that. You'd be needlessly iterating through the map multiple times. Just iterate once and do whatever you need to do to the key-value pairs inside the single iteration. – Yawar Mar 01 '17 at 05:00
  • This is sometimes called `tap`. – Andy Hayden Mar 01 '17 at 05:04
  • @Yawar - I added my real case. I don't think I would iterate multiple times. – Frank Mar 01 '17 at 05:05
  • OK. I don't think `Map` has a method for this--the basic operations are `foreach` and `map`. But--you can probably use `ensuring` to do it, it's just slightly clunky because you have to make sure you return `true` at the end. `ensuring` is meant to be used as design-by-contract tool. Anyway: `calculateStatistics(...) ensuring { _ foreach { case (k, v) => ... }; true }`. – Yawar Mar 01 '17 at 05:15
  • @Yawar - not being clunky in any way is of the essence here :-) – Frank Mar 01 '17 at 05:16
  • OK. There's a Unix tool called `tee` that takes an input and directs it to two different outputs. We can adapt the concept here. I'll post an answer in a couple of minutes. – Yawar Mar 01 '17 at 05:19
  • Yes - between `tee` and `tap`, we are going to end up finding something :-) Ideally, the implementation should be as widely applicable as possible (e.g. not just to `Map` :-) I'm not sure what is the right level for that. dk14 is suggesting `TraversableOnce`. – Frank Mar 01 '17 at 05:22
  • As it turns out, tee and tap are pretty much the same thing – Yawar Mar 01 '17 at 05:31
  • I know - just found the sounds funny :-) – Frank Mar 01 '17 at 05:38
  • Rex Kerr refers to it as the [KestrelPattern](http://stackoverflow.com/questions/23231509/what-is-the-added-value-of-the-kestrel-functional-programming-design-pattern-s) and uses `tap`. [I prefer `tee`.](http://stackoverflow.com/questions/41815793/filter-and-report-multiple-predicates/41816281#41816281) – jwvh Mar 01 '17 at 05:42
  • Also http://stackoverflow.com/questions/33605500/idiomatic-way-of-print-and-return-value-at-the-same-time-in-scala – Michael Zajac Mar 01 '17 at 14:20

3 Answers


A function cannot be effectful by definition, so I wouldn't expect anything convenient in the Scala standard library. However, you can write a wrapper:

def tap[T](effect: T => Unit)(x: T) = {
  effect(x)
  x
}

Example:

scala> Map(1 -> 1, 2 -> 2)
         .map(tap(el => el._1 + 5 -> el._2))
         .foreach(println)
(1,1)
(2,2)

You can also define an implicit:

implicit class TapMap[K,V](m: Map[K,V]){
  def tap(effect: ((K,V)) => Unit): Map[K,V] = m.map{x =>
    effect(x)
    x
  }
}

Examples:

scala> Map(1 -> 1, 2 -> 2).tap(el => el._1 + 5 -> el._2).foreach(println)
(1,1)
(2,2)
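
The `TapMap` wrapper above goes through `map`, so it builds a new Map on every call. If that matters, a variant based on `foreach` can hand back the original instance instead. This is only a sketch: the names `TapMapSketch`, `TapMapOps` and `tapKV` are made up for illustration and are not part of any library.

```scala
object TapMapSketch {
  // Hypothetical enrichment: run `effect` on every entry, then return the
  // very same Map instance (no new collection is built).
  implicit class TapMapOps[K, V](private val m: Map[K, V]) extends AnyVal {
    def tapKV(effect: ((K, V)) => Unit): Map[K, V] = {
      m.foreach(effect)
      m
    }
  }
}
```

With `import TapMapSketch._` in scope, `myMap.tapKV(effectfullFn).foreach(println)` should read the way the question asks for.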

To abstract more, you can define this implicit on TraversableOnce, so it would be applicable to List, Set and so on if you need it:

implicit class TapTraversable[Coll[_], T](m: Coll[T])(implicit ev: Coll[T] <:< TraversableOnce[T]){
  def tap(effect: T => Unit): Coll[T] = {
    ev(m).foreach(effect)
    m
  }
}

scala> List(1,2,3).tap(println).map(_ + 1)
1
2
3
res24: List[Int] = List(2, 3, 4)

scala> Map(1 -> 1).tap(println).toMap // `toMap` is needed here for the same reason it is needed when you do `.map(f).toMap`
(1,1)
res5: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1)

scala> Set(1).tap(println)
1
res6: scala.collection.immutable.Set[Int] = Set(1)

It's more useful, but requires some "mumbo-jumbo" with types, as Coll[_] <: TraversableOnce[_] doesn't work (Scala 2.12.1), so I had to use evidence for that.

You can also try the CanBuildFrom approach; see "How to enrich a TraversableOnce with my own generic map?"
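
For readers on a newer Scala: as far as I know, the collections library gained a `tapEach` method in Scala 2.13. It applies a function to each element for its side effect and returns a collection of the same type, which is what the hand-written wrappers above emulate. A minimal sketch, assuming Scala 2.13 or later; `TapEachSketch`, `log` and `record` are hypothetical names used only to make the effect observable.

```scala
import scala.collection.mutable.ListBuffer

object TapEachSketch {
  // Hypothetical sink standing in for a real side effect (e.g. writing to disk).
  val log = ListBuffer.empty[String]
  def record(x: Any): Unit = log += x.toString

  // Strict collections: the effect runs right away and the static type is kept.
  val list: List[Int] = List(1, 2, 3).tapEach(record(_)).map(_ + 1)
  val map: Map[Int, Int] = Map(1 -> 1, 2 -> 2).tapEach(record(_))

  // Iterators are lazy: the effect runs only as elements are consumed.
  val it: Iterator[Int] = Iterator(10, 20).tapEach(record(_))
}
```

Note that on an `Iterator` nothing is recorded until something downstream actually pulls the elements.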


The overall recommendation for passthrough side effects on iterators is to use streams (scalaz/fs2/monix) together with Task; they provide an observe function (or some analog of it) that does what you want, asynchronously if needed.


My answer from before you provided an example of what you want

You can represent effectful computation without side-effects and have distinct values that represent state before and after:

scala> val withoutSideEffect = Map(1 -> 1, 2 -> 2)
withoutSideEffect: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2)

scala> val withSideEffect = withoutSideEffect.map(el => el._1 + 5 -> (el._2 + 5))
withSideEffect: scala.collection.immutable.Map[Int,Int] = Map(6 -> 6, 7 -> 7)

scala> withoutSideEffect //unchanged
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 1, 2 -> 2)

scala> withSideEffect //changed
res1: scala.collection.immutable.Map[Int,Int] = Map(6 -> 6, 7 -> 7)
dk14
  • `.map(tap` !! How about just `.tap`? – Frank Mar 01 '17 at 05:08
  • @Frank already added this (I was editing answer when you commented :)). However original tap is more universal as it doesn't care about type of container. – dk14 Mar 01 '17 at 05:11
  • `implicit TapMap` is nice - can it be generalized to anything that can be iterated on? – Frank Mar 01 '17 at 05:12
  • `TraversableOnce`, yes, but author did ask about maps only :) – dk14 Mar 01 '17 at 05:12
  • Well not quite - the author said he would want to have that functionality for a bunch of data structures :-) – Frank Mar 01 '17 at 05:13
  • @dk14 - yes OP is pretty active. OP is thinking about choosing dk14's reply as The Answer, but Yawar is now also in the running. _Tee_ versus _tap_. Who will provide the most generic, most widely useful form? – Frank Mar 01 '17 at 05:21
  • It looks like we have a *winner*! Congratulations! – Frank Mar 01 '17 at 05:41
  • Omg! I'm out. I added `GenTraversableOnce` version. That's it. I can only recommend scalaz/fs2 `Task` for side-effects – dk14 Mar 01 '17 at 05:42
  • @dk14 - you won! `TapTraversable` is right on the money! Great job! – Frank Mar 01 '17 at 05:44
  • @Frank Thanks, I've added more useful version of `TapTraversable` – dk14 Mar 01 '17 at 06:44
  • While this is very useful (I came here looking for the same solution) folks should keep in mind that if you are working on a strict data structure you'll be recreating the data structure with each `tap`. If you don't need the intermediate results you will want to use tap on an iterator or a view to avoid the reconstruction. – Lanny Ripple Dec 04 '18 at 15:43
  • @LannyRipple To clarify your recommendation about View: TapTraversable version, I guess should suffice that (as it’s based on foreach). Though, as I recall, recent scala-collections removed (deprecated?) TraversableOnce in favor of Iterable, so implementations should be adapted accordingly. – dk14 Dec 04 '18 at 16:58
  • Also, calling tap on Iterator would probably drain it immediately. – dk14 Dec 04 '18 at 17:06

Looks like the concept you're after is similar to the Unix tee utility--take an input and direct it to two different outputs. (tee gets its name from the shape of the letter 'T', which looks like a pipeline from left to right with another line branching off downwards.) Here's the Scala version:

package object mypackage {
  implicit class Tee[A](a: A) extends AnyVal {
    def tee(f: A => Unit): A = { f(a); a }
  }
}

With that, we can do:

calculateStatistics(trainingData, indexMapLoaders) tee { stats =>
  stats foreach { case (featureShardId, shardStats) =>
    val outputDir = summarizationOutputDir + "/" + featureShardId
    val indexMap = indexMapLoaders(featureShardId).indexMapForDriver()
    IOUtils.writeBasicStatistics(sc, shardStats, outputDir, indexMap)
  }
}

Note that as defined, Tee is very generic--it can do an effectful operation on any value and then return the original passed-in value.
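
For what it's worth, I believe the same shape later landed in the standard library: from Scala 2.13, `scala.util.chaining` adds a `tap` method to any value. A sketch assuming Scala 2.13 or later; `calculateStatistics` and `written` below are simplified, hypothetical stand-ins for the question's real function and for `IOUtils.writeBasicStatistics`.

```scala
import scala.util.chaining._
import scala.collection.mutable.ListBuffer

object ChainingSketch {
  // Hypothetical stand-in for calculateStatistics(...): shard id -> statistic.
  def calculateStatistics(): Map[String, Double] =
    Map("shardA" -> 1.5, "shardB" -> 2.5)

  // Hypothetical sink standing in for writing the statistics to disk.
  val written = ListBuffer.empty[String]

  // tap runs the block for its side effect, then returns the receiver unchanged.
  val stats: Map[String, Double] =
    calculateStatistics().tap { s =>
      s.foreach { case (shardId, value) => written += s"$shardId=$value" }
    }
}
```

Like `Tee`, this works on any value, so the `foreach` still has to be written inside the block.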

Yawar
  • Hmmmm - How about rolling in the foreach, so that the syntax becomes lighter? This one is very general, it is very nice from that point view, but I would be ok assuming that `A` can be iterated on. – Frank Mar 01 '17 at 05:40

Call foreach on your Map with your effectful function. Your original Map will not be changed, as Maps in Scala are immutable by default.

val myMap = Map(1 -> 1)
myMap.foreach(effectfullFn)

If you are trying to chain this operation, you can use map, since foreach returns Unit and ends the chain:

myMap.map(el => {
    effectfullFn(el)
    el
})
soote