execute function only once and cache value in scala

Question

I have a function like this:

def runOnce(request: Request): Future[Result] = {
  ??? // to be implemented
}

When I call this runOnce function and it has not run yet, I want it to run some method and return that result. If it has already run, I just want it to return the original result (the incoming request will always be the same).

I can do it when there is no parameter, like so:

lazy val hydratedModel = hydrateImpl(request)

// future for efficient filtering
def fetchHydratedModel(): Future[HydratedModelRequest] = {
  hydratedModel
}

How can I do the same in the first case, where the function takes a parameter?

Will the function really ever be called more than once with the same `request`? The technique for what you want is called memoization, but applying it to requests seems very strange to me. For memoization, see [Is there a generic way to memoize in Scala?](https://stackoverflow.com/questions/16257378/is-there-a-generic-way-to-memoize-in-scala) · Suma, Oct 11 '17 at 19:46
Do you really need a function? This has more to do with call-by-name versus call-by-value evaluation strategies. · Pavel, Oct 11 '17 at 19:56
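To illustrate the distinction this comment refers to, here is a minimal sketch (the names `EvalDemo`, `compute` and `calls` are hypothetical): a by-value parameter is evaluated once before the call, a by-name parameter is re-evaluated at every reference, and a `lazy val` caches its first evaluation. None of these caches anything across separate calls, which is why the answers below turn to memoization.

```scala
object EvalDemo {
  // Hypothetical counter recording how many times `compute` is evaluated.
  var calls = 0
  def compute(): Int = { calls += 1; 42 }

  // By-value: the argument is evaluated once, before the body runs.
  def byValue(x: Int): Int = x + x

  // By-name: the argument expression is re-evaluated at every reference.
  def byName(x: => Int): Int = x + x

  // By-name plus a local lazy val: evaluated at most once, on first use.
  def byNameOnce(x: => Int): Int = {
    lazy val cached = x
    cached + cached
  }
}
```

All three return 84 for `compute()`, but `compute` runs once, twice and once respectively.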

Mike Allen · Answer 1 · 2019-01-17T13:17:19.917

There's a general solution to this problem: function memoization. For a pure function (one that has no side effects; memoization will not work for impure functions), the result of a call is always the same for the same argument values. Therefore, an optimization is to cache the result on the first call and return the cached value on subsequent calls.

You can achieve this with something like the following (a memoization class for pure functions with a single argument, updated to make it thread-safe; see the comments below):

/** Memoize a pure function `f(A): R`
 *
 *  @constructor Create a new memoized function.
 *  @tparam A Type of argument passed to function.
 *  @tparam R Type of result received from function.
 *  @param f Pure function to be memoized.
 */
final class Memoize1[A, R](f: A => R) extends (A => R) {

  // Cached function call results.
  private val result = scala.collection.mutable.Map.empty[A, R]

  /** Call memoized function.
   *
   *  If the function has not been called with the specified argument value, then the
   *  function is called and the result cached; otherwise the previously cached
   *  result is returned.
   *
   *  @param a Argument value to be passed to `f`.
   *  @return Result of `f(a)`.
   */
  def apply(a: A): R = synchronized(result.getOrElseUpdate(a, f(a)))
}

/** Memoization companion */
object Memoize1 {

  /** Memoize a specific function.
   *
   *  @tparam A Type of argument passed to function.
   *  @tparam R Type of result received from function.
   *  @param f Pure function to be memoized.
   */
  def apply[A, R](f: A => R): Memoize1[A, R] = new Memoize1(f)
}

Assuming that the function you're memoizing is hydrateImpl, you can then define and use runOnce as follows (note that it becomes a val, not a def):

val runOnce = Memoize1(hydrateImpl)
runOnce(someRequest) // Executed on first call with new someRequest value, cached result subsequently.
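A quick way to check the caching behavior is to memoize a function that counts its invocations. The following sketch repeats the `Memoize1` class in compact form so it is self-contained, and uses a hypothetical `slowSquare` function in place of `hydrateImpl`:

```scala
import scala.collection.mutable

// Compact copy of Memoize1 from above, so this sketch compiles on its own.
final class Memoize1[A, R](f: A => R) extends (A => R) {
  private val result = mutable.Map.empty[A, R]
  def apply(a: A): R = synchronized(result.getOrElseUpdate(a, f(a)))
}

object MemoizeDemo {
  // Hypothetical stand-in for hydrateImpl that counts how often it runs.
  var calls = 0
  def slowSquare(n: Int): Int = { calls += 1; n * n }

  val memoSquare = new Memoize1[Int, Int](slowSquare)
}
```

Calling `MemoizeDemo.memoSquare(4)` twice runs `slowSquare` only once; `calls` increases again only when a new argument value, such as `5`, is seen.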

UPDATE: Regarding thread-safety.

In reply to the comment from user1913596, the answer is "no"; scala.collection.mutable.Map.getOrElseUpdate is not thread-safe. However, it's fairly trivial to synchronize access, and I have updated the original code accordingly (embedding the call within synchronized(...)).

The performance cost of locking should be more than offset by the time saved by not recomputing results (assuming that f is nontrivial).

Does this also apply to a multithreaded environment, where multiple threads can invoke `runOnce(someRequest)` before the value is set in the `result` map? It doesn't seem so. What would you propose for concurrent access, so that `f()` is not evaluated multiple times? · user1913596, Jan 17 '19 at 12:15
@user1913596 Good point! No, _Scala_ mutable `Map`s are not thread-safe. However, it's fairly easy to work around that. I've updated my answer accordingly. Thanks for pointing that out! · Mike Allen, Jan 17 '19 at 13:13
Thanks for the update. Just to add for future readers: as of Scala version `2.11.12`, using `scala.collection.concurrent.Map` or even `scala.collection.concurrent.TrieMap` is not enough to make it thread-safe. In both, `getOrElseUpdate` is not an atomic operation, even though the documentation states that `op` is evaluated only once in a `TrieMap`. The solution is to use the `synchronized` block as described by @Mike Allen. · user1913596, Jan 18 '19 at 10:21
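Note that `synchronized` holds a single lock while `f` runs, so concurrent calls are serialized even when their arguments differ. As an alternative, here is one possible sketch (not from the thread above; `ConcurrentMemoize1` and `Holder` are hypothetical names). It stores a small holder object per key using `TrieMap.putIfAbsent`, which is atomic, and relies on a `lazy val` inside the holder to run `f` at most once per key:

```scala
import scala.collection.concurrent.TrieMap

/** Sketch: memoize a pure function `f(A): R` without a global lock. */
final class ConcurrentMemoize1[A, R](f: A => R) extends (A => R) {

  // Wraps one pending computation. Scala lazy vals are initialized at most
  // once, even when several threads access them concurrently.
  private final class Holder(a: A) {
    lazy val value: R = f(a)
  }

  private val cache = TrieMap.empty[A, Holder]

  def apply(a: A): R = {
    val fresh = new Holder(a)
    // putIfAbsent is atomic: exactly one holder per key is stored, and every
    // caller then forces that same holder, so f(a) runs at most once.
    val holder = cache.putIfAbsent(a, fresh).getOrElse(fresh)
    holder.value
  }
}
```

One trade-off: if `f` throws, the `lazy val` stays uninitialized, so a later call with the same key runs `f` again instead of caching the failure.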

score 0 · Answer 2 · answered Oct 11 '17 at 20:28

There are likely better ways to do this, depending on your setup, but a simple solution is the following:

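private var model: Option[Future[HydratedModelRequest]] = None

def runOnce(request: Request): Future[HydratedModelRequest] = {
  // Hydrate on the first call only; later calls return the cached Future.
  // Note: this is not thread-safe.
  if (model.isEmpty) {
    model = Some(hydrateImpl(request))
  }

  model.get
}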
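The `var` above is not thread-safe: two threads making the first call concurrently could both run hydrateImpl. A minimal sketch of a synchronized variant follows; `Hydrator` and the stub `Request`/`HydratedModelRequest` case classes are hypothetical stand-ins, and hydrateImpl is passed in as a function:

```scala
import scala.concurrent.Future

// Hypothetical stand-ins for the question's types.
final case class Request(id: String)
final case class HydratedModelRequest(id: String)

final class Hydrator(hydrateImpl: Request => Future[HydratedModelRequest]) {
  // Future produced by the first call; only accessed while holding `this`.
  private var model: Option[Future[HydratedModelRequest]] = None

  def runOnce(request: Request): Future[HydratedModelRequest] = synchronized {
    model match {
      case Some(cached) => cached
      case None =>
        val created = hydrateImpl(request)
        model = Some(created)
        created
    }
  }
}
```

Assuming hydrateImpl only starts asynchronous work and returns quickly, the lock is held briefly. Like the original, this ignores the request on every call after the first.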

If the request is indeed the same for each call, another option would be to require the request implicitly and hydrate lazily.

implicit val request: Request
lazy val hydratedRequest: Future[HydratedModelRequest] = hydrateImpl(request)
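One way to read this suggestion, as a sketch under assumptions: scope the request to an object that receives it implicitly at construction, so the `lazy val` caches the hydration for that request. `RequestScope`, `Hydration` and the stub types below are hypothetical:

```scala
import scala.concurrent.Future

// Hypothetical stand-ins for the question's types and hydration function.
final case class Request(id: String)
final case class HydratedModelRequest(id: String)

object Hydration {
  def hydrateImpl(request: Request): Future[HydratedModelRequest] =
    Future.successful(HydratedModelRequest(request.id))
}

// One instance per request: the implicit request is supplied at construction,
// and the lazy val runs hydrateImpl at most once for that instance.
final class RequestScope(implicit request: Request) {
  lazy val hydratedRequest: Future[HydratedModelRequest] =
    Hydration.hydrateImpl(request)
}
```

Every access to `hydratedRequest` on the same `RequestScope` returns the same `Future` instance.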
