
I run lots of heavy computations in R, and I am trying to write a function that would help me cache my results and only compute them if necessary. I came up with this:

memoize <- function(filename, fun, ...){
    if(file.exists(filename)){
        message(paste("Reading", filename))
        return(readRDS(filename))
    } else {
        message(paste(filename, "not found, running computations"))
        result <- do.call(fun, list(...))
        message(paste("Writing", filename))
        saveRDS(result, filename)
        return(result)
    }
}

which allows me to do:

result <- memoize("test.rds", function(){ return(runif(100))} )

This works just fine. However, I want to be able to do without the "function()" part, i.e. pass an expression instead of a function. This can be done as follows:

memoize <- function(filename, expr){
    if(file.exists(filename)){
        message(paste("Reading", filename))
        return(readRDS(filename))
    } else {
        message(paste(filename, "not found, running computations"))
    
        f <- function(){}
        body(f) <- expr
        environment(f) <- parent.frame()
        result <- do.call(f, list())
    
        message(paste("Writing", filename))
        saveRDS(result, filename)
        return(result)
    }
}

as per "capturing an expression as a function body in R".

This works fine when using

result <- memoize("test.rds", { runif(100)} )

However, I would like to keep the explicit return() statements in that expression. When I try

result <- memoize("test.rds", { return(runif(100))} )

I get

no function to return from, jumping to top level

I clearly misunderstand how this works: why does expr seem to be evaluated before being bound to f? And how could I achieve this?

user11130854
  • Fundamentally, [adding `return` statements everywhere in R doesn’t make sense to begin with](https://stackoverflow.com/a/59090751/1968). – Konrad Rudolph Mar 22 '21 at 15:54
  • I am well aware of this, but this is a matter of personal preference and clearly a key part of the question. – user11130854 Mar 22 '21 at 16:03
  • I think the [`memoise`](https://cran.r-project.org/web/packages/memoise/index.html) package might be an in-place substitute for this, and it works well. – r2evans Mar 22 '21 at 16:12

1 Answer

Regardless of my recommendation not to use return in R except when an early exit is explicitly required, the following should work:

memoize = function (cache_filename, expr) {
    if (file.exists(cache_filename)) {
        message('reading ', cache_filename)
        readRDS(cache_filename)
    } else {
        message(cache_filename, ' not found, rerunning computation')
        f = local(function () NULL, envir = .GlobalEnv)
        body(f) = substitute(expr)
        result = f()
        saveRDS(result, cache_filename)
        result
    }
}

Note in particular that using do.call(f, list()) isn’t necessary and doesn’t really make sense: it’s a regular function, just call it using f().
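
One caveat with the function above: f is created in .GlobalEnv, so names used inside the expression are looked up there. If memoize is called from inside another function and the expression refers to that function's local variables, they will not be found. The following is a minimal sketch of a variant, assuming you want the caller's frame instead (as the question's environment(f) <- parent.frame() did). The names memoize_local and simulate are hypothetical and used only for illustration.

```r
# Hypothetical variant of memoize(); the only change is f's environment.
memoize_local <- function(cache_filename, expr) {
    if (file.exists(cache_filename)) {
        message('reading ', cache_filename)
        readRDS(cache_filename)
    } else {
        message(cache_filename, ' not found, rerunning computation')
        f <- function() NULL
        body(f) <- substitute(expr)       # capture the unevaluated expression
        environment(f) <- parent.frame()  # resolve names in the caller's frame
        result <- f()
        saveRDS(result, cache_filename)
        result
    }
}

# Hypothetical caller: n is a local variable, invisible from .GlobalEnv.
simulate <- function(n) {
    cache_file <- tempfile(fileext = '.rds')
    first <- memoize_local(cache_file, { return(runif(n)) })   # computes and writes
    second <- memoize_local(cache_file, { return(runif(n)) })  # reads the cache
    list(first = first, second = second)
}

out <- simulate(5)
identical(out$first, out$second)  # the second call reads back the cached values
```

Because return() now runs inside f, it exits f rather than memoize_local, so the result is still written to the cache file.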

What your current solution does is actually quite interesting: it evaluates expr at the point of use and assigns the result to the function body of f. In fact, this only happens to work when the result is a simple R value, not when it’s a complex object. And in that case you don’t need the nested function at all: you can evaluate expr on its own, which is in fact a fairly common pattern (used, for example, in the implementations of try and tryCatch!). In other words, the following would be an idiomatic implementation (but it doesn’t support control flow such as return()):

memoize = function (cache_filename, expr) {
    if (file.exists(cache_filename)) {
        message('reading ', cache_filename)
        readRDS(cache_filename)
    } else {
        message(cache_filename, ' not found, rerunning computation')
        result = expr
        saveRDS(result, cache_filename)
        result
    }
}
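
To see why the question's version appears to evaluate expr too early, here is a minimal sketch. R passes arguments as lazily evaluated promises; using expr as a value (as in body(f) <- expr) forces that promise, whereas substitute(expr) retrieves the unevaluated code. The helper names forced and captured are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: mirrors the question's approach.
forced <- function(expr) {
    f <- function() NULL
    body(f) <- expr              # expr is forced here; its *value* becomes the body
    body(f)
}

# Hypothetical helper: mirrors the approach using substitute().
captured <- function(expr) {
    f <- function() NULL
    body(f) <- substitute(expr)  # the unevaluated call becomes the body
    body(f)
}

value_body <- forced({ 1 + 2 })    # the number 3, already computed
code_body <- captured({ 1 + 2 })   # the call `{ 1 + 2 }`, not yet evaluated

is.numeric(value_body)  # TRUE
is.call(code_body)      # TRUE
eval(code_body)         # evaluating the captured code yields 3
```

This is presumably also why return() fails in the question's version: it runs while the promise is being forced, before it has become part of any function body it could return from.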
Konrad Rudolph