
Goals

I want to use dplyr to run simulations on grids of parameters. Specifically, I'd like a function that I can use in another program that

  • gets passed a data.frame
  • for every row calculates some simulation using each column as an argument
  • also is passed some extra data (e.g., initial conditions)

Here's my approach

require(dplyr)
run <- function(data, fun, fixed_parameters, ...) {
  ## ....
  ## argument checking
  ##

  fixed_parameters <- as.environment(fixed_parameters)
  # run fun once per row, combining that row's values with the fixed parameters
  grouped_out <- do_(rowwise(data), ~ do.call(fun, c(., fixed_parameters, ...)))
  ungroup(grouped_out)
}

This works. For example, with

growth <- function(n, r, K, b) {
  # some dynamical simulation
  # this is an obviously-inefficient way to do this ;)
  n  + r - exp(n) / K - b - rnorm(1, 0, 0.1)
}
growth_runner <- function(r, K, b, ic, ...) {
  # a wrapper to run the simulation with some fixed values
  n0 = ic$N0
  T = ic$T
  reps = ic$reps
  data.frame(n_final = replicate(reps, {
    for (t in 1:T) {
      n0 <- growth(n0, r, K, b)
    }
    n0
  }))
}

I can define a parameter grid and run:

data <- expand.grid(b = seq(0.01, 0.5, length.out=10),
                    K = exp(seq(0.1, 5, length.out=10)),
                    r = seq(0.5, 3.5, length.out=10))
initial_data <- list(N0=0.9, T=5, reps=20)
output <- run(data, growth_runner, initial_data)

Question

Even though this seems to work, I wonder if there's a way to do it without `do.call`. (In part because of issues with `do.call`.)

I really am interested in a way to replace the line `grouped_out <- do_(rowwise(data), ~ do.call(fun, c(., fixed_parameters, ...)))` with something that does the same thing but without `do.call`. Edit: An approach that somehow avoids the performance penalties of using `do.call` outlined at the above link would also work.

– jaimedash

3 Answers


I found it a little tricky to follow your code, but I think this is equivalent.

First I define a function that does the computation you're interested in:

growth_t <- function(n0, r, K, b, T) {
  n <- n0

  for (t in 1:T) {
    n <- n + r - exp(n) / K - b - rnorm(1, 0, 0.1)
  }
  n
}
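
For instance, a single parameter combination can be evaluated directly (the values here are arbitrary, for illustration only):

growth_t(n0 = 0.9, r = 1, K = 2, b = 0.1, T = 5)
# returns a single simulated end state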

Then I define the data that you want to vary, including a "dummy" variable for reps:

data <- expand.grid(
  b = seq(0.01, 0.5, length.out = 5),
  K = exp(seq(0.1, 5, length.out = 5)),
  r = seq(0.5, 3.5, length.out = 5),
  rep = 1:20
)

Then I can feed it into purrr::pmap_dbl(). pmap_dbl() does a "parallel" map - i.e. it takes a list (or data frame) as input and calls the function once per element (here, once per row), matching the named arguments. The fixed parameters are supplied after the function name.

library(purrr)
data$output <- pmap_dbl(data[1:3], growth_t, n0 = 0.9, T = 5)
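
For a minimal, self-contained illustration of the same pattern (a toy example; params, x, y, and scale are made-up names, not part of the answer's code): columns are matched to the function's arguments by name, and anything supplied after the function is held fixed across rows.

library(purrr)
params <- data.frame(x = 1:3, y = c(10, 20, 30))
# x and y vary row by row; scale is fixed for every call
pmap_dbl(params, function(x, y, scale) (x + y) * scale, scale = 0.5)
#> [1]  5.5 11.0 16.5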

This really doesn't feel like a dplyr problem to me, because it's not really about data manipulation.

– hadley
  • thanks! fair point re dplyr. It started with `dplyr::do`, but given the expanded tooling for tidy data, and especially the direction you're heading with `purrr` (e.g., http://stackoverflow.com/q/35505187/4598520), I agree it's probably better described as a tidy data problem – jaimedash May 25 '16 at 17:54

The approach below avoids do.call and presents the output in the same format as the OP's code.

First, replace the function's separate parameters with a single vector argument - this is what apply() will pass in, one row at a time.

growth_runner <- function(data.in, ic, ...) {
  # a wrapper to run the simulation with some fixed values
  n0 = ic$N0
  T = ic$T
  reps = ic$reps
  # the grid columns are (b, K, r), so data.in[3] = r, data.in[2] = K, data.in[1] = b
  data.frame(n_final = replicate(reps, {
    for (t in 1:T) {
      n0 <- growth(n0, data.in[3], data.in[2], data.in[1])
    }
    n0
  }))
}

Set up the grid you want to search over, just as before.

data <- expand.grid(b = seq(0.01, 0.5, length.out=10),
                    K = exp(seq(0.1, 5, length.out=10)),
                    r = seq(0.5, 3.5, length.out=10))
initial_data = list(N0=0.9, T=5, reps=20)

Use apply() to run the wrapper over each row of the grid, then collect the results:

output.mid = apply(data, 1, ic=initial_data, FUN=growth_runner)
output <- data.frame('n_final'=unlist(output.mid))

And you have your output without any calls to do.call or any external library.

> dim(output)
[1] 20000     1
> head(output)
     n_final
1 -0.6375070
2 -0.7617193
3 -0.3266347
4 -0.7921655
5 -0.5874983
6 -0.4083613
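
One aside on this approach (a sketch, not from the original answer): apply() passes each row to FUN as a named numeric vector, so the positional indices data.in[3], data.in[2], data.in[1] could equivalently be looked up by column name, which is safer if the column order of expand.grid() ever changes. For example:

# look the parameters up by name instead of by position
apply(head(data, 2), 1, function(data.in)
  growth(0.9, data.in[["r"]], data.in[["K"]], data.in[["b"]]))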
– Tchotchke
  • Sorry, you're missing critical context from the question: using dplyr (see the first line of the question). The edit of 5/19 makes this clear. This is useful code, though, to accomplish the same overall task in a less generic way. Thanks! – jaimedash May 24 '16 at 17:37
  • Also note that `apply()` will fail as soon as you have non-numeric parameters – hadley May 25 '16 at 17:32

You can replace the line containing do.call with the following (thanks to @shorpy for pointing out purrr::invoke_rows()):

grouped_out <- purrr::invoke_rows(fun, dplyr::rowwise(data), fixed_parameters)

Without any other changes, this will give a data frame with a column of data frames, like

Source: local data frame [1,000 x 4]
            b        K     r                .out
        (dbl)    (dbl) (dbl)               (chr)
1  0.01000000 1.105171   0.5 <data.frame [20,1]>
2  0.06444444 1.105171   0.5 <data.frame [20,1]>
3  0.11888889 1.105171   0.5 <data.frame [20,1]>

To recover something closer to the original behavior, replace the final line of run with

dplyr::ungroup(tidyr::unnest(grouped_out, .out))

which gives

Source: local data frame [20,000 x 4]

       b        K     r    n_final
   (dbl)    (dbl) (dbl)      (dbl)
1   0.01 1.105171   0.5 -0.6745470
2   0.01 1.105171   0.5 -0.7500365
3   0.01 1.105171   0.5 -0.6568312

No other changes to the code are needed :)
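
Putting the two replacements together, run() ends up looking something like the sketch below (my consolidation of the edits above; the rest of the body is unchanged from the question):

run <- function(data, fun, fixed_parameters, ...) {
  ## ....
  ## argument checking
  ##

  fixed_parameters <- as.environment(fixed_parameters)
  grouped_out <- purrr::invoke_rows(fun, dplyr::rowwise(data), fixed_parameters)
  dplyr::ungroup(tidyr::unnest(grouped_out, .out))
}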

– jaimedash