0

In a previous post the original poster asked a question on how to organize a series of scenarios with varying states and parameters for simulations with the deSolve package. A solution was given, but I wonder if it can be made easier in a pipeline-friendly style, but without the use of nested data frames. The goal is to use it as a teaching example using clear tabular data structures that can be fit to a final ggplot.

Approach 1: Classical "base-R" style

Until now, I usually use a list approach with the apply-function. The resulting list of data frames can then be fit to the deSolve::plot-method:

library("deSolve")

model <- function(t, y, p) {
  with(as.list(c(y, p)), {
    dN <- r * N * (1 - N/K)
    list(c(dN))
  })
}

times <- 0:10
y0    <- c(N = 0.5)         # state variables
p     <- c(r = 0.2, K = 1)  # model parameters

## run a single simulation
out <- ode(y0, times, model, p)
plot(out)

## create a data frame with some combinations of states and parameters
scenarios <- expand.grid(N = seq(0.5, 1.5, 0.2), K = 1, r = seq(0.2, 1, 0.2))

## a function to run a simulation for a single line of the data frame
## note difference between scenarios and scenario (plural/singular)
simulate <- function(scenario) {
  ## split scenario settings to initial states (y0) and parameters (p)
  y0 <- scenario["N"]
  p  <- scenario[c("r", "K")]
  ode(y0, times, model, p)
}

## MARGIN = 1: each row is a scenario
## simplify = FALSE: function should return a list
outputs <- apply(scenarios, MARGIN = 1, FUN = simulate, simplify = FALSE)

## the plot.deSolve method works with lists as second argument
plot(out, outputs)

Approach 2: A step towards a pipeline

Based on this example, I created a function simulate_inout that returns both, inputs and outputs in a ggplot-compatible way for a single scenario. This should then be called for all scenarios (all rows) in a pipeline.

The following works:

## version of simulate that preserves inputs and outputs
simulate_inout <- function(scenario) {
  scenario <- unlist(scenario)
  ## split scenario settings to initial states (y0) and parameters (p)
  y0 <- scenario["N"]
  p  <- scenario[c("r", "K")]
  
  ## integrate the model
  output <- ode(y0, times, model, p)
  
  ## replicate rows of inputs
  input <- do.call("rbind", replicate(length(times), 
    scenario, simplify = FALSE))
  
  ## return a data frame with inputs and outputs
  cbind(input, output)
}

## a single scenario
simulate_inout(scenarios[1,])

simulate_all <- function(scenarios) {
  ## iterate over all rows
  ret <- NULL
  for (i in 1:nrow(scenarios)) {
    ret <- rbind(ret, simulate_inout(scenarios[i,]))
  }
  data.frame(ret)
}

## plot with ggplot
library("ggplot2")
scenarios |> simulate_all() |> ggplot(aes(time, N.1)) + 
  geom_path() + facet_grid(r ~ N)

Question

I would like to streamline this code in consistent tidyverse style and to get of the for-loop in simulate_all and other specific tricks like do.call.

tpetzoldt
  • 5,338
  • 2
  • 12
  • 29

1 Answers1

1

You could use purrr::map_dfr to loop over the rows of your scenarios df. Requires some rewriting of your functions such that it takes the parameters itself as arguments. Additionally I simplified your code a bit.

EDIT Replaced the superseded pmap_dfr by pmap(...) |> list_rbind(). Use ... to pass the arguments to simulate_inout.

library(deSolve)
library(ggplot2)
library(purrr)

model <- function(t, y, p) {
  N <- y
  r <- p[[1]]
  K <- p[[2]]
  
  dN <- r * N * (1 - N / K)
  
  list(dN)
}
times <- 0:10

simulate_inout <- function(...) {
  args <- list(...)
  
  y0 <- args[["N"]]
  p <- args[c("r", "K")]
  
  output <- ode(y0, times, model, p)
  
  data.frame(args, output)
}

scenarios <- expand.grid(
  N = seq(0.5, 1.5, 0.2),
  K = 1,
  r = seq(0.2, 1, 0.2)
)

scenarios |>
  purrr::pmap(simulate_inout) |>
  list_rbind() |> # or dplyr::bind_rows()
  ggplot(aes(time, X1)) +
  geom_path() +
  facet_grid(r ~ N)

stefan
  • 90,330
  • 6
  • 25
  • 51
  • Very nice, I have been looking for `map` and `pmap` but somehow got stuck. One remaining point: `simulate_inout` should ideally work with the complete row, not the individual variable names - in the formals and in `data.frame`. The function should be as generic as possible, where the used variable names will be passed via additional arguments. – tpetzoldt Aug 04 '23 at 17:42
  • An additional finding (and a reason that originally confused me): `pmap_dfr` is tagged as superseded. Instead, it is recommended to use `list_rbind`, so one can replace it with `pmap(simulate_inout) |> purrr:list_rbind()` and so on. – tpetzoldt Aug 04 '23 at 18:00
  • 1
    Thx @tpetzoldt. Wasn't aware that `pmap_dfr` is superseded. Just made an edit to replace it using `list_rbind`. Additionally, I made the function more generic by using `...` to pass the arguments to the function. – stefan Aug 05 '23 at 06:43