
I have been trying to use an R function called ipsi, which takes the arguments (a, y, id, time, x.trt, x.out, delta.seq, nsplits). Originally, the components of the arguments were in one dataframe (except for delta.seq and nsplits, which are set later), but my understanding is that I need to put them in separate lists and, in the case of x.trt and x.out, matrices. The function is easy to run on one of each argument, but since I multiply imputed the dataframe 30 times before splitting it into separate elements to pass as ipsi arguments, I now want to iterate over the set of elements 30 times, as if there were 30 dataframes. Additionally, I want to parallelize to make full use of my computing power.

I have just expanded the npcausal example:

n <- 500
T <- 4
# Each argument becomes a list of 30 identical copies,
# standing in for the 30 imputed data sets.
time <- rep(1:T, n)
time <- rep(list(time), 30)
id <- rep(1:n, rep(T, n))
id <- rep(list(id), 30)
x.trt <- matrix(rnorm(n * T * 5), nrow = n * T)
x.trt <- rep(list(x.trt), 30)
x.out <- matrix(rnorm(n * T * 5), nrow = n * T)
x.out <- rep(list(x.out), 30)
a <- rbinom(n * T, 1, .5)
a <- rep(list(a), 30)
y <- rnorm(n, mean = 1)
y <- rep(list(y), 30)
d.seq <- seq(0.1, 5, length.out = 10)
d.seq <- rep(list(d.seq), 30)

set.seed(500, kind = "L'Ecuyer-CMRG")
numcores <- future::availableCores()
cl <- parallel::makeCluster(numcores)
parallel::clusterEvalQ(cl, library(dplyr))
parallel::clusterEvalQ(cl, library(npcausal))
parallel::clusterExport(cl, "d.seq", envir = environment())
parallel::clusterEvalQ(cl, d.seq <- d.seq)

new_element <- parallel::parLapply(cl = cl, for(i in 1:30){
  npcausal::ipsi(a = a[[i]],
                 y = y[[i]],
                 id = id[[i]],
                 time = time[[i]],
                 x.out = x.out[[i]],
                 x.trt = x.trt[[i]],
                 delta.seq = d.seq[[i]],
                 nsplits = 10)
})

This actually runs, but at the end of the process it gives me an error saying that the FUN was missing. I knew that already, but I have no FUN to call besides ipsi. Thanks for any help you can provide.

Alex
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Sep 29 '21 at 18:58
  • @Mr.Flick please see the edited example above. – Alex Sep 29 '21 at 19:22
  • I'm still not sure I understand what the desired output is. But `parLapply`, just like `lapply`, expects you to iterate over some list with some function. You don't seem to have provided a list or a function, but instead put in a `for` loop. I'm not really sure what you are expecting that to do. Are you just trying to parallelize over those 30 iterations? – MrFlick Sep 29 '21 at 19:27
  • Please see the desired output by running the example from the npcausal::ipsi documentation (https://github.com/ehkennedy/npcausal/blob/master/npcausal.pdf). npcausal's author does not use dataframes as ipsi's arguments. What I have is 30 imputed dataframes and I need to find a way to input pieces of those dataframes into ipsi's arguments. So, I chopped up all 30 imputed dataframes into different objects to input into ipsi. I just can't figure out how to iterate over each object 30 times. Also, I recognize that my reproducible example's elements are just 30 copies of each object. – Alex Sep 29 '21 at 19:48
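To illustrate the point made in the comments, here is a minimal sketch of the shape `parLapply` expects: something to iterate over (here the indices of the imputed data sets) plus a function that is called once per element. `fit_one` is a hypothetical stand-in for `npcausal::ipsi`, and only two of the arguments are shown, so that the sketch runs without that package; it is an illustration under those assumptions, not a tested ipsi call.

```r
library(parallel)

# Hypothetical stand-in for npcausal::ipsi(); any function of the list elements works.
fit_one <- function(a, y) mean(y) + mean(a)

set.seed(1)
m <- 3  # number of imputed data sets (30 in the question)
a <- replicate(m, rbinom(20, 1, 0.5), simplify = FALSE)
y <- replicate(m, rnorm(20, mean = 1), simplify = FALSE)

cl <- makeCluster(2)
# Make the lists and the function available on every worker.
clusterExport(cl, c("a", "y", "fit_one"), envir = environment())

# X is the vector of indices; the function receives one index per call.
res <- parLapply(cl, seq_len(m), function(i) fit_one(a = a[[i]], y = y[[i]]))
stopCluster(cl)
```

`res` is a list with one element per data set. With the real function, the body of the anonymous function would be the `npcausal::ipsi(...)` call indexed by `i`.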

1 Answer

My suggestion is to first figure out how to do it with a regular base-R *apply function without worrying about parallelization. I suspect you can use mapply() for this, so something like (untested):

res <- mapply(
  a, y, id, time, x.out, x.trt, d.seq,
  FUN = function(a_i, y_i, id_i, time_i, x.out_i, x.trt_i, d.seq_i) {
    npcausal::ipsi(a = a_i, y = y_i, id = id_i, time = time_i,
                   x.out = x.out_i, x.trt = x.trt_i, delta.seq = d.seq_i,
                   nsplits = 10)
  },
  SIMPLIFY = FALSE  # keep one list element per imputed data set
)

Once you have figured that part out, you can start thinking about parallelization.

(Disclaimer: I'm the author of future.apply.) If you get an mapply() solution to work, then the simplest approach is to replace it as-is with future_mapply() from the future.apply package. That will parallelize on your local machine if you set plan(multisession).
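As a rough sketch of that swap, assuming the future.apply package is installed: `fit_one` below is a hypothetical stand-in for the `npcausal::ipsi` call, and the toy lists replace the 30 imputed data sets, so this only shows the mechanics rather than a verified ipsi run.

```r
library(future.apply)

# Run futures in background R sessions on the local machine.
future::plan(future::multisession, workers = 2)

# Hypothetical stand-in for the npcausal::ipsi() call.
fit_one <- function(a_i, y_i) mean(y_i) + mean(a_i)

set.seed(1)
a <- replicate(3, rbinom(20, 1, 0.5), simplify = FALSE)
y <- replicate(3, rnorm(20, mean = 1), simplify = FALSE)

# Same call shape as mapply(); future.seed = TRUE requests
# parallel-safe random number streams on the workers.
res <- future_mapply(fit_one, a, y, SIMPLIFY = FALSE, future.seed = TRUE)

# Return to sequential processing and shut down the workers.
future::plan(future::sequential)
```

Because the call has the same shape as `mapply()`, switching between sequential and parallel execution only requires changing the `plan()`.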

HenrikB