
I would appreciate any help to do this:

  1. for each imputation P, fit the model for each column of weights;
  2. do step 1 for all observations in the dataset to get p * w submodels, where p is the number of imputations P and w is the number of columns of weights;
  3. pool the results by combining the posterior samples of the submodels (see the sketch after this list).
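
To make the intended workflow concrete, the following is a rough sketch of steps 1-3 using rstan (a sketch only: "model.stan" is just where I would save the Stan program shown below, and standata_for(p, w) stands for a hypothetical helper that returns the Stan data list for imputation p and weight column w, i.e. a dataset with p = 1 and w = 1):

library(rstan)

mod <- stan_model("model.stan")       # the Stan program shown further down

n_imp <- 5                            # current number of imputations P (p * w = 50 with 10 weight columns)
fits <- list()
for (p in seq_len(n_imp)) {           # step 1: loop over imputations P
  for (w in 1:10) {                   # ... and over the weight columns w_1 to w_10
    fits[[length(fits) + 1]] <-
      sampling(mod, data = standata_for(p, w), chains = 2, iter = 2000)
  }
}

# step 3: pool by stacking the posterior draws of all submodels
pooled_draws <- do.call(rbind, lapply(fits, function(f)
  as.matrix(f, pars = c("beta_0", "beta_1", "beta_2", "beta_3"))))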

A similar brms implementation is available here, also discussed here. However, that approach only incorporates the uncertainty coming from the multiple imputation P. I want to account for the uncertainty coming from the estimation of the weights as well.

@paul.buerkner says there that we could also extract those data sets and then pass them to the actual model fitting function as a list of data frames. Using this approach, the solution would probably be to pass my individual p * w data frames, each with p = 1 and w = 1. Still, it is not clear to me how such a list of data frames can be passed to the model fitting function in practice.
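
Something along these lines is what I imagine (a sketch only: it assumes the p * w single-imputation, single-weight data frames have already been collected in a list dat_list, with the weight column in each renamed to w; building such a list is sketched further below):

library(brms)

# brm_multiple() fits the same model to every data frame in the list and
# combines the posterior draws across the fits
fit <- brm_multiple(
  ncases | trials(nn) + weights(w) ~ A * B,
  data   = dat_list,
  family = binomial(),
  chains = 2
)

(Note that the formula above is a standard logit parameterisation, not the odds parameterisation used in my Stan code below; it is only meant to illustrate passing the list of data frames.)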

In my case, the multiply imputed values of the predictors are in long format, with each imputation stage represented by a value of the variable P, and the sets of weights estimated at each P are in wide format, represented by the variables w_1 to w_10.
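
To make the layout concrete, this is roughly how I picture splitting the data into the p * w single-imputation, single-weight data frames (a sketch; dat is my data frame with the columns named as in the Stan data block further down):

dat_list <- list()
for (p in unique(dat$P)) {                        # one pass per imputation
  dat_p <- dat[dat$P == p, ]
  for (w_col in paste0("w_", 1:10)) {             # one pass per weight column
    dat_pw <- dat_p[, c("id", "A", "B", "ncases", "nn", w_col)]
    names(dat_pw)[names(dat_pw) == w_col] <- "w"  # single weight column, renamed to w
    dat_list[[length(dat_list) + 1]] <- dat_pw
  }
}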

There is also the Stan approach discussed here. This requires calling stan in parallel so that it returns a list of stanfit objects, and then using sflist2stanfit to create one stanfit object from the list. The problem here is that I would need to split my data frame into p * w = 50 datasets, or more than 100 (if I increase the number of imputations P to more than 10, as is generally recommended), each with p = 1 and w = 1.
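
Roughly, I understand that approach to require something like the following (a sketch; dat_list is the list built above, and as_standata() stands for a helper, still to be written, that converts one data frame into the data list expected by the Stan program):

library(rstan)
library(parallel)

mod <- stan_model("model.stan")

# fit each single-imputation, single-weight dataset in parallel
# (mclapply forks, so this works on Unix-alikes; Windows would need another backend)
sflist <- mclapply(dat_list, function(d) {
  sampling(mod, data = as_standata(d), chains = 1, iter = 2000)
}, mc.cores = 4)

# merge the list of stanfit objects into a single stanfit object
fit <- sflist2stanfit(sflist)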

Following @BobCarpenter's recommendation, below is my attempt to parallelize the likelihood. It seems this approach could allow me to account for one of the sources of uncertainty (the weights), but not both jointly as intended. In particular, I am not certain how to specify the transformed parameters block and the likelihood so that the uncertainty from P is accounted for as well.

Any help to improve and/or correct my current attempt would be much appreciated, as would any input on implementing any of the other approaches discussed above. I apologise for any basic mistakes in my code; while I am certain about what I want to do and why, the Stan implementation of these ideas is still challenging for me - still learning.

#model

data {
  int N;              // number of rows in the data
  int P;              // number of imputations (not yet used below)
  int ncases[N];      // number of cases (binomial successes)
  int A[N];           // predictor A
  int B[N];           // predictor B
  int nn[N];          // number of trials
  int id[N];          // observation id
  real w_1[N];        // weight columns w_1 to w_10:
  real w_2[N];        // one set of weights estimated at each P
  real w_3[N];
  real w_4[N];
  real w_5[N];
  real w_6[N];
  real w_7[N];
  real w_8[N];
  real w_9[N];
  real w_10[N];
}

parameters {
  real beta_0;
  real beta_1;
  real beta_2;
  real beta_3;
}

transformed parameters {
  vector[N] pp_hat;   // fitted probability for each row
  vector[N] odds;     // fitted odds for each row
  for (i in 1:N) {
    odds[i] = exp(beta_0) * (1 + beta_1*A[i] + beta_2*B[i] + beta_3*A[i]*B[i]);
    pp_hat[i] = odds[i] / (1 + odds[i]);
  }
}

model {
  beta_0 ~ normal(0, 5);
  beta_1 ~ normal(0, 5);
  beta_2 ~ normal(0, 5);
  beta_3 ~ normal(0, 5);
  // weighted log-likelihood: one term per set of weights w_1 to w_10
  for (i in 1:N) {
    target += w_1[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_2[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_3[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_4[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_5[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_6[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_7[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_8[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_9[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
    target += w_10[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]);
  }
}
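
For reference, the data list for the program above would be assembled along these lines (a sketch; dat is the long-format data frame described earlier):

standata <- list(
  N      = nrow(dat),
  P      = length(unique(dat$P)),
  ncases = dat$ncases,
  A      = dat$A,
  B      = dat$B,
  nn     = dat$nn,
  id     = dat$id,
  w_1 = dat$w_1, w_2 = dat$w_2, w_3 = dat$w_3, w_4 = dat$w_4, w_5 = dat$w_5,
  w_6 = dat$w_6, w_7 = dat$w_7, w_8 = dat$w_8, w_9 = dat$w_9, w_10 = dat$w_10
)

fit_all <- rstan::sampling(rstan::stan_model("model.stan"), data = standata)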

Thanks in advance.

Krantz
  • Why are you trying to do this rather than running jointly? Is it memory or compute time? If it's just compute time, then I'd recommend parallelizing the likelihood. Michael Betancourt wrote a paper on arXiv explaining why it's pretty much impossible to subsample with MCMC. Nevertheless, papers keep getting published about attempts to do what you're asking. – Bob Carpenter Jan 18 '19 at 16:47
  • Thanks for the input, @BobCarpenter. How could I implement `parallelizing the likelihood` or `running jointly`? I think that could work - all I need is to be able to account for the two sources of uncertainty - just not certain how to do in stan. Any help would be much appreciated – Krantz Jan 19 '19 at 11:15
  • There's a parallelized map function that's documented in the manual and explained with examples in the user's guide. It just lets you parallelize the density calculation, which is the computational bottleneck. – Bob Carpenter Jan 20 '19 at 23:56
  • Thanks, @BobCarpenter. Which chapter can I find the explanation and examples of the `parallelized map function` in the manual? – Krantz Jan 21 '19 at 00:15
  • Additionally, @BobCarpenter, would doing the following (Stan manual, chapter 7, page 59): `model { ...; target += w_1[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]); target += w_2[i] * binomial_lpmf(ncases[i] | nn[i], pp_hat[i]); ...}` be the approach to parallelizing the likelihood? If so (assuming that parallelizing the likelihood that way is the way to go, i.e., that it would allow incorporation of uncertainty from the weights), how would I specify the transformed parameters block (and the likelihood) to account for the uncertainty coming from P? Thanks in advance. – Krantz Jan 21 '19 at 22:06
  • Q1) Chapter "Map Reduce" in the user's guide and section "Higher-order Map" in the functions reference. The relevant function is `map_rect()`, as we only have the rectangular version implemented so far. Q2) Yes, you'd break the data up into chunks and parallelize the likelihood calculations for each chunk. The results are the same as if it was run serially, because terms in the log likelihood are additive. – Bob Carpenter Jan 22 '19 at 01:53
  • Thanks for the tremendous help and input, @BobCarpenter. I am delighted to let you know that now I am able to jointly account for the two sources of uncertainty. – Krantz Jan 25 '19 at 12:11
