I am fitting a logistic model in STAN (rstan library). My response variable does not have any missing values, however one of my covariates "HB" is binary and has missing entries.
Thus the goal is to impute at each iteration the missing entries in the binary vector, using a Bernoulli prior (with parameter, say, 0.5).
However, I am running into issues:
- The missing data needs to be declared either as real or vector in the parameters or transformed parameters block;
- Realizations from a Bernoulli distribution in the model block need to be integers;
- As far as I am aware there is no function in STAN to convert a real or vector to integer.
I used the guidelines provided in section 3.3 of the STAN user guide. With the model below, the parser gives me an error at the bernoulli assignment line (penultimate line in the model block), saying it needs integers. Note: I also tried defining HB_miss as a real in the parameters block and getting the same error.
m2 <- '
data {
int<lower=0> N; // total number of observations
int<lower=0,upper=1> y[N]; // setting the dependent variable y as binary
vector[N] X; // independent variable 1
int<lower=0> N_obs;
int<lower=0> N_miss;
int<lower=1, upper=N> ii_obs[N_obs];
int<lower=1, upper=N> ii_miss[N_miss];
vector[N_obs] HB_obs; // independent variable 2 (observed)
}
parameters {
real b_0; // intercept
real b_X; // beta 1,2, ...
real b_HB;
vector[N_miss] HB_miss;
}
transformed parameters {
vector[N] HB;
HB[ii_obs] = HB_obs;
HB[ii_miss] = HB_miss;
}
model {
b_0 ~ normal(0,100);
b_X ~ normal(0,100);
b_HB ~ normal(0,100);
HB_miss ~ bernoulli(0.5); // This is where the parser gives me an error
y ~ bernoulli_logit(b_0 + b_X * X + b_HB * HB); // model
}
Any ideas how I can assign a bernoulli prior to HB_miss effectively in STAN?