I want to graph the histograms of parameter estimates from a stan model against the priors for those parameters. I have tried doing this by running a model in stan, graphing it with ggplot2, then overlaying an approximation of the prior distribution using R's random generator function (e.g. rnorm()
, rbinom()
) but I have run into many scaling issues that make the graphs impossible to get looking right.
I was thinking a better way to do it would be simply to sample directly from the prior distribution and then graph those samples against the parameter estimates, but running a whole separate model just to sample from the priors seems very time-consuming. I was wondering if there was a way to do this within, or rather parallel-to, an existing model.
Here is a sample script.
# simulate linear model
a <- 3 # intercept
b <- 2 # slope
# data
x <- rnorm(28, 0, 1)
eps <- rnorm(28, 0, 2)
y <- a + b*x + eps
# put data into list
data_reg <- list(N = 28, x = x, y = y)
# create the model string
ms <- "
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
vector[N] mu;
sigma ~ cauchy(0, 2);
beta ~ normal(0,10);
alpha ~ normal(0,100);
for ( i in 1:N ) {
mu[i] = alpha + beta * x[i];
}
y ~ normal(mu, sigma);
}
"
# now fit the model in stan
fit1 <- stan(model_code = ms, # model string
data = data_reg, # named list of data
chains = 1, # number of Markov chains
warmup = 1e3, # number of warmup iterations per chain
iter = 2e3) # show progress every 'refresh' iterations
# extract the sample estimates
post <- extract(fit1, pars = c("alpha", "beta", "sigma"))
# now for the density plots. Write a plotting function
densFunct <- function (parName) {
g <- ggplot(postDF, aes_string(x = parName)) +
geom_histogram(aes(y=..density..), fill = "white", colour = "black", bins = 50) +
geom_density(fill = "skyblue", alpha = 0.3)
return(g)
}
# plot
gridExtra::grid.arrange(grobs = lapply(names(postDF), function (i) densFunct(i)), ncol = 1)
Now I understand that I can sample from the prior by simply omitting the likelihood from the model string, like so
ms <- "
data {
int<lower=0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower=0> sigma;
}
model {
sigma ~ cauchy(0, 2);
beta ~ normal(0,10);
alpha ~ normal(0,100);
}
"
But is there any way to get the samples from the prior within the first model? Maybe via the generated quantities block?