I have recently encountered Gaussian process models and think they may be the solution to a problem I have been working on in my lab. I have an open, related question on Cross Validated, but I wanted to separate my modeling/math questions from my programming questions; hence this second, related post. If more background on my problem would help, here is the link to my open CV question.
Here is my Stan code, which corresponds to the updated covariance functions presented in my CV post.
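Written out directly from the code below (nothing here beyond what main_GP computes), the three kernels are

$$K_1(x,y)=\alpha_1\exp\left(-\frac{(x-y)^2}{2\rho_1^2}\right),$$
$$K_2(x,y)=\alpha_2\exp\left(-\frac{2\sin^2(\pi\,|x-y|\,\mathrm{HR}_f)}{\rho_2^2}\right)\exp\left(-\frac{(x-y)^2}{2\rho_3^2}\right),$$
$$K_3(x,y)=\alpha_3\exp\left(-\frac{2\sin^2(\pi\,|x-y|\,\mathrm{HR}_f)}{\rho_4^2}\right)\exp\left(-\frac{2\sin^2(\pi\,|x-y|\,\mathrm{R}_f)}{\rho_5^2}\right),$$

with the full covariance given by $\Sigma = K_1 + K_2 + K_3$. The Stan program: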
functions{
  //covariance function for the main portion of the model
  matrix main_GP(
    int Nx,
    vector x,
    int Ny,
    vector y,
    real alpha1,
    real alpha2,
    real alpha3,
    real rho1,
    real rho2,
    real rho3,
    real rho4,
    real rho5,
    real HR_f,
    real R_f){
    matrix[Nx, Ny] K1;
    matrix[Nx, Ny] K2;
    matrix[Nx, Ny] K3;
    matrix[Nx, Ny] Sigma;

    //squared-exponential kernel governing the main portion of the covariance
    for(i in 1:Nx){
      for(j in 1:Ny){
        K1[i, j] = alpha1*exp(-square(x[i]-y[j])/2/square(rho1));
      }
    }

    //kernel incorporating heart rate (periodic in HR_f, damped by a squared exponential)
    for(i in 1:Nx){
      for(j in 1:Ny){
        K2[i, j] = alpha2*exp(-2*square(sin(pi()*fabs(x[i]-y[j])*HR_f))/square(rho2))*
                   exp(-square(x[i]-y[j])/2/square(rho3));
      }
    }

    //kernel incorporating heart rate as a function of respiration (periodic in both HR_f and R_f)
    for(i in 1:Nx){
      for(j in 1:Ny){
        K3[i, j] = alpha3*exp(-2*square(sin(pi()*fabs(x[i]-y[j])*HR_f))/square(rho4))*
                   exp(-2*square(sin(pi()*fabs(x[i]-y[j])*R_f))/square(rho5));
      }
    }

    Sigma = K1 + K2 + K3;
    return Sigma;
  }
  //function for posterior predictive draws
  vector post_pred_rng(
    real a1,
    real a2,
    real a3,
    real r1,
    real r2,
    real r3,
    real r4,
    real r5,
    real HR,
    real R,
    real sn,
    int No,
    vector xo,
    int Np,
    vector xp,
    vector yobs){
    matrix[No, No] Ko;
    matrix[Np, Np] Kp;
    matrix[No, Np] Kop;
    matrix[Np, No] Ko_inv_t;
    vector[Np] mu_p;
    matrix[Np, Np] Tau;
    matrix[Np, Np] L2;
    vector[Np] z;
    vector[Np] yp;
    //--------------------------------------------------------------------
    //kernel for the observed data (all three GPs, noise variance sn on the diagonal)
    Ko = main_GP(No, xo, No, xo, a1, a2, a3, r1, r2, r3, r4, r5, HR, R);
    Ko = Ko + diag_matrix(rep_vector(1, No))*sn;
    //--------------------------------------------------------------------
    //kernel for the predicted data
    Kp = main_GP(Np, xp, Np, xp, a1, a2, a3, r1, r2, r3, r4, r5, HR, R);
    Kp = Kp + diag_matrix(rep_vector(1, Np))*sn;
    //--------------------------------------------------------------------
    //cross-kernel between observed and predicted points
    Kop = main_GP(No, xo, Np, xp, a1, a2, a3, r1, r2, r3, r4, r5, HR, R);
    //--------------------------------------------------------------------
    //Algorithm 2.1 of Rasmussen and Williams:
    //posterior mean Kop' * inv(Ko) * yobs, covariance Kp - Kop' * inv(Ko) * Kop,
    //then one draw from the resulting multivariate normal via its Cholesky factor
    Ko_inv_t = Kop'/Ko;
    mu_p = Ko_inv_t*yobs;
    Tau = Kp - Ko_inv_t*Kop;
    L2 = cholesky_decompose(Tau);
    //independent standard normal draws; note that rep_vector(normal_rng(0,1), Np)
    //would repeat a single draw Np times rather than draw Np values
    for(i in 1:Np) z[i] = normal_rng(0, 1);
    yp = mu_p + L2*z;
    return yp;
  }
}
data {
  int<lower=1> N1;
  int<lower=1> N2;
  vector[N1] X;
  vector[N1] Y;
  vector[N2] Xp;
  real<lower=0> mu_HR;
  real<lower=0> mu_R;
}
transformed data {
  vector[N1] mu;
  for(n in 1:N1) mu[n] = 0;
}
parameters {
  real loga1;
  real loga2;
  real loga3;
  real logr1;
  real logr2;
  real logr3;
  real logr4;
  real logr5;
  real<lower=.5, upper=3.5> HR;
  real<lower=.05, upper=.75> R;
  real logsigma_sq;
}
transformed parameters {
  real a1 = exp(loga1);
  real a2 = exp(loga2);
  real a3 = exp(loga3);
  real r1 = exp(logr1);
  real r2 = exp(logr2);
  real r3 = exp(logr3);
  real r4 = exp(logr4);
  real r5 = exp(logr5);
  real sigma_sq = exp(logsigma_sq);
}
model{
  matrix[N1, N1] Sigma;
  matrix[N1, N1] L_S;
  //using GP function from above
  Sigma = main_GP(N1, X, N1, X, a1, a2, a3, r1, r2, r3, r4, r5, HR, R);
  Sigma = Sigma + diag_matrix(rep_vector(1, N1))*sigma_sq;
  L_S = cholesky_decompose(Sigma);
  Y ~ multi_normal_cholesky(mu, L_S);
  //priors for parameters
  loga1 ~ student_t(3,0,1);
  loga2 ~ student_t(3,0,1);
  loga3 ~ student_t(3,0,1);
  logr1 ~ student_t(3,0,1);
  logr2 ~ student_t(3,0,1);
  logr3 ~ student_t(3,0,1);
  logr4 ~ student_t(3,0,1);
  logr5 ~ student_t(3,0,1);
  logsigma_sq ~ student_t(3,0,1);
  HR ~ normal(mu_HR, .25);
  R ~ normal(mu_R, .03);
}
generated quantities {
  vector[N2] Ypred;
  Ypred = post_pred_rng(a1, a2, a3, r1, r2, r3, r4, r5, HR, R, sigma_sq, N1, X, N2, Xp, Y);
}
I have tinkered with the priors for the kernel parameters; some parameterizations are a bit faster (up to twice as fast in some instances), but the chains can still be relatively slow even in those cases.
I am trying to predict values for 3.5 s of data (sampled at 10 Hz, so 35 data points), using the 15 seconds preceding and following the contaminated section (sampled at 3.33 Hz, so 100 data points in total).
The model is called from R as follows:
fit.pred2 <- stan(file = 'Fast_GP6_all.stan',
                  data = dat,
                  warmup = 1000,
                  iter = 1500,
                  refresh = 5,
                  chains = 3,
                  pars = pars.to.monitor
                  )
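In case it helps, dat is just a named list matching the data block above, and pars.to.monitor restricts what gets saved. Schematically it looks like the following; the object names x_obs, y_obs, x_pred and the mu_HR/mu_R values shown here are placeholders for illustration, not my exact script:

library(rstan)

# observed times and signal flanking the contaminated section:
# 15 s before + 15 s after at 3.33 Hz = 100 points (x_obs, y_obs),
# plus 35 prediction times at 10 Hz across the 3.5 s gap (x_pred)
dat <- list(N1    = length(x_obs),   # 100 observed points
            N2    = length(x_pred),  # 35 prediction points
            X     = x_obs,
            Y     = y_obs,
            Xp    = x_pred,
            mu_HR = 1.2,              # prior mean heart-rate frequency (Hz), placeholder
            mu_R  = 0.25)             # prior mean respiration frequency (Hz), placeholder

# only monitor the transformed parameters and the predictions
pars.to.monitor <- c("a1", "a2", "a3", "r1", "r2", "r3", "r4", "r5",
                     "HR", "R", "sigma_sq", "Ypred")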
To be honest, I do not know whether I need that many warmup iterations. I imagine part of the slow estimation is the result of fairly non-informative priors (except for the heart-rate and respiration parameters HR and R, which have fairly well-known ranges at rest in a healthy adult).
Any suggestions for speeding up my program's run time are more than welcome.
Thanks.