I am currently working on a time series streamflow data range from 2002-10-04 to 2012-10-28, what I need to do is to develop regression models using 10-day time window data. To be more specific, I need to use Oct-04 to Oct 13 across all years from 2002 to 2012 to build a regression model. Then I need to use Oct-05 to Oct 14 across all years from 2002 to 2012 to build another regression model, then use Oct-06 to Oct 15 across all years to build the next model and so on repeatedly till the end. I already finished this part and do have all the coefficients from each regression. What I need to do next is to use MLE to optimize each set of coefficients, so this is the part that I can't handle. I only worked on some simple MLE with regression before but never worked on datasets like this with hierarchical regression.
I tried nested for()
loop to rolling apply negative log likelihood function and mle2()
function. Below is my code, and the data are extracted from package EGRET
# extract data from package
haw <- readNWISDaily("02096960", "00060", "2002-10-01", "2012-10-25") %>% rename(Qhaw = Q)
deep <- readNWISDaily("02102000", "00060", "2002-10-02", "2012-10-26") %>% rename(Qdp = Q)
CFLlt <- readNWISDaily("02102500", "00060", "2002-10-02", "2012-10-26") %>% rename(Qllt = Q)
little <- readNWISDaily("02103000", "00060", "2002-10-02", "2012-10-26") %>% rename(Qlit = Q)
kelly <- readNWISDaily("02105769", "00060", "2002-10-04", "2012-10-28") %>% rename(Qkl = Q)
# build full dataframe
CFbasin <- data.frame(Data = kelly$Date, Qkl = kelly$Qkl, Qllt = CFLlt$Qllt, Qhaw = haw$Qhaw, Qdp = deep$Qdp, Qlit = little$Qlit)
# prepare the data for mle
CFbasin2 <- CFbasin[, c(1,2,4,5,6)]
# upper_stream llt ~ haw + dp
CFbasin_up <- CFbasin[, c(1, 3:5)]
# down_stream kl ~ llt + lit
CFbasin_dw <- CFbasin[, c(1:3, 6)]
# rolling grab the coefficient of OLS regression for down stream and up stream
## upper stream
yday_up <- as.POSIXlt(CFbasin_up$Data)$yday
coefs_up <- function(ix) {
x <- CFbasin_up[yday_up %in% ix, -1]
if (NROW(x) == 0) NA else
coef(lm(Qllt ~ Qhaw + Qdp, CFbasin_up[-1], subset = yday_up %in% ix))
}
up_coef <- rollapplyr(1:365, 5, coefs_up, fill = NA) %>% .[-c(1:4), ]
## down stream
yday_dw <- as.POSIXlt(CFbasin_dw$Data)$yday
coefs_dw <- function(ix) {
x <- CFbasin_dw[yday_dw %in% ix, -1]
if (NROW(x) == 0) NA else
coef(lm(Qkl ~ Qllt + Qlit, CFbasin_dw[-1], subset = yday_dw %in% ix))
}
dw_coef <- rollapplyr(0:364, 5, coefs_dw, fill = NA) %>% .[-c(1:4), ]
colnames(dw_coef) <- c("b0", "b1", "b2")
colnames(up_coef) <- c("r0", "r1", "r2")
CFbasin_coef <- cbind(dw_coef, up_coef)
## negative log likelihood function
results <- list()
for(i in 1:nrow(CFbasin_coef)){
for (k in 1:nrow(CFbasin2)){
regress.ll <- function(b0, b1, r0, r1, r2, b2, x = CFbasin2[k:(k+9),]) {
x1 = x$Qhaw
x2 = x$Qdp
x3 = x$Qlit
y = x$Qkl
y.hat = b0 + b1*(r0 + r1*x1 + r2*x2) + b2*x3
errors = y - y.hat
mu_error = mean(errors)
sd_errors = sd(errors)
neg.loklik = -sum(dnorm(errors, mean = mu_error, sd = sd_errors, log = T))
return(neg.loklik)
}
}
results[i, ] <- mle2(minuslogl = regress.ll,
start = list(b0 = CFbasin_coef[i,1], b1 = CFbasin_coef[i,2], r0 = CFbasin_coef[i,4],
r1 = CFbasin_coef[i,5], r2 = CFbasin_coef[i,6], b2 = CFbasin_coef[i,3]))
print(results)
}
But finally I got an error
Error in optim(par = c(b0 = -22.7154888887627, b1 = 0.979859573521246, :
initial value in 'vmmin' is not finite
Any suggestion and advice will be really appreciated