0

I am currently working on a time series streamflow data range from 2002-10-04 to 2012-10-28, what I need to do is to develop regression models using 10-day time window data. To be more specific, I need to use Oct-04 to Oct 13 across all years from 2002 to 2012 to build a regression model. Then I need to use Oct-05 to Oct 14 across all years from 2002 to 2012 to build another regression model, then use Oct-06 to Oct 15 across all years to build the next model and so on repeatedly till the end. I already finished this part and do have all the coefficients from each regression. What I need to do next is to use MLE to optimize each set of coefficients, so this is the part that I can't handle. I only worked on some simple MLE with regression before but never worked on datasets like this with hierarchical regression.

I tried nested for() loop to rolling apply negative log likelihood function and mle2() function. Below is my code, and the data are extracted from package EGRET

# extract data from package
haw <- readNWISDaily("02096960", "00060", "2002-10-01", "2012-10-25") %>% rename(Qhaw = Q)
deep <- readNWISDaily("02102000", "00060", "2002-10-02", "2012-10-26") %>% rename(Qdp = Q)
CFLlt <- readNWISDaily("02102500", "00060", "2002-10-02", "2012-10-26") %>% rename(Qllt = Q)
little <- readNWISDaily("02103000", "00060", "2002-10-02", "2012-10-26") %>% rename(Qlit = Q)
kelly <- readNWISDaily("02105769", "00060", "2002-10-04", "2012-10-28") %>% rename(Qkl = Q)
# build full dataframe
CFbasin <- data.frame(Data = kelly$Date, Qkl = kelly$Qkl, Qllt = CFLlt$Qllt, Qhaw = haw$Qhaw, Qdp = deep$Qdp, Qlit = little$Qlit)
# prepare the data for mle
CFbasin2 <- CFbasin[, c(1,2,4,5,6)]
# upper_stream llt ~ haw + dp
CFbasin_up <- CFbasin[, c(1, 3:5)]
# down_stream kl ~ llt + lit
CFbasin_dw <- CFbasin[, c(1:3, 6)] 
# rolling grab the coefficient of OLS regression for down stream and up stream
## upper stream
yday_up <- as.POSIXlt(CFbasin_up$Data)$yday
coefs_up <- function(ix) {
  x <- CFbasin_up[yday_up %in% ix, -1]
  if (NROW(x) == 0) NA else
    coef(lm(Qllt ~ Qhaw + Qdp, CFbasin_up[-1], subset = yday_up %in% ix))
}
up_coef <- rollapplyr(1:365, 5, coefs_up, fill = NA) %>% .[-c(1:4), ]
## down stream
yday_dw <- as.POSIXlt(CFbasin_dw$Data)$yday
coefs_dw <- function(ix) {
  x <- CFbasin_dw[yday_dw %in% ix, -1]
  if (NROW(x) == 0) NA else
    coef(lm(Qkl ~ Qllt + Qlit, CFbasin_dw[-1], subset = yday_dw %in% ix))
}
dw_coef <- rollapplyr(0:364, 5, coefs_dw, fill = NA) %>% .[-c(1:4), ]

colnames(dw_coef) <- c("b0", "b1", "b2")
colnames(up_coef) <- c("r0", "r1", "r2")
CFbasin_coef <- cbind(dw_coef, up_coef)
## negative log likelihood function

results <- list()
for(i in 1:nrow(CFbasin_coef)){
  for (k in 1:nrow(CFbasin2)){
    regress.ll <- function(b0, b1, r0, r1, r2, b2, x = CFbasin2[k:(k+9),]) {
      x1 = x$Qhaw
      x2 = x$Qdp
      x3 = x$Qlit
      y = x$Qkl
      y.hat = b0 + b1*(r0 + r1*x1 + r2*x2) + b2*x3
      errors = y - y.hat
      mu_error = mean(errors)
      sd_errors = sd(errors)
      neg.loklik = -sum(dnorm(errors, mean = mu_error, sd = sd_errors, log = T))
      return(neg.loklik)
    }
  }
  results[i, ] <- mle2(minuslogl = regress.ll,
                       start = list(b0 = CFbasin_coef[i,1], b1 = CFbasin_coef[i,2], r0 = CFbasin_coef[i,4], 
                                    r1 = CFbasin_coef[i,5], r2 = CFbasin_coef[i,6], b2 = CFbasin_coef[i,3]))
  print(results)
}

But finally I got an error

Error in optim(par = c(b0 = -22.7154888887627, b1 = 0.979859573521246,  : 
  initial value in 'vmmin' is not finite

Any suggestion and advice will be really appreciated

Rua Jing
  • 65
  • 5
  • Seems like this may address your issue: [MLE error in R: initial value in 'vmmin' is not finite](https://stackoverflow.com/questions/24383746/mle-error-in-r-initial-value-in-vmmin-is-not-finite) – Mako212 Nov 18 '21 at 04:02

1 Answers1

0

First, it's unobvious what you're trying to accomplish here:

results[i, ] <- mle2(minuslogl = regress.ll,
                       start = list(b0 = CFbasin_coef[i,1], b1 = CFbasin_coef[i,2], r0 = CFbasin_coef[i,4], 
                                    r1 = CFbasin_coef[i,5], r2 = CFbasin_coef[i,6], b2 = CFbasin_coef[i,3]))

print(results)

You're literally printing the results every i-loop cycle.

Second, you're receiving your error because you k-cycle is out of bounds; however, it's hard to understand what it was supposed to do in the first place:

for (k in 1:nrow(CFbasin2)){
  regress.ll <- function(b0, b1, r0, r1, r2, b2, x = CFbasin2[k:(k+9),]) {
    x1 = x$Qhaw
    x2 = x$Qdp
    x3 = x$Qlit
    y = x$Qkl
    y.hat = b0 + b1*(r0 + r1*x1 + r2*x2) + b2*x3
    errors = y - y.hat
    mu_error = mean(errors)
    sd_errors = sd(errors)
    neg.loklik = -sum(dnorm(errors, mean = mu_error, sd = sd_errors, log = T))
    return(neg.loklik)
  }
}

You're literally re-defining regress.ll function 3678 times, and in the end it will work from k=3678. Because of this, at the final step you'll be having neg.loklik as NA (which is the value you're returning), and this is the exact cause of your issue:

Browse[2]> neg.loklik
[1] NA
Browse[2]> errors
 [1] 5.358041       NA       NA       NA       NA       NA       NA       NA       NA       NA

That's because your CFbasin2[k:(k+9),] is out of bounds.

It's hard for me to tell you the exact proper code that should work for you, since it's not obvious for me what exactly the algorithm of your calculations is.

Maksim I. Kuzmin
  • 1,170
  • 7
  • 16