1

I am modeling a hydrologic process (water levels [stage] in lakes measured in mm) that can be described as:

eq

where eq2 is estimated from a different model and used as a constant in this model. eq2is the unknown and the value is expected to be between (-0.001,0.001). The starting value of S doesn't matter so long as it is greater than 10m (10000mm). The model runs on a daily time step. I have observed stage from multiple different lakes and fit each lake independently.

Currently, I am brute-force identifying the parameter value by:

  1. Creating a 100 value sequence of parameter values spanning (-0.001,0.001)
  2. Predicting stage using the above equation and estimate RMSE between modeled and observed data (significantly fewer observations than modeled data points)
  3. Identifying the B with the lowest RMSE and selecting B values on either side to create a new sequence of parameter values to search over
  4. Step 2 and 3 are repeated until RMSE decreases by less than 0.01 or increases.

The code for the brute force approach I've been using is below along with the data associated with a single lake.

Is there an alternative approach to estimating the unknown parameter Beta2 given the model above and the fact that I only have observed data for a limited number of days?

Thanks!

library(tidyverse)
library(lubridate)
library(Metrics)

#The Data
dat <- read_csv("https://www.dropbox.com/s/skg8wfpu9274npb/driver_data.csv?dl=1")
observeddata <-    read_csv("https://www.dropbox.com/s/bhh27g5rupoqps3/observeddata.csv?dl=1") %>% select(Date,Value)

#Setup initial values and vectors
S = matrix(nrow = nrow(dat),ncol = 1) #create an empty matrix for predicted values
S[1,1] = 10*1000 #set initial value (mm)
rmse.diff <- 10^100 #random high value for difference between min RMSE between successive
                    #parameter searches
b.levels <- seq(from = -0.001,to = 0.001,length.out=100) #random starting parameter that should contain
                                                         #the final value being estimated 
n = 0 # counter

#Loop to bruteforce search for best parameter estimate
while(rmse.diff > 0.01 ) {
  rmse.vec = rep(NA,length(b.levels))
  for(t in 1:length(b.levels)){
    for(z in 2:nrow(S)){
      S[z,1] <- S[(z-1),1] + (1.071663*(dat$X[z])) + (b.levels[t]*(S[(z-1),1])) #-1.532236
    } #end of time series loop
    extrap_level <- data.frame(Date= dat$Date, level = S) # predicted lake levels

    #calculate an offset to center observed data on extrapolated data 
    dat.offset = observeddata %>% left_join(extrap_level) %>% 
      mutate(offset = level-Value) %>% drop_na()
    offset <- mean(dat.offset$offset)
    dat.compare <- observeddata %>% left_join(extrap_level) %>% 
      mutate(Value = Value + offset) %>% drop_na()
    #calculate RMSE between observed and extrapolated values
    rmse.vec[t] <- rmse(actual = dat.compare$Value,predicted = dat.compare$level)
    #plot the data to watch how parameter choice influences fit while looping
    #plots have a hard time keeping up
    if(t ==1 | t==50 | t==100) {
    plot(extrap_level$Date,extrap_level$level,type="l")
    lines(dat.compare$Date,dat.compare$Value,col="red")
    }
  }
  #find minimum RMSE value
  min.rmse <- which(rmse.vec==min(rmse.vec))
  if(n == 0) rmse.best <- rmse.vec[min.rmse] else rmse.best = c(rmse.best,rmse.vec[min.rmse])
  if (n >= 1) rmse.diff <- (rmse.best[n]-rmse.best[n+1])
  if(rmse.diff < 0) break()
  best.b <- b.levels[min.rmse]
  #take the parameter values on either side of the best prior RMSE and use those as search area
  b.levels <- seq(from = (b.levels[min.rmse-1]),to = (b.levels[min.rmse+1]),length.out=100)
  n = n + 1
}

rmse.best #vector of RMSE for each parameter search 
best.b #Last identified best parameter value
nrlottig
  • 31
  • 1
  • 4
  • Please provide a [reproducible example in r](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The link I provided, will tell you how. Moreover, please take the [tour](https://stackoverflow.com/tour) and visit [how to ask](https://stackoverflow.com/help/how-to-ask). Cheers. – M-- Jan 18 '19 at 15:33
  • The code provided and link to two datasets are reproducible using real data. I’ll have a look – nrlottig Jan 19 '19 at 17:26
  • As it stands, it is very difficult for us to help you. Please follow [How to Ask](https://stackoverflow.com/help/how-to-ask), and then edit the question accordingly. If you are asking us to debug your code, please include a [Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve); ensure you have sample inputs, outputs, and, if any, errors included. Avoid asking multiple questions, referring to external datasets and generally follow how to ask and the first link I provided. – M-- Jan 20 '19 at 04:41
  • I went through and changed the way I referenced data and emphasized the question I had. I hope this is better? – nrlottig Jan 23 '19 at 18:38

0 Answers0