speed up optim for a double for-loop function in R

Question

I'm struggling to speed up the optimization of a double for-loop function. I've already seen this post and others, but couldn't apply successfully the vectorization as proposed by Marek. Any suggestion / correction / advice to improve and hasten my code are welcome.

The aim of the function is to identify key physiological parameters (TT_TIn, b, g) linking meteorological data (TTnl:growing degree days, dDLv: daily variation of daylength) to crop yield (PS_TN).

I have to do this for many cultivars, each having a large dataset. Here is the code for data generation and the function definition.

# Data generation
lev_moy<-runif(300,80,180) # emergence date
rec<-runif(300,190,280)    # harvest date
Site<-rep(c("Cot", "Gla"),150)
Year<-rep(c("2007", "2008", "2009"),100)
meteo<-data.frame("Year"=rep(c(rep(2007, 365), rep(2008, 365),rep(2009, 365)),2),
                  "Site"=c(rep("Cot",365*3), rep("Gla",365*3)),
                  "dDLv"=rep(c(seq(3,23,0.125),seq(23,-2.25,-0.125),-1.75),6),
                  "TTnl"=runif(6*365,11.5,14.5))
PS_TN_pl<-jitter(0.00087*900*((rec*13)-900), factor = 2) # yield

# Function
rmse <- function(x) {
  TT_TIn<-x[1] # minimum growing degree day to finish the vegetative phase
  b<-x[2]      # growth parameter
  g<-x[3]      # critical daily variation in daylength 
               # (allowing the end of vegatative phase)
  TT_TI<<-numeric(length(lev_moy))
  PS_TN_est<<-numeric(length(lev_moy))
  for (i in 1:length(Site)) { 
    # for each plant (row) I select the meteo data corresponding to 
    # the specific year and site
    TTv<-c(subset(meteo, Year==Year[i]&Site==Site[i])$TTnl)
    dDLv<-c(subset(meteo, Year==Year[i]&Site==Site[i])$dDLv)
    for (j in lev_moy[i]:length(dDLv)) {
      # for each plant (row) I sum the temperature (TTnl) corresponding to the
      # vegetative phase. This period is detremined by a minimum growing degree 
      # day (TT_TIn) that I'll like to estimate with the optim function      
      if (sum(TTv[lev_moy[i]:j])>TT_TIn & dDLv[j]<g ) {
        TT_TI[i]<-ifelse(j>rec[i],sum(TTv[lev_moy[i]:rec[i]]),
                         sum(TTv[lev_moy[i]:j]))
        break      }    }  }
  PS_TN_est<-b*TT_TI*(rec-TT_TI)
  error<-PS_TN_est-PS_TN_pl
  rmse<-sqrt(mean(error^2))
  return(rmse)
}

# Optimisation
optFl<-optim(c(915,0.00057,1,0.05),rmse)

Thanks, I've edited my question. If the actual description is not clear, just let me know, I'll detail moreover. — denisC, Jul 22 '14 at 13:26
Those variable names might help you but they hurt the readability of your code to those not familiar with what those represent. Consider changing all the variable names before posting on here. What does rmse() take? Also it's best not to name the any variable the same as your function name — hedgedandlevered, Jul 22 '14 at 14:12
Avoid writing to parent environment ( `<<-`) unless absolutely necessary. You're calculating (inside your function) stuff based on variables not passed to the function; don't do that. Further, you're repeating calculations which don't change with the input `x`, so do them first and pass the results as constants. — Carl Witthoft, Jul 22 '14 at 14:36

speed up optim for a double for-loop function in R

0 Answers0