2

This is my first question, please let me know if I'm doing anything wrong. We have a df with two variables, and want to model EPR (egg production rate) as a function of temperature.

The relevant packages as per the nls page:

install.packages("tidyverse")
install.packages("nls.multstart")
install.packages("nlstools")
library(tidyverse)
library(nls.multstart) 
library(nlstools)

The relevant variables from a larger df:

temp=c(9.2,9.9,12.7,12.8,14.3,14.5,16.3,16.5,18,18,19.6,19.6,19.9,19.9,22,22.4,23.2,23.4,25.3,25.6,27,27.3,28.5,30.3,20.9)
EPR=c(1.5,0,0,0,1.27,0.56,3.08,0.575,2.7,3.09,2,6.3,2,3.76,3.7,1.65,7.1,18.9,7.07,3.77,13.79,0,0,0.47,0)
df<-data.frame(temp,EPR)

Here I write the formula with the five parameters to be estimated (k1,a,b,k2,c), temp will be the x values. So far so good.

formula<-function(k1,a,b,k2,c,temp) {
 modelEPR<-k1*1/(1+exp(-a*(temp-b)))-k2*exp(c*temp)
 return(modelEPR)
}

This is where I'm stuck; I'm already using quite narrow start_lower and upper, since I now know the parameters by using the excel solver somewhat successfully. The values I get with this method will get me a model, albeit quite an inaccurate one. Yes, I gave the start lower and upper a much greater range in the beginning, but that didn't yield any better results.

fit <- nls_multstart(EPR ~ formula(k1,a,b,k2,c,temp),
                 data = df,
                 iter = 100,
                 start_lower = c(k1 = 14, a = 0.3, b = 20, k2 = 0.02, c = 0.15),
                 start_upper = c(k1 = 15, a = 0.5, b = 21, k2 = 0.08, c = 0.24),
                 supp_errors = 'Y',
                 na.action = na.omit)

fit

As aforementioned, I used the excel solver to successfully make the model and I got the parameter estimates, then tried to just manually insert them here in R, which makes for a much better model.

model<-df %>%
 mutate(pred=(14.69/(1+exp(-0.41*(temp-20.52)))-0.05*exp(0.19 *temp))) %>% 
 ggplot()+
 xlab("Temperature (°C)")+
 ylab("EPR (Eggs per female per day")+
 geom_point(aes(temp,EPR))+
 geom_line(aes(temp,pred),col="red")

model

Ultimately, I have two questions; a) What am I doing wrong? Or is it simply the data being weird? Seems to work better with excel?! b) How do I code the bridge between fit and model? fit will yield the 5 parameters, but how do I insert them directly into the model function? Can I utilize mutate somehow here?

Would appreciate any help!

Crustacean
  • 31
  • 4

1 Answers1

2

A. Starting values and fitting model

To get starting values:

  1. If k1 = 0 then we can rearrange the formula as follows and then use the result of fitting that linear model as a starting value for c.

    log(EPR) ~ log(k2) + c * temp
    
  2. b is a shift in temp and a is a scaling so choose b = mean(temp) and a = 1/sd(temp)

  3. We can use algorithm = "plinear" to avoid having to specify starting values for the linear parameters, i.e. for k1 and k2. When using plinear the right hand side of the formula should be a matrix such that k1 times the first column plus k2 times the second column gives the predicted EPR.

This gives the following. Note that k1 and k2 will be represented by .lin1 and .lin2 in the nls output.

fm1 <- lm(log(EPR) ~ temp, df, subset = EPR > 0)
st2 <- list(c = coef(fm1)[[2]], a = 1/sd(df$temp), b = mean(df$temp))
fo2 <- EPR ~ cbind(1/(1+exp(-a*(temp-b))), -exp(c*temp))
fm2 <- nls(fo2, df, start = st2, algorithm = "plinear", 
  control = list(maxiter = 200))
deviance(fm2) # residual sum of squares
## [1] 333.6

Note that this represents a lower (better) residual sum of squares than the fit shown in the question:

sum((df$EPR - pred)^2) # residual sum of squares for fit shown in question
## [1] 339.7

No packages were used.

We can plot the two fits where the fit from the question is in blue and the fit done here is in red. From the plot there is some question whether the two large EFR values are outliers and whether they should be excluded.

plot(EPR ~ temp, df)
lines(fitted(fm2) ~ temp, df, subset = order(temp), col = "red")
lines(pred ~ temp, df, subset = order(temp), col = "blue")

[continued after screenshot]

screenshot

B. Evaluating model at given parameters

For a given model expressed in formula notation we can evaluate it at given parameters using the nls2 package. nls2 takes similar arguments as nls but if the starting value is a data frame with one row and the algorithm is "brute" then it simply returns the value of the right hand side evaluated at the starting values. See ?nls for more information.

library(nls2)

fo <- EPR ~ k1*1/(1+exp(-a*(temp-b)))-k2*exp(c*temp)
st <- list(k1 = 14.69, a = 0.41, b = 20.52, k2 = 0.05, c = 0.19)
fm <- nls2(fo, df, start = data.frame(st), algorithm = "brute")

deviance(fm)
## [1] 339.7

fitted(fm) # predictions at parameter values given in st

or in terms of a function:

rhs <- function(a, b, c, k1, k2, temp) k1*1/(1+exp(-a*(temp-b)))-k2*exp(c*temp)
p <- do.call("rhs", c(st, list(temp = df$temp)))
all.equal(p, pred)
## [1] TRUE
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341