3

I want to fit the following Generalized Nonlinear Model: Probit(G)=K+1/Sigma*(Temp-T0)*Time. As naive model, I created Probits(G) by qnorm(G) and then fitted the Nonlinear Model. But I want to fit Nonlinear Model with logit link similar to glm function in R.

How can I fit such Generalized Nonlinear Model with logit link in R?

Data <-
  structure(list(Temp = c(23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L,
  27L, 27L, 27L, 27L, 27L, 27L, 33L, 33L, 33L, 33L, 33L, 33L, 33L,
  35L, 35L, 35L, 35L, 35L), Time = c(144L, 168L, 192L, 216L, 240L,
  264L, 288L, 312L, 120L, 144L, 168L, 192L, 216L, 240L, 72L, 96L,
  120L, 144L, 168L, 192L, 216L, 96L, 120L, 144L, 168L, 192L), G = c(15,
  25.5, 27, 28, 28.5, 39.5, 41.5, 43, 13, 21.5, 29.5, 30.5, 32.5,
  35, 13.5, 28, 32.5, 33.5, 35, 39.5, 42, 6.5, 30, 39.5, 57, 58.5
  )), .Names = c("Temp", "Time", "G"), class = "data.frame", row.names = c(NA,
  -26L))

Data$GermRate <- 1/Data$Time
Data$Probits <- qnorm(p=Data$G/100) # Get Probits


fm1 <-
  nls(
      formula= Probits ~ K+1/Sigma*(Temp-T0)*Time
    , data=Data
    , start=list(K=1, Sigma=2, T0=2)
    #, algorithm= "port"
    )
fm1Summary <- summary(fm1)
fm1Coef <- summary(fm1)$coef
halfer
  • 19,824
  • 17
  • 99
  • 186
MYaseen208
  • 22,666
  • 37
  • 165
  • 309
  • 1
    If you want a logit link, your responses should be between 0-1 but your `Probits` values contain negative values. `nls` doesn't have a link function property, you just need to include the transformation in the model formula itself (assuming you use a transformation that makes sense). Or maybe i'm misunderstanding what you mean by wanting to use a link function here. Can you more explicitly specify the model you want to fit in your question? – MrFlick Oct 13 '14 at 16:28
  • Thanks @MrFlick for showing your interest in my problem. Actually the response variable is `Data$G/100` from which I created `Probits`. As a student of Statistics, I know that such `transformation` can be avoided by using `linking`. – MYaseen208 Oct 13 '14 at 16:33
  • Linking only works if the fitting algorithm allows for it. `nls` (nonlinear lesst squares) does not usually allow for this. Linking is usually associated with `glm`. And the response of your model in R is whatever is on the left of the `~` in your formula. Are you trying to do a `probit` or a `logit` link? It sounds like you may need more statistical advice rather than programming advice. If so, you may consider posting to [stats.se] instead. – MrFlick Oct 13 '14 at 16:41
  • 2
    you should be able to do the equivalent of linking by applying the inverse-link function on the RHS, something like `GermRate ~ pnorm(K+1/Sigma*(Temp-T0)*Time)` -- a first pass at this didn't work, but it's the direction I would go – Ben Bolker Oct 13 '14 at 16:58
  • I think you should wrap the RHS of your formula in `I`, like `Probits ~ I(K+1/Sigma*(Temp-T0)*Time)`. Otherwise it's going to try and parse it with formula syntax (interpereting `*` as an interaction) rather than with normal math syntax. – shadowtalker Oct 15 '14 at 16:31
  • Also you should post what exactly is going wrong. Are you getting an error? Do your results make no sense? – shadowtalker Oct 15 '14 at 16:32

1 Answers1

9

You can fit this type of model using the gnm package for generalized nonlinear models. It takes a bit of work, as gnm uses pre-defined functions of class "nonlin" to specify nonlinear terms in the model and the ones provided by the package are generally insufficient to specify an arbitrary nonlinear function. However it is possible to define a custom "nonlin" function to use with gnm.

In your model, k is a linear parameter, so we only need to worry about the second term. This can be specified via the following "nonlin"function

customNonlin <- function(Temp, Time){
    list(predictors = list(sigma = 1, t0 = 1),
         variables = list(substitute(Temp), substitute(Time)),
         term = function(predLabels, varLabels) {
             sprintf("1/%s * (%s - %s) * %s",
                     predLabels[1], varLabels[1], 
                     predLabels[2], varLabels[2])
         })
}
class(customNonlin) <- "nonlin"

In the returned list,

  • predictors specifies that sigma and t0 are predictors with a single intercept term (i.e. are individual parameters).
  • variables specifies that there are two variables, to be supplied by the user via the Temp and Time arguments.
  • term specifies a function to create a deparsed mathematical expression of the term, given labels for the predictors and the variables.

More detail on "nonlin" functions can be found in Section 3.5 of the gnm vignette.

Now we can try fitting your model as follows

mod1 <- gnm(cbind(G, 100 - G) ~ customNonlin(Temp, Time), family = binomial, 
            data  = Data, start = c(1, 2, 2))

Note that, as in glm, an intercept is added to the formula by default, which here will represent k. Although the starting values are far from the solution, the gnm convergence criteria are met at this point, so the algorithm does not perform any iterations. In this case a better starting estimate of sigma is required for gnm to converge to something more sensible

mod2 <- gnm(cbind(G, 100 - G) ~ customNonlin(Temp, Time), family = binomial, 
            data  = Data, start = c(1, 2000, 2))

mod2

Call:
gnm(formula = cbind(G, 100 - G) ~ customNonlin(Temp, Time), family = binomial, 
    data = Data, start = c(1, 2000, 2))

Coefficients:
(Intercept)        sigma           t0  
     -2.589     1915.602        8.815  

Deviance:            53.53157 
Pearson chi-squared: 49.91347 
Residual df:         23 

Actually it is possible to specify this model using the Mult function provided by gnm, as long as you don't mind re-parameterising the model:

mod3 <- gnm(cbind(G, 100 - G) ~ Mult(1, 1 + offset(Temp), offset(Time)), 
            family = binomial, data  = Data,
            start = c(1, 1/2000, -2))

mod3

Call:

gnm(formula = cbind(G, 100 - G) ~ Mult(1, offset(Temp) + 1, offset(Time)), 
    family = binomial, data = Data, start = c(1, 1/2000, -2))

Coefficients:
                             (Intercept)  
                               -2.588874  
Mult(., 1 + offset(Temp), offset(Time)).  
                                0.000522  
Mult(1, . + offset(Temp), offset(Time)).  
                               -8.815152  

Deviance:            53.53157 
Pearson chi-squared: 49.91347 
Residual df:         23 

EDIT

The function of the parameters is specified in the term component of the list returned by customNonlin, which you can see via

customNonlin(Temp, Time)$term(c("sigma", "t0"), c("Temp", "Time"))
"1/sigma * (Temp - t0) * Time"

So if you just want to change the functional form, you will need to modify the term function. If you want to add/remove parameters, you would also need to modify the list in the predictors component. Similarly if the new term requires you to add/remove variables, you would modify the variables component.

Heather Turner
  • 3,264
  • 23
  • 30
  • Thanks @HeatherTurner for your answer. I wonder where we are specifying the model terms. Wonder if the formula will change, then how the function `customNonlin` will change? – MYaseen208 Oct 16 '14 at 16:04
  • I've added some comments on this. – Heather Turner Oct 16 '14 at 16:22
  • Can you put more light on `"1/%s * (%s - %s) * %s"` and explain more? Thanks – MYaseen208 Oct 16 '14 at 16:32
  • I've used `sprintf` to create the string, each `%s` is substituted by the values given in the call to `sprintf`. See `?sprintf` for more details. You don't have to use `sprintf`, for example you could use `paste` to create the string. – Heather Turner Oct 16 '14 at 16:45
  • How would one remove the intercept using a custom `nonlin` function? – jbaums Jan 13 '15 at 05:59
  • 1
    The call to the `nonlin` function is just treated as a term in the model - so you can add to the formula as in `glm`. Specifically to remove the intercept you can add `0` or `-1` to the model formula. – Heather Turner Jan 13 '15 at 10:43
  • Please, could you please explain when to use gnm and when glm or nls? Let say I want to fit a logistic model. – skan May 13 '17 at 20:12
  • 1
    glm is for modelling a response with a distribution from the exponential family (e.g. Gaussian, Binomial, Poisson) and linking the expected value to a linear predictor (where each term is of the form coef * variable). nls is for modelling a continuous, unbounded response with a nonlinear predictor (terms are nonlinear functions of parameters). gnm combines the two, i.e. is like a glm but allows nonlinear terms in the predictor. A logistic regression models a Binomial response with a logit link - so use glm or gnm depending on the predictor, see also Sec 2 of gnm vignette for glm enhancements. – Heather Turner May 14 '17 at 13:21