18

I am using GAM to model time trends in a logistic regression. Yet I would like to extract the the fitted spline from it to add it to another model, that cannot be fitted in GAM or GAMM.

Thus I have 2 questions:

  1. How can I fit a smoother over time so that I force one knot to be at a particular location while letting the model to find the other knots?

  2. How can I extract the matrix from the fitted GAM so that I can use it in as an impute for a different model?

The types of models I am running are to the following form:

gam <- gam(mortality.under.2~ maternal_age_c+ I(maternal_age_c^2)+
           s(birth_year,by=wealth2) + wealth2 + sex +
           residence + maternal_educ + birth_order,
           data=colombia2, family="binomial")

I've read the extensive documentation for the GAM but I am not sure still. Any suggestion is really appreciated.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
Tom
  • 305
  • 2
  • 4
  • 15
  • It's not so easy to "extract the splines"., although I would be happy to be proved wrong. For purpose 2) you could use `predict` on a grid." I use package::rms because it lets you do all those operations. – IRTFM Mar 23 '13 at 09:23
  • thanks but how would you do that using rms? – Tom Mar 23 '13 at 10:53
  • Short-circuiting quite a bit of preparatory work and making some guesses about variable structure: `fit <- lrm(mortality.under.2 ~ rcs(maternal_age_c, 3) + rcs(birth_year, 3) %ia% rcs(wealth2, 3) + sex + residence + maternal_educ + birth_order, data=colombia2)); Function(fit)` – IRTFM Mar 23 '13 at 17:49
  • `lrm(formula = mortality.under.2 ~ rcs(birth_year, 8) + rcs(maternal_age, 3) + +wealth2 + sex + residence + maternal_educ + birth_order, data = colombia2)` does work, but `specs(gam.2)`only give me the knots locations, to the polynomial in each interval. – Tom Mar 25 '13 at 04:34
  • You can specify the knots location or you can use the `Function()` result to see what the best fit is. It's probably a bit more complicated than just running the model. I do not understand why you would think that `specs()` would work with an rms model. Maybe I should not have offered a tangential alternative in the first place. – IRTFM Mar 25 '13 at 06:35

1 Answers1

41

In mgcv::gam there is a way to do this (your Q2), via the predict.gam method and type = "lpmatrix".

?predict.gam even has an example, which I reproduce below:

 library(mgcv)
 n <- 200
 sig <- 2
 dat <- gamSim(1,n=n,scale=sig)

 b <- gam(y ~ s(x0) + s(I(x1^2)) + s(x2) + offset(x3), data = dat)

 newd <- data.frame(x0=(0:30)/30, x1=(0:30)/30, x2=(0:30)/30, x3=(0:30)/30)

 Xp <- predict(b, newd, type="lpmatrix")

 ##################################################################
 ## The following shows how to use use an "lpmatrix" as a lookup 
 ## table for approximate prediction. The idea is to create 
 ## approximate prediction matrix rows by appropriate linear 
 ## interpolation of an existing prediction matrix. The additivity 
 ## of a GAM makes this possible. 
 ## There is no reason to ever do this in R, but the following 
 ## code provides a useful template for predicting from a fitted 
 ## gam *outside* R: all that is needed is the coefficient vector 
 ## and the prediction matrix. Use larger `Xp'/ smaller `dx' and/or 
 ## higher order interpolation for higher accuracy.  
 ###################################################################

 xn <- c(.341,.122,.476,.981) ## want prediction at these values
 x0 <- 1         ## intercept column
 dx <- 1/30      ## covariate spacing in `newd'
 for (j in 0:2) { ## loop through smooth terms
   cols <- 1+j*9 +1:9      ## relevant cols of Xp
   i <- floor(xn[j+1]*30)  ## find relevant rows of Xp
   w1 <- (xn[j+1]-i*dx)/dx ## interpolation weights
   ## find approx. predict matrix row portion, by interpolation
   x0 <- c(x0,Xp[i+2,cols]*w1 + Xp[i+1,cols]*(1-w1))
 }
 dim(x0)<-c(1,28) 
 fv <- x0%*%coef(b) + xn[4];fv    ## evaluate and add offset
 se <- sqrt(x0%*%b$Vp%*%t(x0));se ## get standard error
 ## compare to normal prediction
 predict(b,newdata=data.frame(x0=xn[1],x1=xn[2],
         x2=xn[3],x3=xn[4]),se=TRUE)

That goes through the entire process even the prediction step which would be done outside R or of the GAM model. You are going to have to modify the example a bit to do what you want as the example evaluates all terms in the model and you have two other terms besides the spline - essentially you do the same thing, but only for the spline terms, which involves finding the relevant columns and rows of the Xp matrix for the spline. Then also you should note that the spline is centred so you may or may not want to undo that too.

For your Q1, choose appropriate values for the xn vector/matrix in the example. These correspond to values for the nth term in the model. So set the ones you want fixed to some mean value and then vary the one associated with the spline.

If you are doing all of this in R, it would be easier to just evaluate the spline at the values of the spline covariate that you have data for that is going into the other model. You do that by creating a data frame of values at which to predict at, then use

predict(mod, newdata = newdat, type = "terms")

where mod is the fitted GAM model (via mgcv::gam), newdat is the data frame containing a column for each variable in the model (including the parametric terms; set the terms you don't want to vary to some constant mean value [say the average of the variable in the data set] or certain level if a factor). The type = "terms" part will return a matrix for each row in newdat with the "contribution" to the fitted value for each term in the model, including the spline term. Just take the column of this matrix that corresponds to the spline - again it is centered.

Perhaps I misunderstood your Q1. If you want to control the knots, see the knots argument to mgcv::gam. By default, mgcv::gam places a knot at the extremes of the data and then the remaining "knots" are spread evenly over the interval. mgcv::gam doesn't find the knots - it places them for you and you can control where it places them via the knots argument.

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • 3
    That's a very helpful answer. Since I cannot easily donate extra points, I'm going to see if I can find some of your earleir answers to upvote. Shouldn't be too hard. You are an excellent teacher with a deep knowledge base, Gavin. – IRTFM Mar 23 '13 at 17:52
  • That's a really great explanation. My question was indeed not clear. I want to do a mixture of procedures. I would like to place one or two knots not at a particular location **and** let the program place the remaning knots whatever is needed; It is possible? Thanks – Tom Mar 23 '13 at 20:54
  • @AntonioPedroRamos Like I said, the only thing `mgcv::gam` does is place the knots at the end points and evenly in between. You'll need to position all the knots yourself if you want to choose a few of the knots locations. IIRC these penalised regression models aren't very sensitive to the knot location. – Gavin Simpson Mar 23 '13 at 22:08