
Original question:

Calling predict.gam(..., type="terms") returns values that are centered on the average. Is there a way to obtain the raw predicted term values (i.e. ones that have not been centered on the average)?

Edited: Here is a reproducible example of my attempt to get the (non-centered) fitted values of a given variable using lpmatrix. The values are similar to those using visreg but with an offset. This is strictly for the case where the link is identity and there are no tensor products.

    # load packages and read in data
    library(mgcv)
    library(visreg)
    air <- airquality[complete.cases(airquality), ]

    # set up model
    model <- gam(Temp ~ s(Ozone) + s(Solar.R) + s(Wind), data = air, method = "ML")

    # get predicted values
    predicted <- as.data.frame(predict(model, na.action = na.exclude))
    colnames(predicted) <- "predicted"

    # using the lpmatrix, set the columns for s(Ozone), s(Solar.R), and s(Wind) to 0
    lpmat <- predict(model, type = "lpmatrix")
    lpmat_Ozone <- lpmat; lpmat_Ozone[, grep("Ozone", colnames(lpmat))] <- 0
    lpmat_Solar.R <- lpmat; lpmat_Solar.R[, grep("Solar.R", colnames(lpmat))] <- 0
    lpmat_Wind <- lpmat; lpmat_Wind[, grep("Wind", colnames(lpmat))] <- 0

    # obtain response predictions with each s(variable) zeroed out in turn
    predicted$Ozone <- unname(lpmat_Ozone %*% coef(model))[, 1]
    predicted$Solar.R <- unname(lpmat_Solar.R %*% coef(model))[, 1]
    predicted$Wind <- unname(lpmat_Wind %*% coef(model))[, 1]

    # obtain term predictions by subtraction
    answerdf <- as.data.frame(predicted$predicted - predicted$Ozone)
    colnames(answerdf) <- "Ozone"
    answerdf$Solar.R <- predicted$predicted - predicted$Solar.R
    answerdf$Wind <- predicted$predicted - predicted$Wind

    # visualize using the visreg method and the alternative method above
    visregdat <- visreg(model, "Ozone", plot = FALSE)$fit
    plot(visregFit ~ Ozone, data = visregdat, type = "l", lwd = 5,
         ylim = c(-30, 90), ylab = "fitted values")
    points(answerdf$Ozone ~ air$Ozone, col = "violet", pch = 20)
    legend(100, 60, legend = c("Visreg", "Alt. method"),
           col = c("black", "violet"), pch = 20, cex = 0.8)

This gives the plot below, showing the same curve shapes but with different intercepts. Why would this be?

[plot comparing the visreg fit with the alternative lpmatrix-based method]

webbe

1 Answer


No. The constant to add is available as an attribute on the object returned by predict(), but otherwise, no, there is no option to do this automatically.
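For reference, here is a minimal sketch of pulling that constant off the prediction object, using the airquality model from the question. With `type = "terms"`, `predict.gam()` returns a matrix of centred per-term contributions whose `"constant"` attribute holds the value to add back:

```r
library(mgcv)

air <- airquality[complete.cases(airquality), ]
model <- gam(Temp ~ s(Ozone) + s(Solar.R) + s(Wind), data = air, method = "ML")

# centred per-term contributions, one column per smooth
tm <- predict(model, type = "terms")

# the overall constant is attached as an attribute
const <- attr(tm, "constant")

# adding the constant to the row sums recovers the fitted values
# (identity link, so the link and response scales coincide)
all.equal(unname(rowSums(tm) + const), unname(predict(model)))
```

Note that the constant shifts the *sum* of the terms; the individual columns of `tm` remain centred.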

Gavin Simpson
  • Thank you. Adding the constant does indeed make the terms sum to the value given by predict(...,type="response"), but I am interested in the term values themselves (not their sum), which are only given centered. Is there a way to back-calculate the non-centered values? – webbe Feb 11 '20 at 13:37
  • The smooths are centred for good reason (identifiability constraints). You isolate the overall mean of the response in the model intercept, hence all smooths are centred. If you don't do this it would be difficult to fit the model. There are other ways to add identifiability constraints but not in mgcv. You may be able to suppress this but it's going to make fitting tricky. Why do you want to do this; what's the end use case? – Gavin Simpson Feb 11 '20 at 20:54
  • 1
    My data is spatial. I am trying to isolate the effect of a given variable on the response at each observation. In a linear model, say y = a * x1 + b * x2 + c, I would say the effect of x1 at observation i is a * x1i, (i.e. the x1 coefficient * observation of x1 at point i). I am looking for a way to do this same thing in a GAM. – webbe Feb 11 '20 at 21:40
  • The usual way you do this, especially as it's the only sane way to do it when using a link function other than the identity link, is to predict from the model with `type='link'` using `newdata` that varies the variable(s) of interest and holds all others at some fixed, representative values, say their mean or median. – Gavin Simpson Feb 11 '20 at 21:47
  • You can also just read it off the plot of the smooth but the y-axis won't be on the natural scale of the response, especially if using a non-identity link. If you want it on the response scale, then predict as above (previous comment) and then transform to the response scale using the inverse link. `vis.gam()` will help you with this. – Gavin Simpson Feb 11 '20 at 21:51
  • If y = s(x1) + s(x2) + s(x3), then in principle, the predicted value of s(x1) = `predict.gam(..., type="response")` - `predict.gam(..., type="response", newdata=newdata)` where newdata is the original data with all values of x1 set to 0. This is similar to your suggestion. It seems to work, except the sum of the predicted s(x1), s(x2), s(x3), and model intercept should be equal to the predicted value of y (i.e. `predict.gam(..., type="response")`). In practice, a constant plus the predicted s(x1), s(x2), s(x3) sum to equal the predicted value of y, but the constant is not equal to the intercept – webbe Feb 12 '20 at 13:25
  • This would only be true for `type = 'link'` unless you used a family with the `identity` link. I suggest setting the covariates you aren't interested in to some representative values because setting them to 0 doesn't work once you have models with tensor products of covariates you are and aren't interested in for example. As a general principle, you want to i) set other covariates to representative values, ii) predict over a grid of values for the covariate(s) you *are* interested in, iii) predict on the link scale & get std errors there too, iv) transform prediction plus CI to response scale – Gavin Simpson Feb 12 '20 at 16:07
  • I am using an `identity` link without tensor products. Setting the values of x1 in newdata to 0 actually does not mean that s(x1) is zero. Instead, one can set s(x1) to 0 using the `lpmatrix`: `lpmat<-predict(model, type="lpmatrix") ; lpmat_x1<-lpmat; lpmat_x1[,grep("x1",colnames(lpmat))]<-0 ; predicted$x1<-unname(lpmat_x1%*%coef(model))[,1]` Then, the predicted value of x1 is `predict(model)-predicted$x1`. In this case it doesn't matter what the values of the other covariates are because they are subtracted away. – webbe Feb 13 '20 at 16:56
  • Plotting these values gives the same curve for x1 as `visreg(model, "x1")` with the added benefit of being able to extract s(x1) for each observed value of x1. The only thing is the values from my method are off from those from `visreg(model, "x1",plot=FALSE)$fit$visregFit` by a constant. In one case this constant = `median(x1)`, but not in all cases. I.e. the curves have the same shape, but the visreg curve is lower/higher than my fitted curve by a constant. Do you know why this would be? Does this have to do with the way the smooths are fit? – webbe Feb 13 '20 at 17:20
  • I was speaking about the tensor product case, and I never said that this made `s(x1)` equal zero. Setting the data to `0` evaluates the basis functions for the smooth at that value. In a purely additive model with Gaussian family and identity link this has no effect on the contributions of any other smooth/effects in the model. In the tensor product case it has important consequences. Setting the smooth to zero via lpmatrix zeroes it out as you mention, but this is like setting the data for that smooth to some value (I think the mean of the covariate). – Gavin Simpson Feb 13 '20 at 17:34
  • So all the fiddling with the lpmatrix can be avoided if you just predict at some known values of the other covariates, whilst varying the data for the term(s) of interest. If all you're interested in is the Gaussian, identity link case then you can do what you want and add any constant to shift the smooths up or down as you wish (`plot.gam` has a `shift` argument for this). But this is not a general solution. – Gavin Simpson Feb 13 '20 at 17:50
  • I'm not sure why the visgam results differ; if you can edit the question to include a reproducible example I'll take a look – Gavin Simpson Feb 13 '20 at 17:56
  • I did not mean to imply that you said setting x1 to 0 would make s(x1) = 0. That is what I had been thinking and thought I had mentioned in my earlier comment, so I just wanted to clear that up. I have edited the question with a reproducible example. – webbe Feb 13 '20 at 19:11
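The prediction-based approach recommended in the comments above (vary the covariate of interest over a grid, hold the others at representative values, predict on the link scale with standard errors) can be sketched as follows. The medians and the grid length of 200 are just illustrative choices, not required values:

```r
library(mgcv)

air <- airquality[complete.cases(airquality), ]
model <- gam(Temp ~ s(Ozone) + s(Solar.R) + s(Wind), data = air, method = "ML")

# grid over the covariate of interest; other covariates fixed at their medians
newd <- data.frame(
  Ozone   = seq(min(air$Ozone), max(air$Ozone), length.out = 200),
  Solar.R = median(air$Solar.R),
  Wind    = median(air$Wind)
)

# predict on the link scale with standard errors, then form an
# approximate 95% interval; for a non-identity link, apply the
# inverse link to fit, upr, and lwr afterwards
p   <- predict(model, newdata = newd, type = "link", se.fit = TRUE)
upr <- p$fit + 2 * p$se.fit
lwr <- p$fit - 2 * p$se.fit
```

The resulting `p$fit` traces the effect of Ozone (plus the fixed contributions of the other terms and the intercept), so its shape matches the Ozone smooth up to a vertical shift.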