4

I have run a GAM in R using the mgcv package with the following form:

shark.gamFINAL <- gam(ln.raw.CPUE...0.1 ~ Year + Month + 
                      s(Mean.Temp, bs = "cr") + s(Mean.Chl.a, bs = "cr") + 
                      s(Mean.Front.density, bs = "cr"), data=r, family=gaussian)

After running this model and calculating the percentage deviance explained by each variable I would like to plot the effect of each variable against the response

However when I use the plot.gam function in R my graphs come out with a y axis that is "s(predictor variable, edf)"

I am not sure what this scale of the y axis represents?

Is there a way that I could change the y axis range to that which represents the response, as has been done in this paper: Walsh and Kleiber (2001), 'Generalized additive model and regression tree analyses of blue shark (Prionace glauca) catch rates by the Hawaii-based commercial longline fishery' in FIGURE 3.

I would have posted some examples of the plots I am describing but as this is my first post I do not have at least 10 reputation, so it won't let me do it!

I have searched many sites and forums to try and find an answer for this, but to no avail, any help would therefore be hugely appreciated!

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
user2236109
  • 41
  • 1
  • 3

1 Answers1

8

The axis is the value taken by the centred smooth. It is the contribution (at a value of the covariate) made to the fitted value for that smooth function.

It is easy to change the y axis label - supply the one you want to the ylab argument. This of course means you have to plot each smooth separately if you want a separate y-axis label for each plot. In that case also use the select argument to plot specific smooth functions, for example:

layout(matrix(1:4, ncol = 2, byrow = TRUE)
plot(shark.gamFINAL, select = 1, ylab = "foo")
plot(shark.gamFINAL, select = 2, ylab = "bar")
plot(shark.gamFINAL, select = 3, ylab = "foobar")
layout(1)

The only way I know to change the scale of the y-axis is to build the plot by hand. Those plots are without the contribution from the model constant term, plus the other parametric terms. If your model only had an intercept and one smooth, you could generate new data over the range of that covariate and then predict from the model for these new data values but use type = "terms" to get the contribution for the smooth. You then plot the value returned from predict plus the value of the "constant" attribute returned by predict.

In your case, you need to control for the other variables when predicting. One way to do that is to set all the other covariates to their means or typical value but allow the covariate of interest to vary over its range, as before. You then sum up the values in each row of the matrix returned by predict(shark.gamFINAL, newdata = NEW, type = "terms") (where NEW is the new data frame to predict at, varying one covariate but holding the rest at some typical value), again adding on the constant. You have to redo this for each covariate in turn (i.e. once per plot) as you need to keep the other covariates held at the typical value.

All this does is shift the scale on the axis though - none of the smooths in your model interact with other smooths or terms in the model so perhaps it might be easier to think of the y-axis as the effect on the response for each smooth?

Gavin Simpson
  • 170,508
  • 25
  • 396
  • 453
  • Thank you very much for the advice! I have actually just found some code somewhere else that suggests the approach where you set the other covariates at their means. The example code they provided for this was: testdata = data.frame(Income=seq(.4,1, length=100), Edu=mean(mod_gam2$model$Edu), Health=mean(mod_gam2$model$Health)) fits = predict(mod_gam2, newdata=testdata, type='response', se=T) predicts = data.frame(testdata, fits) ggplot(aes(x=Income,y=fit), data=predicts) + geom_smooth(aes(ymin = fit - se.fit, ymax=fit + se.fit), fill='gray80', size=1,stat='identity') – user2236109 Apr 04 '13 at 21:07
  • This code was on p17 of this web address: http://www3.nd.edu/~mclark19/learn/GAMS.pdf This produced a plot similar to that in fig. 8 on p17 of this link but now I am wondering does this represent the effect of the predictor variable on the response or the effect of the predictor variable after it has been smoothed?Presumably the shape of the trend line is determined by a loess smooth as default, whereas I have used cubic spline smoothing in my model so I will have to change this? Thank you in advance – user2236109 Apr 04 '13 at 21:25
  • You are making things complicated by not also fully understanding what the plotting code is doing. In **ggplot**, geoms are associated with stats. The default stat in `geom_smooth` is `stat_smooth`. Notice in their code they change the stat in in the `geom_smooth` call from its default `stat = "smooth"` to `stat = "identity"`. The identity stat is special as it just maps from x to x with no transformation or summary. The author is using this as a quick easy way to have a ggplot figure of the partial effect of Income with the confidence interval bound. – Gavin Simpson Apr 04 '13 at 22:28
  • By the way, I really think those should be +/- (1.96 * se.fit) for approximate 95% confidence intervals. Anyway in sum, what you are seeing is the fitted value of the response from your cubic spline fitted to income, conditional upon the other variables being held at their sample mean. You are *not* seeing a lowess smooth fitted to the partial fitted values. – Gavin Simpson Apr 04 '13 at 22:30
  • Ah ok now I understand! So the identity stat uses the type of smoothing applied to the predictor variables in the GAM. Thank you – user2236109 Apr 05 '13 at 09:40
  • Something else I wanted to ask was how to change the setup of the x axis in ggplot2. I need to create the same types of graph as above to show the effect of month and year on the response. These are obviously discrete predictor variables rather than the graphs before which had continuous x axes, so rather than a line being produced this should create a discrete point with error bars for each month or year. Is this possible, and if so would you be able to give some example code? Thanks in advance – user2236109 Apr 05 '13 at 16:36
  • @user2236109 You might try the `termplot` method for `gam` objects. `Year` and `Month` are either numeric of factors in the model - not clear from the information you give - but they are parametric terms. You need to create prediction data as per the PDF for each month & year combination you want a prediction for, and then draw the plot. I think now you need a new question about that with a **reproducible** example so we can help more - comments are really for extend follow-ups like this. – Gavin Simpson Apr 05 '13 at 16:45
  • @user2236109 Also note that if you want to model a seasonal cycle more parsimoniously than with 11 parameters (from a factor `Month`) variable, look at the cyclic splines in `gam`. E.g. do `s(Month, bs = "cc")` where you have `Month` stored as a **numeric** variable. – Gavin Simpson Apr 05 '13 at 16:46
  • Sorry, maybe I didnt make myself clear. I am aiming to plot the effect of year and month seperately on the response. I have the years 1998-2011 and I am only interested in the months June-September (so 6,7,8,9). An example of what I am trying to do is in figure 3 of the paper: Carvalho et al (2011), 'Spatial predictions of blue shark (Prionace glauca) catch rate and catch probability of juveniles in the Southwest Atlantic' He has created plots for each predictor and this is what I am aiming to do as well. Thanks – user2236109 Apr 05 '13 at 17:32
  • I don't think you've understood me. If you have data on all months, I would have modelled that using a cyclic spline. But anyway. What you want is a data frame where you have `newdat <- with(r, data.frame(Month = 6:9, Year = rep(2005, 4), Mean.Temp = rep(mean(Mean.Temp), 4), Mean.Chl.a = rep(mean(Mean.Chl.a), 4), Mean.Front.density = rep(mean(Mean.Front.density), 4)))`. If you want the effects of Month over all the years, then extend each of the vectors accordingly. – Gavin Simpson Apr 05 '13 at 17:48
  • That will give new `y` values for the `x` data in `newdat`. Use those data to plot. If you provide a link to the paper I'll see if I can view it, but a title/author isn't too helpful on its own and requires me to hunt it down. – Gavin Simpson Apr 05 '13 at 17:49
  • Sorry, a link would have been more helpful, here it is: http://icesjms.oxfordjournals.org/content/68/5/890.full – user2236109 Apr 05 '13 at 18:19
  • @user2236109 OK, without a reproducible example, I'm not going to do much more here. I have Answered a similar question that did have an example and there I show the required code to produce the data required to plot the smooth terms. You want the first example there, which looks at the contribution from `a` whilst holding `b` at its mean. Follow the same idea for `Month`; create a data frame with a column for `Month` containing the values `6:9`, and columns for the other covariates, where you just repeat the mean value 4 times, e.g. `rep(mean(Mean.Temp), 4)`. You can do same for Year. – Gavin Simpson Apr 06 '13 at 23:09
  • By the way, the responses in the paper don't look to have been changed much, if at all. Note they have used `scale = 0` if the y-axis scales are all different - add that to the `plot.gam()` call when displaying the smooths for your model. – Gavin Simpson Apr 06 '13 at 23:12