2

I am using the mgcv package in R to fit a GAM to some hydrologic data as follows:

library(mgcv)

d <- GAM_example_data[, 1:4]
colnames(d) <- c("month", "rain", "pump", "GWL")
# One penalized smooth per predictor; smoothness is chosen automatically
fitted_GAM <- gam(GWL ~ s(month) + s(rain) + s(pump), data = d)
plot.gam(fitted_GAM)

In the plots produced by plot.gam, the y-axis label reports the degrees of freedom of each smooth function, and these are often non-integer values. I would like to control the degrees of freedom of each smooth function myself. Is there a way to do this?

I have seen references to specifying the "knots" as a way of controlling the fit, but I am fairly new to GAMs and haven't been able to find any clear resources explaining what knots are (or whether they are related to my problem at all).

Zheyuan Li
James Woolley

2 Answers

3

I have been closely following your responses to the other answer. From your replies it appears that you already know several GAM concepts well, so I can give a short answer.

Unfortunately, no. mgcv does not fit a GAM by backfitting. Instead, it estimates all smoothing parameters jointly, by GCV or REML. So unlike the legacy gam package, where you can specify a df for each spline term, you can't do this in mgcv.

The only way to control smoothness in the penalized regression setting is to set the smoothing parameter sp, but its relationship with the degrees of freedom has no closed form, so you cannot tell in advance which df a given sp will produce.
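To make that relationship concrete, here is a minimal sketch that fixes sp at a few values through the sp argument of gam() and reads back the resulting effective degrees of freedom. The data are simulated and the names d_sim, sp_grid and edf_for_sp are made up for illustration; they are not from the question.

```r
library(mgcv)

# Simulated data (an assumption, not the poster's hydrologic data)
set.seed(2)
n <- 200
d_sim <- data.frame(x = runif(n))
d_sim$y <- sin(2 * pi * d_sim$x) + rnorm(n, sd = 0.3)

# Hypothetical grid of fixed smoothing parameters
sp_grid <- c(1e-4, 1e-2, 1, 100)

# Fit the same one-smooth model with sp held fixed, and record the smooth's edf
edf_for_sp <- sapply(sp_grid, function(sp_value) {
  fit <- gam(y ~ s(x, k = 10), data = d_sim, sp = sp_value)
  as.numeric(summary(fit)$edf)
})

# Larger sp means a heavier penalty and hence a smaller, generally non-integer, edf
print(data.frame(sp = sp_grid, edf = edf_for_sp))
```

The edf can only be read off after fitting, so hitting a particular target would mean searching over sp numerically.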

The other answer suggests fitting a pure regression spline without penalization. By setting a rank k and passing fx = TRUE, you always get degrees of freedom equal to the rank minus one (a consequence of the centering constraint), which is an integer.
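A minimal sketch of that unpenalized approach follows. The data are simulated stand-ins that reuse the question's column names, and the k values are arbitrary choices for illustration.

```r
library(mgcv)

# Simulated stand-in for the poster's data (an assumption; only the column names match)
set.seed(1)
n <- 200
d_sim <- data.frame(
  month = sample(1:12, n, replace = TRUE),
  rain  = runif(n, 0, 100),
  pump  = runif(n, 0, 50)
)
d_sim$GWL <- sin(2 * pi * d_sim$month / 12) + 0.01 * d_sim$rain -
  0.02 * d_sim$pump + rnorm(n, sd = 0.3)

# fx = TRUE switches the penalty off, so each term is a plain regression spline
fit_fixed <- gam(
  GWL ~ s(month, k = 5, fx = TRUE) +
    s(rain, k = 4, fx = TRUE) +
    s(pump, k = 6, fx = TRUE),
  data = d_sim
)

# Per-term degrees of freedom; expected to be 4, 3, 5 (k - 1 for each term)
edf_by_term <- summary(fit_fixed)$edf
print(edf_by_term)
```

The price of fixing the df this way is that the data no longer decide how smooth each term is, so a poorly chosen k can overfit or underfit.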


Here are some other answers I made on smoothing.

smooth.spline(): fitted model does not match user-specified degree of freedom explains how setting df works in smooth.spline. Note that this is the basis of backfitting GAM.

How to interpret lm() coefficient estimates when using bs() function for splines explains the basis of a pure regression spline. Of course, mgcv offers many spline basis classes, not just the B-spline used by splines::bs.

Zheyuan Li
1

There are a lot of parameters in the gam documentation to get your head around.

I think the most useful parameter for your case is k, the basis dimension. Essentially, it sets the upper limit on the degrees of freedom of a smooth created with s(): the limit is k - 1, because one degree of freedom is lost to the centering constraint. Here is some documentation.

So you might run, for example:

gam(GWL ~ s(month, k = 4) + ...)

Then examine your model using plot.gam and gam.check. If the diagnostics don't look good, you can adjust k up or down until they improve.
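As a rough illustration of that workflow, the sketch below fits the same simulated signal with a small and a large k and compares the resulting edf. The data and the names fit_small_k and fit_large_k are hypothetical. It calls mgcv's k.check(), which reports the basis-dimension check without the residual plots that gam.check() also draws.

```r
library(mgcv)

# Simulated wiggly signal (an assumption, chosen so that k = 4 is clearly too small)
set.seed(3)
n <- 300
d_sim <- data.frame(x = runif(n))
d_sim$y <- sin(6 * pi * d_sim$x) + rnorm(n, sd = 0.2)

fit_small_k <- gam(y ~ s(x, k = 4), data = d_sim)
fit_large_k <- gam(y ~ s(x, k = 20), data = d_sim)

# The edf can never exceed k - 1, whatever the data look like
edf_small_k <- as.numeric(summary(fit_small_k)$edf)
edf_large_k <- as.numeric(summary(fit_large_k)$edf)
print(c(small_k = edf_small_k, large_k = edf_large_k))

# Basis-dimension diagnostic; an edf close to k' with a low p-value suggests raising k
print(k.check(fit_small_k))
```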

EDIT: According to this answer, the fx = TRUE argument to s() will fit a regression spline with fixed degrees of freedom. Here k is the basis dimension, and the edf of the smooth is k - 1.

neilfws
  • Thanks for your response. Adjusting k to set the maximum degrees of freedom has helped me experiment with the fit a bit more. However, for a separate verification process I also need to replicate the GAM fitting results from a legacy piece of software that I no longer have access to. Do you know if there is a way to directly set the degrees of freedom used in the fit, or at least to constrain them to integer values? – James Woolley May 17 '17 at 03:30
  • According to this answer, the `fx = TRUE` argument to `s()` will fit a regression spline with fixed degrees of freedom: https://stats.stackexchange.com/questions/12223/how-to-tune-smoothing-in-mgcv-gam-model. I guess k will equal total df and k-1 = edf. – neilfws May 17 '17 at 03:36
  • Great, thanks very much. If you add your comment to your original answer, I'll accept it. – James Woolley May 17 '17 at 04:10