19

I want to use function gam in mgcv packages:

 x <- seq(0,60, len =600)
 y <- seq(0,1, len=600) 
 prova <- gam(y ~ s(x, bs='cr')

can I set the number of knots in s()? and then can I know where are the knots that the spline used? Thanks!

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
memy
  • 229
  • 1
  • 2
  • 8

1 Answers1

37

While setting k is the correct way to go, fx = TRUE is definitely not right: it will force using pure regression spline without penalization.


locations of knots

For penalized regression spline, the exact locations are not important, as long as:

  • k is adequately big;
  • the spread of knots has good, reasonable coverage.

By default:

  • natural cubic regression spline bs = 'cr' places knots by quantile;
  • B-splines family (bs = 'bs', bs = 'ps', bs = 'ad') place knots evenly.

Compare the following:

library(mgcv)

## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi))  ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e

## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp  ## extract knots locations

## fitting B-spline
bs_fit <- gam(y ~ s(x, bs = 'bs', k = 20))
bs_knots <- bs_fit$smooth[[1]]$knots  ## extract knots locations

## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "grey", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x, y, col= "grey", main = "B-spline");
lines(x, bs_fit$linear.predictors, col = 2, lwd = 2)
abline(v = bs_knots, lty = 2)

enter image description here

You can see the difference in knots placement.


Setting your own knots locations:

You can also provide your customized knots locations via the knots argument of gam() (yes, knots are not fed to s(), but to gam()). For example, you can do evenly spaced knots for cr:

xlim <- range(x)  ## get range of x
myfit <- gam(y ~ s(x, bs = 'cr', k = 20),
         knots = list(x = seq(xlim[1], xlim[2], length = 20)))

Now you can see that:

my_knots <- myfit$smooth[[1]]$xp
plot(x, y, col= "grey", main = "my knots");
lines(x, myfit$linear.predictors, col = 2, lwd = 2)
abline(v = my_knots, lty = 2)

enter image description here

However, there is usually no need to set knots yourself. But if you do want to do this, you must be clear what you are doing. In particular, the number of knots you provide must not conflict with the k in s().

This is a very rich answer. The length of bs_knots is 24. The "dimension" of the spline basis is in bs_fit$smooth[[1]]$bs.dim, which is 20.

Yes, for B-splines family, the number of B-splines does not equal the number of knots. Knots placement for B-splines is a dirty work and error-prone. See https://stackoverflow.com/a/72723391/4891738 for a demonstration with B-splines.

Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
  • 2
    This is a very rich answer. The length of `bs_knots` is 24. The "dimension" of the spline basis is in `bs_fit$smooth[[1]]$bs.dim`. – IRTFM Oct 20 '16 at 16:07
  • @李哲源 Thanks for the detailed answer. Is there a way to extract the knots without calling `gam`, e.g. by calling `smoothCon`? – papgeo Sep 04 '18 at 21:39
  • 1
    @papgeo Say `x <- runif(100); sm <- smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = NULL); sm[[1]]$xp` Note that if you use `smoothCon`, you can directly set `knots` (but it needs match `k`). `knots = FALSE` will do auto knots placement. – Zheyuan Li Sep 04 '18 at 23:28
  • @李哲源 Thank you very much. I see 20 knots in the code you provided. But when I try `smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = seq(0,1,length.out=20))` it gives me an error message: Error in data[[txt]] : subscript out of bounds. – papgeo Sep 04 '18 at 23:57
  • 1
    @papgeo `smoothCon(s(x, bs = 'cr', k = 20), data = data.frame(x = x), knots = list(x = seq(0,1,length.out=20)))` Knots need be provided in a list. This makes sense as when constructing multivariate splines like tensor product splines, each variable has a set of knots and they should be given in a list. – Zheyuan Li Sep 05 '18 at 00:08