My question: is this the correct way to specify the knots for a smoothing spline with gam in mgcv?
What confuses me is that the vignette says k is the dimension of the basis used to represent the smooth term.
(Previously I thought that with bs = "cr" the dimension of the basis was 3. After reading pp. 149-150 of "Generalized Additive Models: An Introduction with R", it seems that gam uses a set of k basis functions to represent the cubic regression spline.)
However, the post referenced below shows that k is actually the number of knots. The code below verifies this:
# reference
# https://stackoverflow.com/questions/40056566/mgcv-how-to-set-number-and-or-locations-of-knots-for-splines
library(mgcv)
## toy data
set.seed(0); x <- sort(rnorm(400, 0, pi)) ## note, my x are not uniformly sampled
set.seed(1); e <- rnorm(400, 0, 0.4)
y0 <- sin(x) + 0.2 * x + cos(abs(x))
y <- y0 + e
## fitting natural cubic spline
cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
par(mfrow = c(1,2))
plot(x, y, col= "blue", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
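The two readings of k can be compared directly on the fitted object. The following is only a small sketch on the same toy data: it counts the knots stored in `xp`, reads the stored basis dimension `bs.dim`, and counts the coefficients that belong to the smooth. If k is both the basis dimension and the number of knots for a "cr" smooth, the first two numbers should agree with k = 20, while the coefficient count should be one lower, assuming the default sum-to-zero identifiability constraint is applied.

```r
library(mgcv)

## same toy data as above
set.seed(0); x <- sort(rnorm(400, 0, pi))
set.seed(1); e <- rnorm(400, 0, 0.4)
y <- sin(x) + 0.2 * x + cos(abs(x)) + e

cr_fit <- gam(y ~ s(x, bs = 'cr', k = 20))
sm <- cr_fit$smooth[[1]]

n_knots   <- length(sm$xp)                    ## number of knots actually used
basis_dim <- sm$bs.dim                        ## basis dimension requested through k
n_coef    <- sm$last.para - sm$first.para + 1 ## coefficients of the smooth after the constraint

c(knots = n_knots, basis_dim = basis_dim, coefficients = n_coef)
```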
So, to fit a smoothing spline, should I supply the knots manually through the knots argument of gam, with one knot at each data point? My attempt is below:
## fitting natural cubic spline, smoothing spline
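## knots must be a named list keyed by the covariate name, hence list(x = x) rather than list(x)
cr_fit <- gam(y ~ s(x, bs = 'cr', k = length(x)), knots = list(x = x))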
cr_knots <- cr_fit$smooth[[1]]$xp ## extract knots locations
## summary plot
par(mfrow = c(1,2))
plot(x, y, col= "blue", main = "natural cubic spline");
lines(x, cr_fit$linear.predictors, col = 2, lwd = 2)
abline(v = cr_knots, lty = 2)
plot(x,cr_knots)
cr_fit$sp
Is this understanding correct?
If yes, then how can I implement the smoothing spline method with gam in mgcv?
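One possible sanity check for the attempt above, sketched under the assumption that `stats::smooth.spline` with `all.knots = TRUE` is a fair reference for a classical smoothing spline: fit both models on the same toy data and look at the largest difference between the fitted values. The names `gam_fit`, `ss_fit` and `max_diff` are introduced here only for illustration. The two fits need not coincide exactly, since each function selects its smoothing parameter with its own criterion.

```r
library(mgcv)

## same toy data as above
set.seed(0); x <- sort(rnorm(400, 0, pi))
set.seed(1); e <- rnorm(400, 0, 0.4)
y <- sin(x) + 0.2 * x + cos(abs(x)) + e

## gam with one knot per observation (knots passed as a named list)
gam_fit <- gam(y ~ s(x, bs = 'cr', k = length(x)), knots = list(x = x))

## reference fit: smoothing spline with a knot at every unique x value
ss_fit <- smooth.spline(x, y, all.knots = TRUE)

gam_hat <- as.numeric(fitted(gam_fit))
ss_hat  <- predict(ss_fit, x)$y

## largest pointwise disagreement between the two fitted curves
max_diff <- max(abs(gam_hat - ss_hat))
max_diff

## visual comparison of the two fits
plot(x, y, col = "blue", main = "gam (red) vs smooth.spline (green)")
lines(x, gam_hat, col = 2, lwd = 2)
lines(x, ss_hat, col = 3, lwd = 2, lty = 2)
```

A small `max_diff` relative to the noise level would suggest that the gam fit behaves like a smoothing spline here; a large one would point to different smoothing parameters rather than necessarily to a wrong knot specification.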