When I run gam.check(my_spline_gam), I get the following output.

Method: GCV   Optimizer: magic
Smoothing parameter selection converged after 9 iterations.
The RMS GCV score gradiant at convergence was 4.785628e-06 .
The Hessian was positive definite.
The estimated model rank was 25 (maximum possible: 25)
Model rank =  25 / 25 

Basis dimension (k) checking results. Low p-value (k-index<1) may
indicate that k is too low, especially if edf is close to k'.

         k'    edf k-index p-value
s(x) 24.000 22.098   0.849    0.06

My question is whether I can extract this p-value separately to a table.

Thomas K
a_geo
  • `str(gam.check(my_spline_gam))` somewhere the p-value should be. – Andre Elrico Nov 20 '18 at 09:42
  • that still gives the same output, whereas I would just want either only the one line of results or just the p-value. thanks! – a_geo Nov 20 '18 at 09:51
  • 1
    please add the result of `dput(gam.check(my_spline_gam))` to your question. Then I can solve it. – Andre Elrico Nov 20 '18 at 09:52
  • 2
    quick look at the code suggests you can use `k.check(yourmodel, subsample = 5000, n.rep = 200)` – user20650 Nov 20 '18 at 09:59
  • @AndreElrico: this is the output: dput(gam.check(my_spline_gam)) Method: GCV Optimizer: magic Smoothing parameter selection converged after 9 iterations. The RMS GCV score gradiant at convergence was 4.785628e-06 . The Hessian was positive definite. The estimated model rank was 25 (maximum possible: 25) Model rank = 25 / 25 Basis dimension (k) checking results. Low p-value (k-index<1) may indicate that k is too low, especially if edf is close to k'. k' edf k-index p-value s(x) 24.000 22.098 0.849 0.03 structure(list(mfrow = c(2L, 2L)), .Names = "mfrow") – a_geo Nov 20 '18 at 15:14
  • @user20650: k.check returns: Error: could not find function "k.check". – a_geo Nov 20 '18 at 15:15
  • @a_geo ; it is in the [mgcv package](https://github.com/cran/mgcv/blob/master/R/plots.r#L175). I have package `packageVersion("mgcv") ; ‘1.8.25’` . See [could-not-find-function](https://stackoverflow.com/questions/7027288/error-could-not-find-function-in-r) for troubleshooting. – user20650 Nov 20 '18 at 18:51

2 Answers

3

It looks like gam.check does not return its results table, so you cannot store it in an object the usual way. You could use capture.output to store the console output in an object, and then use strsplit to pull out the value. For the example in the help file this would be:

library(mgcv)
set.seed(0)
dat <- gamSim(1,n=200)
b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3),data=dat)
r <- capture.output(gam.check(b))
p <- strsplit(r[12], " ")[[1]][11]

But because the captured p-value is only the printed (rounded) string, you won't get the exact p-value this way.

Edit: user20650's answer will give you the proper output:

r <- k.check(b)
r[,'p-value']
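
To get from there to the table the question asks for, here is a minimal sketch. It assumes an mgcv version in which `k.check` is exported and returns a matrix with one row per smooth and a `p-value` column, as in the snippet above. The names `kc` and `k_table` are made up for illustration. Because the p-value is computed by random permutation, the seed is set just before the call.

```r
library(mgcv)

# Same simulated example as in the help file
set.seed(0)
dat <- gamSim(1, n = 200)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

# The k-index p-value is based on random reshuffling, so fix the seed first
set.seed(1)
kc <- k.check(b)   # matrix: one row per smooth term

# Turn the matrix into a data frame, keeping the original column names
k_table <- data.frame(term = rownames(kc), kc, check.names = FALSE)
rownames(k_table) <- NULL

# Only the p-values, rounded to three decimals
p_rounded <- round(k_table[["p-value"]], 3)
print(k_table[, c("term", "p-value")])
print(p_rounded)
```

Unlike the parsed console output, these values are stored at full precision, so any rounding is up to you.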
jay.sf
Ravi
  • thx! k.check seems not to work. is it another package I should import? – a_geo Nov 20 '18 at 16:08
  • @a_geo Means you need to do `k.check(my_spline_gam)` rather than `gam.check(my_spline_gam)`. P-values are very similar and you should be able to use them. – jay.sf Nov 20 '18 at 19:01
  • @jay.sf ; p-values are the *same*: any differences are due to the calculation being stochastic. So best to use `set.seed` – user20650 Nov 20 '18 at 19:32
  • @user20650 Both commands yield slightly different p-values. But I'm with you that we can *consider* both the same due to stochastic calculations within the commands, where you cannot easily set a seed and obviously `set.seed()` has no effect. – jay.sf Nov 20 '18 at 19:47
  • 1
    @jay.sf ; I was about to disagree with you as `gam.check` explicitly calls `k.check` so they should give the same results (and you can use the `seed` as `set.seed(1) ; printCoefmat(k.check(b, subsample = 5000, n.rep = 200), digits = 3)`). However, a quick look shows there are other random sample calls used in the earlier plot functions in `gam.check` (not related to the output table), which will move the seed, hence difficult to get the same. But it is the same function doing the work.. – user20650 Nov 20 '18 at 20:45
  • thank you all, but the k.check() does not work for me. Error: could not find function "k.check" – a_geo Nov 22 '18 at 08:46
  • @a_geo ; `k.check` exists in mgcv (and is actually called in `gam.check`). If you can't see the function, or don't have it, it could be because you are using an earlier version of mgcv or R. Can you edit your question with the results of `sessionInfo()`? – user20650 Nov 22 '18 at 23:15
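
A quick way to investigate the "could not find function" error from the comments is sketched below. It assumes the cause is an mgcv version where `k.check` is present in the package but not exported, which the thread does not confirm. The variable names are illustrative.

```r
library(mgcv)

# Which mgcv version is installed?
mgcv_version <- packageVersion("mgcv")

# Is k.check part of the public (exported) interface of this version?
is_exported <- "k.check" %in% getNamespaceExports("mgcv")

# Is it at least defined inside the package namespace?
is_internal <- exists("k.check", envir = asNamespace("mgcv"), inherits = FALSE)

print(mgcv_version)
print(c(exported = is_exported, internal = is_internal))
```

If the function turns out to be internal only, `mgcv:::k.check(my_spline_gam)` would reach it, although updating mgcv is probably the cleaner fix.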
1

Use capture.output coupled with a little string manipulation:

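# Capture everything gam.check() prints to the console
gam_obj <- capture.output(gam.check(b, pch = 19, cex = .3))
# The rows of the basis dimension table start at line 12 of this output
gam_tbl <- gam_obj[12:length(gam_obj)]
# Return the last space-separated field of a row (the printed p-value) as a number
str_spl <- function(x) {
  fields <- strsplit(x, " ")[[1]]
  as.numeric(fields[length(fields)])
}
p_values <- data.frame(sapply(gam_tbl, str_spl))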
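
The hard-coded line 12 depends on how many lines `gam.check` prints before the table. A slightly more defensive variant, offered only as a sketch, finds the table rows by their term label instead. It assumes each row starts with a label such as `s(`, `te(`, `ti(` or `t2(`, and that the printed columns are term, k', edf, k-index and p-value in that order. The values are still only the rounded numbers that were printed.

```r
library(mgcv)

set.seed(0)
dat <- gamSim(1, n = 200)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

# Capture the printed diagnostics (this also draws the diagnostic plots)
out <- capture.output(gam.check(b))

# Keep only lines that look like table rows, e.g. "s(x0) 9.00 2.32 ..."
rows <- grep("^(s|te|ti|t2)\\(", out, value = TRUE)

# Split each row on runs of whitespace; field 5 is the printed p-value
fields <- strsplit(trimws(rows), "\\s+")
p_printed <- sapply(fields, function(f) f[5])

# Very small p-values print as e.g. "<2e-16"; drop the "<" before converting
p_parsed <- data.frame(
  term = sapply(fields, function(f) f[1]),
  p_value = as.numeric(sub("^<", "", p_printed))
)
print(p_parsed)
```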

Vivek Kalyanarangan
  • 1
    thx! it didn't work exactly, but an elaboration of this worked fine. The only thing is that ideally I would also like to have the result up until the 3rd decimal. gam_obj <- capture.output(gam.check(my_spline_gam,pch=19,cex=.3)) gam_tbl <- gam_obj[12:length(gam_obj)] p_str = unlist(strsplit(gam_tbl, " ", fixed=TRUE)) p_value = as.numeric(p_str[8]) p_value – a_geo Nov 20 '18 at 16:05