4

I have a model structured as follows, and I would like to extract the predicted values while ignoring the random effect. As specified in ?predict.gam and here, I am using the exclude argument, but I am getting an error. Where is my mistake?

dt <- data.frame(n1 = runif(500, min=0, max=1),
             n2 = rep(1:10,50), 
             n3 = runif(500, min=0, max=2),
             n4 = runif(500, min=0, max=2),
             c1 = factor(rep(c("X","Y"),250)),
             c2 = factor(rep(c("a", "b", "c", "d", "e"), 100)))

mod = gam(n1 ~ 
           s(n2, n3, n4, by=c1) +
           s(c2, bs="re"),
         data=dt)

newd=data.table(expand.grid(n1=seq(min(dt$n1), max(dt$n1), 0.5), 
                        n2=1:10,
                        n3=seq(min(dt$n3), max(dt$n3), 0.5),
                        n4=seq(min(dt$n4), max(dt$n4), 0.5),
                        c1=c("X", "Y")))
newd$pred <- predict.gam(mod, newd, exclude = "s(c2)")

In predict.gam(mod, newd, exclude = "s(c2)"): not all required variables have been supplied in  newdata! 
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
mluerig
  • 638
  • 10
  • 25

1 Answers1

4

exclude does not work in the way as you assumed. You still need to provide all variables in your newd for predict.gam. See my this answer for what is behind predict.gam.

Here is what you need to do:

## pad newd with an arbitrary value for variable c2
newd$c2 <- "a"
## termwise prediction
pt <- predict.gam(mod, newd, type = "terms", exclude = "s(c2)")
## linear predictor without random effect
lp_no_c2 <- rowSums(pt) + attr(pt, "constant")
Zheyuan Li
  • 71,365
  • 17
  • 180
  • 248
  • if I were to exclude multiple variables, how would I do it - like this? `pt <- predict.gam(mod, newd, type = "terms", exclude =c( "s(c2)", s(n4)))` – mluerig Mar 04 '19 at 14:37
  • ah, ok, so I would have to specify the whole term: `pt <- predict.gam(mod, newd, type = "terms", exclude =c( "s(c2)", "s(n2, n3, n4)"))` (ignoring the fact that for this example model then there would be not predictor left) – mluerig Mar 05 '19 at 15:18
  • 2
    @lueromat Use `sapply(mod$smooth, “[[”, “label”)` to see what terms are in the model. Note that factor “by” has a smooth for each factor level, so you will see for example `s(n2,n3,n4):c1X` for level “X” of `c1`. – Zheyuan Li Mar 05 '19 at 15:26