In R, I would like to fit a GAM with categorical variables. I thought I could do it the same way as with lm (cat is the categorical variable):

lm(data = df, formula = y ~ x1*cat + x2 + x3);

But I can't do things like:

gam(data = df, formula = y ~ s(x1)*cat + s(x2) + x3)

but the following works:

gam(data = df, formula = y ~ cat + s(x1) + s(x2) + x3)

How do I add a categorical variable to just one of the splines?

Zheyuan Li
Courvoisier
    This question is off topic here because it concentrates on functions in R. – Michael R. Chernick Apr 10 '17 at 17:35
    The thing you appear to be trying in the second chunk of code (an interaction between a categorical variable and a smooth) can be accomplished using the `by` argument, i.e. `s(x,by=cat)` will fit a separate smooth for each level of `cat`. –  Apr 10 '17 at 18:08

1 Answer

One of the comments has more or less told you how. Use a by variable:

s(x1, by = cat)

This creates a factor-by smooth: a separate smooth function of x1 is created for each level of the factor. Smoothing parameters are also duplicated but not linked, so they are estimated independently. You can set

s(x1, by = cat, id = 0)

to use a single smoothing parameter for all "sub smooths".
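As a rough illustration of the difference, the sketch below fits both versions to simulated data and compares how many smoothing parameters are estimated. The simulated data frame, the amplitudes used to generate it, and the object names `fit_sep` and `fit_linked` are assumptions made for this example, not part of the question.

```r
library(mgcv)

set.seed(1)
n <- 300
df <- data.frame(
  x1  = runif(n),
  cat = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
# Hypothetical truth: the same sine shape with a different amplitude per level
amp <- unname(c(a = 1, b = 2, c = 3)[as.character(df$cat)])
df$y <- amp * sin(2 * pi * df$x1) + rnorm(n, sd = 0.3)

# Separate smoothing parameter for each level of cat
fit_sep <- gam(y ~ cat + s(x1, by = cat), data = df, method = "REML")

# One smoothing parameter shared by the smooths for all levels
fit_linked <- gam(y ~ cat + s(x1, by = cat, id = 0), data = df, method = "REML")

# Number of smoothing parameters that were actually estimated
length(fit_sep$sp)
length(fit_linked$sp)
```

Linking is probably most sensible when the curves for the different levels are expected to be about equally wiggly; if one level is nearly linear and another is not, separate smoothing parameters may fit better.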

Also note that contrasts are not applied to the factor here, while each smooth function is still subject to a centering constraint. This means the smooths cannot absorb differences in mean level between groups, so you need to include the factor variable as a fixed effect, too:

s(x1, by = cat) + cat
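Put together with the other terms from the question, a full call might look like the sketch below. The simulated data and its coefficients are assumptions chosen only so that the example runs; the formula is the one from the question with `s(x1)*cat` replaced by `cat + s(x1, by = cat)`.

```r
library(mgcv)

set.seed(42)
n <- 400
df <- data.frame(
  x1  = runif(n),
  x2  = runif(n),
  x3  = rnorm(n),
  cat = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
# Hypothetical truth: the shape of the x1 effect and the mean level both depend on cat
shape <- ifelse(df$cat == "a", sin(2 * pi * df$x1),
         ifelse(df$cat == "b", df$x1^2, 0))
level <- unname(c(a = 0, b = 1, c = 2)[as.character(df$cat)])
df$y  <- level + shape + cos(2 * pi * df$x2) + 0.5 * df$x3 + rnorm(n, sd = 0.3)

# cat as a parametric term picks up the group means;
# s(x1, by = cat) gives one centred smooth of x1 per level
fit <- gam(y ~ cat + s(x1, by = cat) + s(x2) + x3, data = df, method = "REML")

summary(fit)                                   # parametric and smooth terms
sapply(fit$smooth, function(sm) sm$label)      # one s(x1) term per level, plus s(x2)
```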
Zheyuan Li
    Minor clarification: `s(x1, by = cat)` *doesn't* create a `"fs"` class smooth - if it did you wouldn't need `+ cat` for the centring. If you want what mgcv calls an `"fs"` smooth then you need `s(x1, cat, bs = "fs")` (and no parametric `cat` term). – Gavin Simpson Apr 26 '17 at 17:57
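
For comparison, the `"fs"` alternative mentioned in this comment could be sketched as follows. The simulated data and the object name `fit_fs` are assumptions for illustration only; the term `s(x1, cat, bs = "fs")` is taken from the comment, and no parametric `cat` term is included.

```r
library(mgcv)

set.seed(7)
n <- 300
df <- data.frame(
  x1  = runif(n),
  x2  = runif(n),
  x3  = rnorm(n),
  cat = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
# Hypothetical truth: a common sine shape shifted by a level-specific offset
offset <- unname(c(a = -1, b = 0, c = 1)[as.character(df$cat)])
df$y <- offset + sin(2 * pi * df$x1) + cos(2 * pi * df$x2) +
  0.5 * df$x3 + rnorm(n, sd = 0.3)

# Factor-smooth interaction: one curve per level, with level differences
# handled inside the smooth, so cat is not added as a parametric term
fit_fs <- gam(y ~ s(x1, cat, bs = "fs") + s(x2) + x3, data = df, method = "REML")

class(fit_fs$smooth[[1]])   # class of the first smooth term
names(coef(fit_fs))[1:2]    # parametric coefficients: intercept and x3 only
```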