
My data frame looks like:

head(bush_status)
distance  status count
       0 endemic   844
       1  exotic     8
       5  native     3
      10 endemic     5
      15 endemic     4
      20 endemic     3

The count data is not normally distributed. I'm trying to fit a generalized additive model to my data in two ways so that I can use anova() to see whether the p-value supports m2.

m1 <- gam(count ~ s(distance) + status, data=bush_status, family="nb")
m2 <- gam(count ~ s(distance, by=status) + status, data=bush_status, family="nb")

m1 works fine, but m2 sends the error message:

"Error in smoothCon(split$smooth.spec[[i]], data, knots, absorb.cons, 
scale.penalty = scale.penalty,  : 
  Can't find by variable"

This is pretty beyond me so if anyone could offer any advice that would be much appreciated!

Fbj9506
  • m1 doesn't work fine using your example data. It returns "Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom". – neilfws Aug 23 '17 at 07:19
  • Is `status` a factor variable? Please provide `dput(bush_status)`. – Roland Aug 23 '17 at 07:26
  • @Roland status is a character variable - does it need to be a factor? dput() gives a very long output but last part reads: `Names = c("distance", "status", "count"), row.names = c(NA, -702L), class = "data.frame")` – Fbj9506 Aug 23 '17 at 07:34
  • 5
    Yes, the variable passed to `by` must be a factor variable in mgcv. – Roland Aug 23 '17 at 07:44
  • @Roland it works! Thank you! – Fbj9506 Aug 23 '17 at 07:48

1 Answer


From your comments it became clear that you passed a character variable to `by` in the smoother. You must pass a factor variable there, for example by converting `status` with `factor()` before fitting. This has been a frequent gotcha for me too, and I consider it a design flaw, because base R regression functions such as `lm()` and `glm()` deal with character variables just fine.
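A minimal sketch of the fix, for illustration only. The data frame `sim` is simulated and is not the asker's `bush_status`; the sample size, decay rates and dispersion are arbitrary assumptions, and `method = "ML"` is my own choice for comparing the two fits rather than something from the question.

```r
library(mgcv)

set.seed(1)

# Simulated stand-in for bush_status (hypothetical values, NOT the real data):
# counts that decline with distance at a status-specific rate.
n <- 300
sim <- data.frame(
  distance = runif(n, 0, 100),
  status   = sample(c("endemic", "exotic", "native"), n, replace = TRUE),
  stringsAsFactors = FALSE  # keep status as character, as in the question
)
rate <- c(endemic = 0.05, exotic = 0.02, native = 0.01)
mu   <- exp(3 - unname(rate[sim$status]) * sim$distance)
sim$count <- rnbinom(n, mu = mu, size = 2)

# The fix: convert the `by` variable from character to factor before fitting.
sim$status <- factor(sim$status)

# One common smooth of distance vs. a separate smooth per status level.
m1 <- gam(count ~ s(distance) + status,
          data = sim, family = nb(), method = "ML")
m2 <- gam(count ~ s(distance, by = status) + status,
          data = sim, family = nb(), method = "ML")

# Approximate comparison of the two fits.
anova(m1, m2, test = "Chisq")
AIC(m1, m2)
```

With a factor `by` variable, `m2` gets one smooth of `distance` per level of `status`. The `status` main effect stays in the formula because each of those smooths is centred. The p-value from `anova()` on GAMs is approximate, so it is worth reading alongside the AIC.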

Roland
  • 2
    Note that this also applies when using the `s()` function in the `brms` package, which is a wrapper for the function of the same name in the `mgcv` package. – mikoontz Mar 11 '19 at 00:01