I'm using the mgcv package to fit GAMs in R. Right now I am trying to model the relationship between my numeric variable Depth and several categorical variables. I have converted all of the categorical variables from character to factor, and gam() generally seems to work well, but as far as I can tell it does not estimate or plot anything for the first level of any of my categorical variables.

This is my relevant code:

library(mgcv)

# Depth modelled with two factor predictors (parametric terms only, no smooths)
test1 <- gam(Depth ~ Bottom_typef + Tidal_stagef, data = data,
             method = "REML")
plot(test1, all.terms = TRUE, pages = 1)
summary(test1)

This then outputs this graph and summary:

GAM plot output is shown

Family: gaussian
Link function: identity

Formula:
Depth ~ Bottom_typef + Tidal_stagef

Parametric coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           33.0838     0.3238 102.181  < 2e-16 ***
Bottom_typefSand     -20.4855     0.5053 -40.538  < 2e-16 ***
Bottom_typefSeagrass -26.7308     0.3243 -82.435  < 2e-16 ***
Tidal_stagefOutgoing  -0.4571     0.1599  -2.858  0.00429 **

The Bottom_typef factor should have three levels (Artificial Reef, Sand, and Seagrass) and the Tidal_stagef factor has two (Incoming and Outgoing). When plotted, the first level of each factor (Artificial Reef and Incoming, respectively) doesn't appear to be plotted; it just shows up as a line at 0. In the summary, these levels aren't listed among the parametric coefficients at all. I have other categorical variables in my data set and the same thing occurs for all of them: the first level is neither plotted nor reported in the summary.

Any input on how to fix this issue or on what may be causing it would be greatly appreciated. I apologize if I am using improper terminology or leaving out any important info; let me know if any other information is needed to solve the problem.

kjetil b halvorsen
Zesra
  • Greetings! It is typically useful to provide some reproducible form of your data so people can help. It looks like the name of your data is simply `data`, so if you run `dput(data)` and paste the output into your question, others may be able to assist you further. Also, it appears all of your coefficients are parametric. I'm not sure if that's on purpose, but these terms do not have splines and thus won't model nonlinear effects in your regression. As it stands, for all intents and purposes this is basically just a regular regression. – Shawn Hemelstrand Feb 25 '23 at 17:04
  • https://stackoverflow.com/questions/41032858/lm-summary-not-display-all-factor-levels – user20650 Feb 25 '23 at 18:54
  • Please provide enough code so others can better understand or reproduce the problem. – Community Feb 25 '23 at 22:51
  • The intercept represents the combination of reference categories/levels – Gavin Simpson Feb 26 '23 at 11:39
  • This must be a faq, see https://stats.stackexchange.com/questions/285210/what-to-do-in-a-multinomial-logistic-regression-when-all-levels-of-dv-are-of-int/544656#544656 and https://stackoverflow.com/questions/71215280/gtsummary-output-with-mgcv-gam – kjetil b halvorsen Feb 27 '23 at 02:36
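
To illustrate what the comments above describe, here is a minimal sketch, assuming R's default treatment contrasts. The data frame `dat`, the `level_means` values, and the column `Bottom_sand_ref` are made up for illustration and are not the asker's data. The first level of each factor acts as the reference level: it gets no coefficient of its own because it is absorbed into the intercept, and the other coefficients are differences from it.

```r
library(mgcv)

set.seed(1)
n <- 300

# Simulated stand-in data (hypothetical values, NOT the asker's data set)
dat <- data.frame(
  Bottom_typef = factor(sample(c("Artificial Reef", "Sand", "Seagrass"), n, replace = TRUE)),
  Tidal_stagef = factor(sample(c("Incoming", "Outgoing"), n, replace = TRUE))
)
level_means <- c("Artificial Reef" = 33, "Sand" = 12.5, "Seagrass" = 6.5)
dat$Depth <- unname(level_means[as.character(dat$Bottom_typef)]) -
  0.5 * (dat$Tidal_stagef == "Outgoing") + rnorm(n)

fit <- gam(Depth ~ Bottom_typef + Tidal_stagef, data = dat, method = "REML")

# The first level of each factor is the reference level; it has no coefficient
# of its own and is represented by the intercept.
ref_levels <- c(levels(dat$Bottom_typef)[1], levels(dat$Tidal_stagef)[1])
print(ref_levels)
print(coef(fit))

# Fitted mean depth for every combination of levels, reference levels included
grid <- expand.grid(
  Bottom_typef = levels(dat$Bottom_typef),
  Tidal_stagef = levels(dat$Tidal_stagef)
)
grid$fit <- as.numeric(predict(fit, newdata = grid))
print(grid)

# Choosing a different reference level changes which level is left out
dat$Bottom_sand_ref <- relevel(dat$Bottom_typef, ref = "Sand")
fit_sand <- gam(Depth ~ Bottom_sand_ref + Tidal_stagef, data = dat, method = "REML")
print(coef(fit_sand))
```

In this sketch the first row of `grid` (Artificial Reef, Incoming) has a fitted value equal to the intercept, so the "missing" levels are estimated after all. Using `relevel()` only changes which level serves as the baseline; the fitted values stay the same.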

0 Answers