Question moved to CrossValidated
I am trying to express the difference in the 'speed of increase' between two categories in gam
modelling. My data represents the cumulative values over time [0-100%], but I wish (for comparability with other studies) to express them in yearly values. I wonder if it is even meaningfull, if my models are gam
, and therefore non-linear?
Previously, to calculated the 'annual values' per two groups by the Compound annual growth rate (CARG) from the starting and ending values/number of years from the gam
predicted values here. Now, I wonder if is it somehow possible get those value from the model itselft, and converted to annual increase? Something, like 'Betas' from 'glm' to show that one predictor is more important then the other?
My data looks like this: I have two groups, with the cumulative rate of forest loss over time (ranging between 0-100% from no forest loss to complete forest loss). Eg. group yellow
has predicted lost ~ 40 % of forest from 2000 to 2017, group blue
has predicted lost ~ 10 % over the same time. My dummy data are available here and in dput
form below. My pseudo-code model: forest_loss ~ year + location
, where different groups represent different location.
# Restructure the data first
dd$location = as.factor(dd$location) # claim grouping variable as factor
dd$forest_loss <- dd$forest_loss/100 # the the values between 0-1
# Plot the gam model
ggplot(dd, aes(x = year,
y = forest_loss,
group = location,
color = location)) +
geom_smooth(method = 'gam',
method.args = list(family = "betar"),
formula = y~s(x, bs = 'cs')) +
geom_point()
My dummy model is in a formula similar to this: m1<-bam(y ~ s(x, by = grp) + grp, dd, family = betar)
(I am using the form as used for the real data, althought I have more predictors there), based on bam/gam
. Family betar
is because by predicted values ranges between 0-100 (0-1).
This is the dummy example so some part of teh code (gam
) can produce errors. But my main question is how to express the annual increase per group from the cumulative values themselves? Is it already accessbible from summary(m1)
, and it is meaningfull?
Thank you for your thoughts.
Dummy data (shared as suggested here using the dput
)
dd<- structure(list(year = c(2000L, 2001L, 2002L, 2003L, 2004L, 2005L,
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L,
2015L, 2016L, 2017L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L,
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2013L, 2014L,
2015L, 2016L, 2017L), forest_loss = c(2, 2.5, 2.7, 2.9, 2.9, 3, 3.1, 3.1,
3.1, 4, 5, 5.6, 5.8, 5.9, 6.7, 7.2, 9, 10, 3, 3.2, 3.4, 3.5,
3.5, 7, 7.2, 8, 15, 19, 22, 25, 27, 29, 32, 33, 35, 40), location = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("blue", "yellow"), class = "factor")), row.names = c(NA,
-36L), class = "data.frame")
Answer to comments:
somehow, the tidymv::plot_smooths(m1)
provides an error:
Error in `parse_expr()`:
! `x` must contain exactly 1 expression, not 0.
Run `rlang::last_error()` to see where the error occurred.
Output from the summary(m1)
Family: Beta regression(75.174)
Link function: logit
Formula:
y ~ s(x, by = grp)
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.671 0.083 -32.19 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(x):grpblue 1.000 1.000 11.64 0.000647 ***
s(x):grpyellow 3.684 4.547 283.12 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.952 Deviance explained = 85%
-REML = -64.262 Scale est. = 1 n = 36
Partial effect by gratia: gratia::draw(m1, scales = 'free')