5

I am doing a GAM with my response variable as a percentage (0-100). I have used an arcsin transformation to improve model fit (asin(sqrt(myvariable/100))). I now want to evaluate contrasts between levels of my explanatory factor variable on the original scale. I have been trying to use emmeans and followed the steps in the Transformations and link functions vignette to set up my model with the transformation in a format emmeans can read. However, when I run the emmeans function I get the following error: Error in link$mu.eta(object@bhat[estble]) : attempt to apply non-function.

I set up the transformation object like this:

tran <- make.tran("asin.sqrt", 100)

I am confident that bit works, because when I tried it on a linear model with emmeans, it worked:

warp.t <- with(tran, lm(linkfun(breaks)~wool*tension, warpbreaks))
emmeans(warp.t, ~wool|tension, type="response")
tension = L:
 wool response   SE df lower.CL upper.CL
 A        44.2 4.00 48     36.3     52.3
 B        27.7 3.61 48     20.8     35.3

tension = M:
 wool response   SE df lower.CL upper.CL
 A        23.5 3.41 48     17.0     30.7
 B        28.4 3.63 48     21.4     35.9

tension = H:
 wool response   SE df lower.CL upper.CL
 A        23.9 3.43 48     17.4     31.1
 B        18.6 3.13 48     12.7     25.3

Confidence level used: 0.95 
Intervals are back-transformed from the asin(sqrt(mu/100)) scale 

However, if I try it in a gam (both straight into emmeans() or using regrid()), it doesn't work:

dat <- data.frame("x" = rep(1:3, times=12), 
                  "y" = rep(4:6, times=12), 
                  "z" = runif(36, 0, 100), 
                  "m" = rep(1:12, times=3))

gam.t <- with(tran, gam(linkfun(z) ~ x * y + s(m), data=dat))

emmeans(gam.t, ~x|y, type="response")
Error in linkinv(result[[cnm[1]]]) : could not find function "linkinv"

#or
regrid(emmeans(gam.t, ~x|y), transform="response")
Error in flink$mu.eta(object@bhat[estble]) : 
  attempt to apply non-function

It's like it's looking for the inverse link function in the gam, but it's not where emmeans expects it to be. Can I assign it to the gam somehow? Does it not work with gams? Am I doing something wrong?

evelyn
  • 63
  • 5
  • Nice first question! :-) – qdread Apr 13 '23 at 16:00
  • One quick suggestion to improve: please include all `library()` calls for the packages you need to load at the top of your example code. I think here it is emmeans and mgcv, but it is nice to not have to try to guess which packages you are using. – qdread Apr 13 '23 at 16:02
  • Why don't you use a family that is designed for proportions, like `betar()`? – Gavin Simpson Apr 14 '23 at 14:51
  • I agree with @GavinSimpson, I usually don't recommend arcsin-sqrt transformation. Beta regression is a more direct way to model the error in a response value that's constrained to be between 0 and 1 (i.e. a proportion). But it's good to know how to implement those custom transformations in emmeans anyway. – qdread Apr 14 '23 at 15:24
  • @GavinSimpson yes I would normally use a beta regression for proportions. However, my data has quite a few zeros in the response variable and it didn't work well with betar(). I adjusted the value of eps several times to get rid of the negative deviance explained, but the fit was still really bad. The authors of betar() in mgcv actually suggest manually resetting 0s or 1s before modelling. I eventually decided arcsin was the least worst option because at least it's doing the same thing to all the values. – evelyn Apr 18 '23 at 08:43
  • 1
    @evelyn In that case I would likely have switched to using a zero-inflated beta response, for example via: https://www.andrewheiss.com/blog/2021/11/08/beta-regression-guide/ – Gavin Simpson Apr 18 '23 at 11:17

1 Answers1

2

Following the instructions in the vignette under the heading "Specifying a transformation after the fact" I think gives you the result you are looking for:

gam.t2 <- gam(asin(sqrt(z/100)) ~ x * y + s(m), data = dat)

refgrid_gam.t2 <- update(ref_grid(gam.t2), tran = tran)

emmeans(refgrid_gam.t2, ~ x | y, type = "response")

The response variable is transformed in the call to gam(), then we call update() on the reference grid to specify what transformation was used. The output I get is on the percent scale again:

y = 5:
 x response   SE df lower.CL upper.CL
 2       52 10.3 32     31.5     72.2

Confidence level used: 0.95 
Intervals are back-transformed from the asin(sqrt(mu/100)) scale 
qdread
  • 3,389
  • 19
  • 36