3

I have fitted a quadratic model with a variance structure that allows different variance levels per level of a factor, and I’m having trouble predicting on a new data set with 2 entries only. Here’s a reproducible example:

library(nlme)
set.seed(101)
mtcars$amf <- factor(mtcars$am)
modGLS <- gls(mpg ~ amf*poly(hp, 2),
           weights = varIdent(form = ~ 1|amf), data = mtcars)
minhp <- min(mtcars$hp); maxhp <- max(mtcars$hp)
newdata <- data.frame(amf = as.factor(c(0, 1)),
                        hp = round(runif(2, min = minhp, max = maxhp)))
newdata2 <- data.frame(amf = as.factor(c(0, 0, 1)),
                        hp = round(runif(3, min = minhp, max = maxhp)))
predict(modGLS, newdata = newdata)
# Error in poly(hp, 2) : 'degree' must be less than number of unique points

predict(modGLS, newdata = newdata2)
## [1]  5.973306 13.758955 44.037921
## attr(,"label")
## [1] "Predicted values"

However, the prediction works well on an lm framework:

modLM <- lm(mpg ~ amf*poly(hp, 2), mtcars)
predict(modLM, newdata = newdata)
##        1        2 
## 25.22253 16.83943 

Why would that be? One of the package maintainers of emmeans seems to believe this may be related with missing information on attr(, “predvars”) (see our discussion here https://github.com/rvlenth/emmeans/issues/133)

I have reported this to Dr Bates (nlme point of contact) but thought I'd reach out to the wider community too. Thanks in advance

Ceres
  • 163
  • 7
  • 1
    Doug Bates is no longer the maintainer of nlme; now it's R-core. I will take a look at this, but it probably deserves to be a bug report (if it isn't already!) – Ben Bolker Jan 17 '22 at 18:55
  • By the way, I suspect that even when you don't get an error, the predictions are **wrong** when new data are provided (which is even worse). – Ben Bolker Jan 17 '22 at 19:19
  • 1
    In particular, there should be a "predvars" attribute in `terms(modGLS)` – Russ Lenth Jan 17 '22 at 22:18
  • 1
    Just added this: https://bugs.r-project.org/show_bug.cgi?id=18283 – Ben Bolker Jan 18 '22 at 02:41
  • Your example can be simplified a lot ... – Ben Bolker Jan 18 '22 at 02:41
  • Thank you @BenBolker and @russ-lenth for your help and reporting the bug to R-core. I have simplified the example and provided an actual answer that explores the missing attribute from the GLS model object `terms`. – Ceres Jan 18 '22 at 20:20

1 Answers1

0

Thanks to @BenBolker and @russ-lenth for confirming that the issue is related to the missing terms attribute "predvars" in the GLS object, which provides the fitted coefficients for poly. Notice how this works in an LM framework (original post) and the attribute is there (see also ?makepredictcall). Note that this can have potential implications for prediction.

## e.g. this fails
poly(newdata$hp, 2)
## but this is okay, because the polynomial has been estimated
polyFit <- poly(mtcars$hp, 2)
predict(polyFit, newdata = newdata$hp)

attr(terms(modLM), "predvars")
## list(mpg, amf, poly(hp, 2, coefs = list(alpha = c(146.6875, 198.071514938048), norm2 = c(1, 32, 145726.875, 977168457.086594))))

attr(terms(modGLS), "predvars")
## NULL
Ceres
  • 163
  • 7