1

I wanted to use predict on some averaged models which were built in nlme to plot confidence intervals of modelled relationships. But, I've found this is not possible using nlme and MuMIn::model.avg. Instead, I plan to use glmmTMB, as suggested here. However, I'm struggling to work out how to set the correlation structure in glmmTMB.

The following is a small subset of my data, and the model specification in nlme. The data are an incomplete time series, and the random structure is the test position in a sequence for a given ID, nested within ID.

library(nlme)
library(glmmTMB)

mydata <- structure(list(id = c("F530", "F530", "F530", "F530", "F530", "M391", "M391", "M391", "M391", "M391", "M391", "M391"),testforid = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("1", "2"), class = c("ordered", "factor")), time = c(12.043, 60.308, 156.439, 900.427, 1844.542, 42.095, 61.028, 130.627, 194.893, 238.893, 905.282, 1859.534), a = c(35.5786398928957, 35.4973671656257, 36.7414694383557, 37.4316029157078, 36.0805603474457, 38.892219234833, 37.081136308003, 37.339272893363, 36.744902161663, 36.741897283613, 38.158072893363, 38.946697283613), b = c(0.0079975108148372, 0.0151689857479705, 0.0275942757878888, 0.0125676102827941, 0.0352227834243443, 0.0195902976534779, 0.0118588484445401, 0.0069799148425349, 0.00723445099500534, 0.00787758751826021, 0.0162518412492866, 0.0127526068249484), c = c(1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), class = "data.frame")

model.lme <- lme(a ~ b + c,
                 random = list(id = ~1, testforid = ~1),
                 correlation = corExp(metric = "maximum", nugget = TRUE),
                 method = "ML",
                 data = mydata)

I tried following the instructions in this vignette, converting time to a factor with unit spaced time points as levels (in this case milliseconds), and setting a single grouping factor:

mydata$times <- factor(mydata$time,
                       levels = seq(from = min(mydata$time),
                                    to = max(mydata$time),
                                    by = 0.001))
mydata$group <- 1

I then guessed at my model structure being (not sure this is correct):

model.glmmTMB <- glmmTMB(a ~ b + c + exp(times + 0 | group) + (1|id/testforid), data = mydata)

And got the following error:

Error in parseNumLevels(reTrms$cnms[[i]]) : 
  Failed to parse numeric levels: times12.043times42.095times60.308times61.028times130.627times156.439times194.893times238.893times900.427times905.282times1844.542times1859.534
In addition: There were 12 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In lapply(strsplit(tmp, ","), as.numeric) : NAs introduced by coercion

My guess is the problem is that the time series is incomplete, but I'm not sure.

Any thoughts/suggestions on if/how I can properly convert the model from nlme to glmmTMB, or failing that on how I might bootstrap confidence intervals from averaged nlme models (averaged using MuMIn::model.avg) would be greatly appreciated. Thanks!

pajul
  • 123
  • 9

1 Answers1

1

There are two important things:

  • you need to use numFactor() instead of factor: for a 1-D structure (such as time), this basically just makes your variable into a factor with levels corresponding to the unique values (in contrast to your use of factor, which gives you a variable with more than a million levels ...)
  • as pointed out by Kasper Kristensen (author of TMB/glmmTMB), you should use ou() (Ornstein-Uhlenbeck) for exponential decay of correlation in time; exp() is for an exponential decay of correlation in space (and is much slower ...)

So this works:

mydata$times <- numFactor(mydata$time)
mydata$group <- 1
model.glmmTMB <- glmmTMB(a ~ b + c + ou(times + 0 | group) + (1|id/testforid), 
                data = mydata)

But it doesn't quite correspond to the lme model fit (even setting aside the issue of using metric = "maximum", which I think is not going to be possible in the current version of glmmTMB). lme fits the correlation structure within the groups defined by the random effects, so this:

model.glmmTMB <- glmmTMB(a ~ b + c + ou(times + 0 | id/testforid),
    data = mydata)

is closer. (You don't need nugget = TRUE because glmmTMB includes a residual variance term by default unless you use dispformula = ~0 to turn it off [corresponding to nugget = FALSE].)

This gives you a warning message about a non-positive-definite Hessian matrix. However, this actually matches the lme results too: if you run intervals(models.lme), you'll that the confidence intervals for most of the parameters other than the fixed effects cover a huge range (e.g. 2e-17 to 8e+15 for the random-effects SD at the id level), corresponding to unidentifiable parameters. (Hopefully this is because you gave us only a small subset of your data, and won't happen with your real problem.)


(Hope to update sims below to use ou() instead of exp() shortly ...)

update: It looks like the computational cost of this model (with ou()) scales as about (number of unique time points)^2.5. On my machine, without turning on parallelization (which may or may not help — I suspect that the relevant part of the code isn't parallelized), running with 1500 observations (and 1500 unique times) takes 45 seconds.

You could also try rounding your time values so that there are a smaller number of unique time values ...

library(glmmTMB)
form <- a ~ b + c + ou(times + 0 | id)

## n should be a factor of 5
simfun <- function(n, round_times = FALSE, seed = 101) {
    if (!is.null(seed)) set.seed(seed)
    bigdata <- data.frame(b = runif(n, 0.001, 0.1),
                          c = sample(0:1, n, replace = TRUE),
                          time = c(10, 60, 150, 900, 1850)*runif(n, 0.9, 1.1),
                          id = factor(rep(seq(n/5), each = 5)))
    if (round_times) bigdata$time <- round(bigdata$time)
    bigdata$times <- numFactor(bigdata$time)
    bigdata$a <- simulate_new(RHSForm(form, as.form = TRUE),
                              ## show_pars = TRUE,
                              newdata = bigdata,
                              newparams = list(beta = c(35, 100, 1),
                                               betad = 1,
                                               theta = c(1,1)))[[1]]
    bigdata
}

nvec <- seq(50, 1500, by = 50)
pb <- txtProgressBar(max = length(nvec), style = 3)
elapsed <- rep(NA, length(nvec))
for (i in seq_along(nvec)) {
    setTxtProgressBar(pb, i)
    elapsed[i] <- system.time(simfun(nvec[i]))[["elapsed"]]
}
close(pb)

plot(nvec, elapsed, log = "xy")
lm(log(elapsed) ~ log(nvec))

elapsed_rnd <- n_unique <- rep(NA, length(nvec))

for (i in seq_along(nvec)) {
    setTxtProgressBar(pb, i)
    elapsed_rnd[i] <- system.time(res <- simfun(nvec[i], round_times = TRUE))[["elapsed"]]
    n_unique[i] <- length(unique(res$time))
}
close(pb)
lm(log(elapsed_rnd) ~ log(n_unique))
plot(n_unique, elapsed_rnd, log = "xy")
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Thanks very much Ben! Super quick! I'm just trying your second suggestion now with the real data (which is thankfully larger than the subset I gave, with ~1500 rows). Should it take a long time to compute? It's been running for over 30 minutes, when nlme would take a few seconds. – pajul Apr 25 '23 at 21:05
  • Update: I haven't been able to get either version to run with the real data - I left the second going overnight just in case, but it just seems to hang R. I updated to the latest R (4.3.0) and glmmTMB (1.1.7), but no change. I do now get a warning about TMB being 1.9.4 when it should be 1.9.3, but I'm guessing that isn't the problem. – pajul Apr 26 '23 at 11:29
  • I simulated an example (see above). I couldn't figure out how `testforid` differed from `id` since they're synonymous in the example you gave, so I used just `id` (I don't think that will make a big difference though) – Ben Bolker Apr 26 '23 at 14:51
  • See edit about switching from `exp()` to `ou()` ... – Ben Bolker Apr 26 '23 at 22:36
  • 1
    Even with rounding down, I ended up stopping the exp() version after a few hours. Using ou() works in seconds, however. Gives me something to work with - thanks again! – pajul Apr 27 '23 at 01:28