0

I'm trying to make a multilevel mediation analysis (as done here and here).

library(data.table)
library(lme4)
library(nlme)
library(magrittr)
library(dplyr)

set.seed(1)

# Simulated data ------------------------------------------------------------------
dt_1 <- data.table(id = rep(1:10, each=4),
                   time = as.factor(rep(1:4, 10)),
                   x = rnorm(40),
                   m = rnorm(40),
                   y = rnorm(40))

# Melt m and y into z ------------------------------------------------------------------
dt_2 <- dt_1 %>%
  mutate(mm = m, .after=x) %>%
  melt(id.vars = c("id","time","x","mm"),
       na.rm = F,
       variable.name = "dv",
       value.name = "z") %>%
  within({
    dy <- as.integer(dv == "y")
    dm <- as.integer(dv == "m")
    }) %>%
  arrange(id,time)

> head(dt_2,4)
   id time          x         mm dv          z dm dy
1:  1    1 -0.6264538 -0.1645236  m -0.1645236  1  0
2:  1    1 -0.6264538 -0.1645236  y -0.5686687  0  1
3:  1    2  0.1836433 -0.2533617  m -0.2533617  1  0
4:  1    2  0.1836433 -0.2533617  y -0.1351786  0  1

# lme mediation model ------------------------------------------------------------------
model_lme <- lme(fixed = z ~ 0 + 
                             dm + dm:x + dm:time + #m as outcome
                             dy + dy:mm + dy:x + dy:time, #y as outcome
                 random = ~ 0  +  dm:x + dy:mm + dy:x | id, 
                 weights = varIdent(form = ~ 1 | dm), #separate sigma^{2}_{e} for each outcome
                 data = dt_2,
                 na.action = na.exclude)

Error in MEEM(object, conLin, control$niterEM): Singularity in backsolve at level 0, block 1

# lmer mediation model ------------------------------------------------------------------
model_lmer <- lmer(z ~ 0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time +
                     (0  +  dm:x + dy:mm + dy:x | id) + (0 + dm | time), 
                  data = dt_2,
                  na.action = na.exclude)

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

I've seen some posts about these error (nlme) / warning (lme4) (eg, this and this), but I didn't figure out what is the problem here.

I checked

X <- model.matrix(~0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time, data=dt_2)

> caret::findLinearCombos(X)
$linearCombos
$linearCombos[[1]]
[1] 7 1 4 5 6

$remove
[1] 7

but I don't quite understand the output.

From the summary of model_lmer I verify that dm:time4 and time1:dy coefficients are missing, but why? There are all the possible combinations (0/0, 0/1, 1/0, 1/1) in the dataset.

Fixed effects:
         Estimate Std. Error t value
dm        0.30898    0.92355   0.335
dy        0.03397    0.27480   0.124
dm:x      0.21267    0.19138   1.111
dm:time1 -0.19713    1.30583  -0.151
dm:time2 -0.30206    1.30544  -0.231
dm:time3 -0.20828    1.30620  -0.159
dy:mm     0.22625    0.18728   1.208
x:dy     -0.37747    0.17130  -2.204
time2:dy  0.29894    0.39123   0.764
time3:dy  0.22640    0.39609   0.572
time4:dy -0.16758    0.39457  -0.425

On the other hand, using time as numeric gives no error/warning:

# lmer mediation model - time as numeric
model_lmer2 <- lmer(z ~ 0 + dm + dm:x + dm:time + dy + dy:mm + dy:x + dy:time +
                     (0  +  dm:x + dy:mm + dy:x | id) + (0 + dm | time), 
                  data = within(dt_2, time <- as.numeric(time)),
                  na.action = na.exclude)

It's true that one can know dm from dy (if one is 1 the other is 0), but if that was the problem, this last model (model_lmer2) would still give a warning, I guess.

In my real dataset, I could eventually use time as numeric (although not my first approach), but I would like to understand what the problem is with using it as categorical.

Thank you!

MDSF
  • 123
  • 6

1 Answers1

1

This is really a generic question about linear model construction/formulas in R: it's not mixed-model specific.

Let's look at the names of the columns involved in the multicollinear combination of variables (i.e. columns 7, 1, 4, 5, 6):

cc <- caret::findLinearCombos(X)
colnames(X)[cc$linearCombos[[1]]]
## [1] "dm:time4" "dm"       "dm:time1" "dm:time2" "dm:time3"

This is telling us that the main effect of dm is confounded with the dm:time interaction; once we know dm:time[i] for all levels of i, the main effect of dm is redundant.

It's too bad that lme doesn't automatically drop columns to adjust for multicollinearity as lmer does, and that lmer doesn't have a super-convenient way to model heteroscedasticity à la varIdent(); it is possible but it's a nuisance. It would be possible to build the auto-dropping into nlme, or glmmTMB (which can also model heteroscedasticity easily), but no-one's gotten around to it yet).

... if you're OK with specifying dm:time and leaving dm out of your model then that might be easiest!


You can experiment with what happens with different model specifications:

lc <- function(f) {
    X <- model.matrix(f, dt_2)
    cc <- caret::findLinearCombos(X)
    lapply(cc$linearCombos, function(z) colnames(X)[z])
}

lc(~0 + dm + dm:time)
lc(~0 + dy + dy:time)
lc(~0 + dm + dm:time + dy + dy:time)
lc(~0 + dy + dy:time + dm + dm:time)

or similar stuff looking at the (heads of) the model matrices, column names of the model matrices, etc..

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • I'm probably missing something very easy, but in that case 1) doesn't `caret::findLinearCombos(X)` should also identify `dy` and `dy:time` as collinear? and 2) how are these interactions different from all other categorical-categorical interactions in any model? It's because I have `dy` and `dm=1-dy` and their interactions with `time` in the model? Thank you, @BenBolker! – MDSF Dec 30 '21 at 09:59
  • I'm not sure, I'd have to spend more time looking at model matrices and thinking about it. As you can see from the examples I gave above, `dy` and `dy:time` *are* collinear if they're in a model alone. For reasons I don't understand, `head(model.matrix(~0 + dy + dy:time + dm + dm:time, dt_2))` flips the order of the names for `dm:time` (`timex:dm`) *and* automatically drops the `time1:dm` column ... ? – Ben Bolker Dec 30 '21 at 18:14
  • Thanks for your help @BenBolker. After some tries, I got the model to work by creating one binary variable (0/1) for each level of time and also remove `dm` and `dy` as you suggested. Even for inclusing of binary factors (other covariates), I had to recode them to numeric 0/1 – MDSF Jan 14 '22 at 15:02