I have a dataset on students in classrooms. After multilevel multiple imputation with mice (pooling with mitml), I have 20 imputed datasets. I now want to fit multilevel regression models. The "normal" regressions work fine, but as soon as I include interaction terms I no longer understand the output. Example:

I want to estimate the interaction between mean achievement (meanmath, L2) and classroom climate (cc, L2) in predicting individual achievement (math, L1). The model call looks like this:

Int1 <- with(data, lmer(math ~ meanmath*cc + (1|classID)))

In the output I now get the following estimates:

(Intercept) 0.34
meanmath    0.22
cc1        -0.43
cc2        -0.69
cc3        -0.66
meanmath*cc1 -0.16
meanmath*cc2 0.12 
meanmath*cc3 0.23

The variables cc1 to cc3 do not exist in my dataset, neither in the original one nor in the imputed ones. Could someone tell me where these variables come from?

What I tried so far:

  • I ran the model on just one of the imputed datasets -> the same thing happened.
  • I checked that all imputed datasets contain the same variables -> this is the case.


Details from comment:

data$cc is a Factor w/ 4 Levels: "1", "2", "3", "4". Otherwise the variables are all continuous.

user20650
Svenja
  • What does `str(data$cc)` return please? – user20650 Jun 02 '23 at 17:19
  • Hi! This returns Factor w/ 4 Levels: "1", "2", "3", "4". Otherwise the variables are all continuous. – Svenja Jun 02 '23 at 17:22
  • Thanks. Nothing unexpected here ... this is how character or factor (categorical) terms are represented in a regression model. I'll try to find a link that explains. – user20650 Jun 02 '23 at 17:27
  • perhaps https://stackoverflow.com/questions/36555639/how-do-regression-models-deal-with-the-factor-variables ; https://stackoverflow.com/questions/15231837/why-does-regression-in-r-delete-index-1-of-a-factor-variable – user20650 Jun 02 '23 at 17:33
  • and maybe useful https://stats.stackexchange.com/questions/149621/how-should-i-implement-this-interaction-between-a-continuous-and-categorical-pre ; https://stats.stackexchange.com/questions/274748/interaction-terms-categorical-continuous ; – user20650 Jun 02 '23 at 17:37
  • Yes, because `cc` has four discrete levels, you need three parameters to estimate the effect of `cc` on `math`. It seems like `cc=4` is your reference level, so `cc1` can be interpreted as the difference in individual math achievement between climate 1 and climate 4. After accounting for the other predictors, climate 1 has an expected individual math achievement 0.43 lower than climate 4. The same logic applies to the interaction terms. – qdread Jun 02 '23 at 17:42
  • Alright! Got it, that solved a problem I had been thinking about for days now (although it seems to be a statistical rather than an R issue). Thank you so much!! – Svenja Jun 02 '23 at 17:44
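
To see where names like `cc1` come from, it can help to look at the design matrix R builds for a factor. The sketch below uses made-up toy data (only the variable names `meanmath` and `cc` are taken from the question) and base R's `model.matrix`; it is an illustration, not the asker's actual data or model. Note that the names `cc1` to `cc3` appear when the last level is the reference, but also under sum-to-zero coding (`contr.sum`), so running `contrasts(data$cc)` on the real data is the way to confirm which coding is in use.

```r
# Hypothetical toy data; only the variable names follow the question.
set.seed(1)
toy <- data.frame(
  meanmath = rnorm(8),
  cc = factor(rep(1:4, times = 2))
)

# R's default for unordered factors is treatment coding with the FIRST level
# as reference, so the dummy columns are named after levels 2, 3 and 4.
default_names <- colnames(model.matrix(~ meanmath * cc, data = toy))
print(default_names)

# Making the LAST level the reference produces columns cc1, cc2, cc3,
# the naming pattern seen in the question's output.
contrasts(toy$cc) <- contr.treatment(4, base = 4)
mm <- model.matrix(~ meanmath * cc, data = toy)
print(colnames(mm))

# The contrast matrix shows how each level maps onto the dummy columns;
# the reference level is the row that is all zeros.
print(contrasts(toy$cc))
```

Each interaction column (e.g. `meanmath:cc3`) is simply `meanmath` multiplied by the corresponding dummy column, which is why the model reports three interaction coefficients as well.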

1 Answer

Thanks to the very helpful comments I now understand that this is not an R problem, but rather a statistical one. If a model term involves a factor, the estimates show the effects of the factor levels relative to a reference level. In the estimates I posted above, cc4 is the reference level. This means that, for example, students in classrooms with climate 3 (cc3) have an expected math score 0.66 lower than students in classrooms with climate 4 (cc4). Because the model includes the interaction, this difference applies at meanmath = 0; the interaction coefficients describe how it changes with meanmath. Again, a big thank you to all the commenters who helped me understand this!
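
As a worked example of that interpretation, the sketch below plugs the estimates from the question into the fixed-effects part of the model. It assumes treatment coding with climate 4 as the reference level (as stated above) and ignores the random intercept; the helper `predict_math` is a hypothetical name introduced only for this illustration.

```r
# Pooled estimates copied from the output in the question.
b <- c(intercept = 0.34, meanmath = 0.22,
       cc1 = -0.43, cc2 = -0.69, cc3 = -0.66,
       mm_cc1 = -0.16, mm_cc2 = 0.12, mm_cc3 = 0.23)

# Expected math score (fixed effects only) for a given climate and meanmath,
# ASSUMING treatment coding with climate 4 as the reference level.
predict_math <- function(climate, meanmath) {
  main  <- if (climate == 4) 0 else b[[paste0("cc", climate)]]
  inter <- if (climate == 4) 0 else b[[paste0("mm_cc", climate)]]
  b[["intercept"]] + main + (b[["meanmath"]] + inter) * meanmath
}

# Climate 3 vs. climate 4 at meanmath = 0: the cc3 coefficient itself.
diff_at_0 <- predict_math(3, 0) - predict_math(4, 0)

# Climate 3 vs. climate 4 at meanmath = 1: cc3 plus the interaction coefficient.
diff_at_1 <- predict_math(3, 1) - predict_math(4, 1)

print(c(diff_at_0 = diff_at_0, diff_at_1 = diff_at_1))
```

Under these assumptions the gap between climate 3 and climate 4 is -0.66 at meanmath = 0 and -0.66 + 0.23 = -0.43 at meanmath = 1, and the slope of meanmath within climate 3 is 0.22 + 0.23 = 0.45.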

Svenja