When testing linear hypotheses in factorial models with categorical variables and interactions, practitioners often run into difficulties with reference levels. Some set a new reference level with functions like relevel() or factor(), re-run the model, and draw inferences about the particular (new) differences from that new model's summary. This is not an optimal approach, although it is arguably easier to understand. On the other hand, functions like linearHypothesis() in the car package and hypothesis() in the brms package are very helpful, but the difficulty seems to persist and many simply avoid using them.
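As a minimal sketch of the releveling approach described above, the following uses R's built-in warpbreaks data (an assumption for illustration; it is not the dataset used in this post). It refits the same model with a different reference level for tension and checks that the releveled woolB coefficient equals a linear combination of coefficients from the original fit.

```r
# Illustration on the built-in warpbreaks data (hypothetical stand-in dataset)
fit_ref_L <- lm(breaks ~ wool * tension, data = warpbreaks)

# Change the reference level of tension from "L" to "M" and refit
wb <- warpbreaks
wb$tension <- relevel(wb$tension, ref = "M")
fit_ref_M <- lm(breaks ~ wool * tension, data = wb)

# After releveling, the woolB coefficient is the B - A difference at tension M
relevel_est <- unname(coef(fit_ref_M)["woolB"])

# The same difference as a linear combination of the original coefficients
combo_est <- unname(coef(fit_ref_L)["woolB"] + coef(fit_ref_L)["woolB:tensionM"])

print(c(relevel = relevel_est, combination = combo_est))
```

The two numbers agree because both fits are reparametrizations of the same model. Functions such as linearHypothesis() test a combination like this directly, without refitting.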
Here is a simple generated dataset:
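set.seed(123)  # added for reproducibility; any seed works
# uniformly generated cell means, one per cell of the 2 x 2 x 2 design
m = runif(8, 3, 12)
# 50 random values per cell, random means, same std. dev.
y = list()
for (i in 1:8) {
  y[[i]] = rnorm(50, m[i], 0.75)
}
y = unlist(y)
# factorial design: each F1 level is repeated 50 times, so expand.grid()
# returns 400 rows in blocks of 50, one block per cell, in the same order as y
F1 = c(rep('a', 50), rep('b', 50))
F2 = c('m', 'n')
F3 = c('x', 'y')
design_matrix = expand.grid(F1=F1, F2=F2, F3=F3)
# dataset
dat = cbind(design_matrix, y)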
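A quick sanity check of the design, as a sketch: it rebuilds the same data in compact form (the seed is an assumption added for reproducibility), then confirms that every cell has 50 observations and that the observed cell means track the simulated means in m.

```r
set.seed(123)  # assumption: seed chosen arbitrarily
m <- runif(8, 3, 12)
y <- unlist(lapply(m, function(mu) rnorm(50, mu, 0.75)))
dat <- cbind(expand.grid(F1 = c(rep('a', 50), rep('b', 50)),
                         F2 = c('m', 'n'),
                         F3 = c('x', 'y')), y)

# Number of observations per cell of the 2 x 2 x 2 design
cell_counts <- table(dat$F1, dat$F2, dat$F3)
print(cell_counts)

# Observed cell means next to the simulated means
# (aggregate() orders cells with F1 varying fastest, matching the order of m)
cell_means <- aggregate(y ~ F1 + F2 + F3, data = dat, FUN = mean)
cell_means$simulated <- m
print(cell_means)
```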
This is the summary table from the brm() output:
library(brms)
summary(brm_model <- brm(y ~ (F1 + F2) * F3,
                         data = dat,
                         chains = 1, iter = 800))  # few iterations, just to keep the run short
To test the difference for F1=b between the two conditions of another factor, F3 (x vs. y), I currently use the following explicit approach:
hy1 = '(Intercept+F1b) < (Intercept+F1b:F3y)'
hypothesis(brm_model, hy1)
Specifically, I include the Intercept term to keep in mind that the Estimate values are relative to the Intercept. My reasoning is that the higher-order effects (i.e., interactions) are also direct adjustments to that reference level. Thus, one should not add any respective lower-order effects (e.g., main effects), as in (Intercept+F3y+F1b:F3y).
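To make the role of each coefficient explicit, the linear predictor for y ~ (F1 + F2) * F3 can be written out as below. This assumes R's default treatment (dummy) coding with a, m, and x as reference levels, which the post does not state explicitly; the β symbols and the indicator notation are introduced here for illustration only.

```latex
\mu(F1, F2, F3) = \beta_0
  + \beta_{F1b}\,\mathbf{1}[F1 = b]
  + \beta_{F2n}\,\mathbf{1}[F2 = n]
  + \beta_{F3y}\,\mathbf{1}[F3 = y]
  + \beta_{F1b:F3y}\,\mathbf{1}[F1 = b]\,\mathbf{1}[F3 = y]
  + \beta_{F2n:F3y}\,\mathbf{1}[F2 = n]\,\mathbf{1}[F3 = y]
```

Under this coding, the cell F1 = b, F2 = m, F3 = x has mean β0 + β_F1b, and the cell F1 = b, F2 = m, F3 = y has mean β0 + β_F1b + β_F3y + β_F1b:F3y. Any proposed hypothesis string can be compared against these cell means.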
In the frequentist world, the analogous model would be:
summary(lm_model <- lm(y ~ (F1 + F2) * F3,
                       data = dat))
With the linear hypothesis test:
library(car)
# coefficient order: (Intercept), F1b, F2n, F3y, F1b:F3y, F2n:F3y
linearHypothesis(lm_model, c(0,-1,0,0,1,0))
(Note that since the Intercept would be added to both sides of the contrast, it cancels: -1+1=0, which corresponds to the first 0 in c(0,-1,0,0,1,0).)
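One way to check a contrast vector without reasoning about reference levels by hand is to take the design-matrix rows of the two cells being compared and subtract them. The sketch below does this in base R for F1 = b at F3 = y versus F3 = x. It assumes default treatment contrasts and holds F2 at m (the post does not say which F2 level is intended); the seed and the names cells, X, and k are introduced here for illustration.

```r
# Rebuild the simulated data so the block runs on its own
set.seed(123)  # assumption: seed chosen arbitrarily
m <- runif(8, 3, 12)
y <- unlist(lapply(m, function(mu) rnorm(50, mu, 0.75)))
dat <- cbind(expand.grid(F1 = c(rep('a', 50), rep('b', 50)),
                         F2 = c('m', 'n'),
                         F3 = c('x', 'y')), y)
lm_model <- lm(y ~ (F1 + F2) * F3, data = dat)

# The two cells to compare: (b, m, y) versus (b, m, x)
cells <- data.frame(F1 = factor(c('b', 'b'), levels = levels(dat$F1)),
                    F2 = factor(c('m', 'm'), levels = levels(dat$F2)),
                    F3 = factor(c('y', 'x'), levels = levels(dat$F3)))

# Design-matrix rows for those two cells
X <- model.matrix(delete.response(terms(lm_model)), data = cells)

# The contrast vector is the difference between the two rows
k <- X[1, ] - X[2, ]
print(k)

# The contrast estimate matches the difference of the two predicted cell means
est_contrast <- sum(k * coef(lm_model))
pred <- predict(lm_model, newdata = cells)
est_predict <- unname(pred[1] - pred[2])
print(c(contrast = est_contrast, predicted_difference = est_predict))
```

With F2 held at m, k comes out as c(0,0,0,1,1,0), that is F3y + F1b:F3y, which is not the same vector as c(0,-1,0,0,1,0). The vector k can be passed to linearHypothesis(lm_model, k) in the same way as above.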
Is this reasoning correct, and how would you explain the process of making comparisons and testing linear hypotheses in this context?
Thanks!