
In the context of testing linear hypotheses in factorial models with categorical variables and interactions, practitioners often run into difficulties caused by reference levels. Some set new reference levels with functions like relevel() or factor(), re-run the model, and draw inferences about the particular (new) differences from that new model's summary. This is not an optimal approach, although it is arguably easier to understand. On the other hand, functions like linearHypothesis() in the car package and hypothesis() in the brms package are very helpful, yet the confusion seems to persist and many people simply avoid using them.

Here is a simple generated dataset:

```r
# arbitrary seed, added only so the example is reproducible
set.seed(1)

# uniformly generated means
m = runif(8, 3, 12)

# random values, random means, same std. dev.
y = list()
for (i in 1:8) {
    y[[i]] = rnorm(50, m[i], 0.75)
}
y = unlist(y)

# factorial design
F1 = c(rep('a', 50), rep('b', 50))
F2 = c('m', 'n')
F3 = c('x', 'y')
design_matrix = expand.grid(F1=F1, F2=F2, F3=F3)

# dataset
dat = cbind(design_matrix, y)
```
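
A quick sanity check, sketched below with an arbitrary seed as an assumption: it rebuilds the same kind of data more compactly and confirms that each block of 50 `y` values lands in the intended combination of F1, F2 and F3. This relies on `expand.grid()` varying its first argument fastest and on `aggregate()` listing the cells in that same order, so the sample cell means can be set next to the generating means `m`.

```r
# Sketch: check that the 8 blocks of 50 values line up with the 8 cells.
# The seed is an arbitrary assumption, used only for reproducibility.
set.seed(42)
m <- runif(8, 3, 12)
y <- unlist(lapply(m, function(mu) rnorm(50, mu, 0.75)))
dat <- data.frame(expand.grid(F1 = rep(c("a", "b"), each = 50),
                              F2 = c("m", "n"),
                              F3 = c("x", "y")),
                  y = y)

# every cell should contain 50 observations
print(table(dat$F1, dat$F2, dat$F3))

# sample cell means next to the generating means; aggregate() orders the
# cells with F1 varying fastest, the same order expand.grid() uses
cell_means <- aggregate(y ~ F1 + F2 + F3, data = dat, FUN = mean)
cell_means$true_mean <- m
print(cell_means)
```

If the design were misaligned, the `y` and `true_mean` columns would disagree by far more than the sampling error of a 50-observation mean.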

The model is fitted with brm(), and its summary() gives the coefficient table:

```r
library(brms)

summary(brm_model <- brm(y ~
    (F1 + F2) * F3,
    data = dat,
    chains = 1, iter = 800)) # a single short chain, just to keep the run fast
```

To test the difference for F1=b across the two conditions of another factor, F3 (x vs. y), I currently use the following explicit approach:

```r
hy1 = '(Intercept+F1b) < (Intercept+F1b:F3y)'
hypothesis(brm_model, hy1)
```

Specifically, I include the Intercept term as a reminder that the Estimate values are relative to the Intercept. As I understand it, the higher-order effects (i.e., interactions) are also direct adjustments to that reference level. Thus, one should not add the respective lower-order effects (e.g., main effects), as in (Intercept+F3y+F1b:F3y).
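
One way to check which coefficients a given cell mean involves is to let R build the model-matrix rows for the two cells and subtract them. The sketch below does this with `lm()` point estimates (the seed and the choice of cells are illustrative assumptions) and mirrors what `predict.lm()` does internally. Under R's default treatment contrasts, the same weights apply to the population-level coefficients of a `brm()` fit with the same formula. The sketch holds F2 at its reference level `m`; because the model also contains `F2:F3`, the same comparison at F2 = `n` would additionally involve `F2n:F3y`.

```r
# Sketch: derive the coefficient weights for a difference of two cell means
# from the model matrix instead of assembling them by hand.
# The seed and the cells compared are illustrative assumptions.
set.seed(42)
m <- runif(8, 3, 12)
dat <- expand.grid(F1 = rep(c("a", "b"), each = 50),
                   F2 = c("m", "n"), F3 = c("x", "y"))
dat$y <- rnorm(nrow(dat), rep(m, each = 50), 0.75)

fit <- lm(y ~ (F1 + F2) * F3, data = dat)

# the two cells: F1 = b, F2 at its reference level m, F3 = x vs. y
nd <- data.frame(F1 = c("b", "b"), F2 = c("m", "m"), F3 = c("x", "y"))

# model-matrix rows for these cells, built the same way predict.lm() does
tt <- delete.response(terms(fit))
X <- model.matrix(tt, model.frame(tt, nd, xlev = fit$xlevels),
                  contrasts.arg = fit$contrasts)
print(X)  # each row: the weights on the coefficients for one cell mean

# the contrast for (b, m, y) minus (b, m, x)
L <- X[2, ] - X[1, ]
print(L)

# applying L to the coefficients reproduces the difference in predictions
est <- sum(L * coef(fit))
print(c(contrast = est, from_predict = unname(diff(predict(fit, nd)))))
```

Reading the rows of `X` side by side shows which terms each cell mean contains, so the subtraction can be checked term by term rather than reasoned out from the summary table.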


In the frequentist framework, the analogous model would be:

```r
summary(lm_model <- lm(y ~
    (F1 + F2) * F3,
    data = dat))
```
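
Since `linearHypothesis()` matches the entries of a numeric vector to the coefficients by position, it can help to print the coefficient order that `(F1 + F2) * F3` produces and label the vector with it. The sketch below uses a small grid with one row per cell, which is enough to get the model-matrix column names (the order `coef(lm_model)` follows); `cells`, `coef_names` and `hyp` are illustrative names.

```r
# Sketch: list the coefficient order produced by (F1 + F2) * F3, because a
# numeric hypothesis vector is matched to the coefficients by position.
# A grid with one row per cell is enough; no response is needed.
cells <- expand.grid(F1 = c("a", "b"), F2 = c("m", "n"), F3 = c("x", "y"))
coef_names <- colnames(model.matrix(~ (F1 + F2) * F3, data = cells))
print(coef_names)
# expected: "(Intercept)" "F1b" "F2n" "F3y" "F1b:F3y" "F2n:F3y"

# label the vector used below to see which coefficients it weights
hyp <- setNames(c(0, -1, 0, 0, 1, 0), coef_names)
print(hyp[hyp != 0])
```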

and the corresponding linear hypothesis test:

```r
library(car)

linearHypothesis(lm_model, c(0,-1,0,0,1,0))
```

(Note that because the Intercept appears on both sides of the contrast, it cancels out (1 - 1 = 0); this is the first 0 in c(0,-1,0,0,1,0).)
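
For a single linear restriction, the F statistic that `linearHypothesis()` reports for an `lm` fit is the square of the contrast's t statistic, so the test can be reproduced by hand from `coef()` and `vcov()`. The sketch below does that for one contrast built by coefficient name, the cell difference (F1 = b, F2 = m, F3 = y) minus (F1 = b, F2 = m, F3 = x) under treatment coding, with an arbitrary seed as an assumption. It then refits the model with `relevel()`, the approach mentioned at the start; since releveling only reparametrizes the same model, the `F3y` coefficient of the refit should reproduce the same estimate and standard error.

```r
# Sketch: the single-row Wald test reported by linearHypothesis(), computed
# by hand, then cross-checked against a refit with F1 releveled to "b".
# The seed and the contrast chosen are illustrative assumptions.
set.seed(42)
m <- runif(8, 3, 12)
dat <- expand.grid(F1 = rep(c("a", "b"), each = 50),
                   F2 = c("m", "n"), F3 = c("x", "y"))
dat$y <- rnorm(nrow(dat), rep(m, each = 50), 0.75)

fit <- lm(y ~ (F1 + F2) * F3, data = dat)
b <- coef(fit)

# contrast built by name: (F1 = b, F2 = m, F3 = y) minus (F1 = b, F2 = m, F3 = x)
L <- setNames(numeric(length(b)), names(b))
L[c("F3y", "F1b:F3y")] <- 1

est <- sum(L * b)                            # point estimate of L'b
se <- sqrt(drop(t(L) %*% vcov(fit) %*% L))   # its standard error
F_stat <- (est / se)^2                       # one restriction: F = t^2
p_val <- pf(F_stat, 1, df.residual(fit), lower.tail = FALSE)
print(c(estimate = est, se = se, F = F_stat, p = p_val))

# if car is installed, its F statistic should match F_stat
if (requireNamespace("car", quietly = TRUE)) {
  print(car::linearHypothesis(fit, L))
}

# cross-check: with "b" as the reference level of F1, the F3y coefficient
# is the same difference of cell means, with the same standard error
dat_b <- transform(dat, F1 = relevel(F1, ref = "b"))
fit_b <- lm(y ~ (F1 + F2) * F3, data = dat_b)
print(summary(fit_b)$coefficients["F3y", 1:2])
```

Building the vector by name rather than by position keeps it correct even if the formula's term order changes.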

Is this reasoning correct, and how would you explain the process of making comparisons and testing linear hypotheses in this context?

Thanks!

striatum
  • I can't tell if this is a programming question or a statistics question. If it's a programming question, it's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. If it's more of a theoretical modeling question, you should ask at [Cross Validated](https://stats.stackexchange.com) instead. – MrFlick Jul 19 '23 at 14:25
  • Thanks [MrFlick](https://stackoverflow.com/users/2372064/mrflick), I've worked out the full toy example. I believe it is a programming question, and specifically related to mentioned `R` packages. – striatum Jul 20 '23 at 10:12
