Setting aside the debate about Type III ANOVA and the Principle of Marginality and all that...

I've set up two models whose residual sums of squares should differ (a Type III ANOVA would test exactly that difference). Here's the code:

```r
library(car)
library(openintro)
data(hsb2)
hsb2$gender <- factor(hsb2$gender)
contrasts(hsb2$gender) <- "contr.sum"
contrasts(hsb2$ses) <- "contr.sum"
math_gender_int <- lm(math ~ gender + gender:ses, data = hsb2)
math_gender_ses_int <- lm(math ~ gender + ses + gender:ses, data = hsb2)
```

Now I should be able to see a difference in the sum of squares between these two models. After all, the "full" model has one more term in it:

```r
anova(math_gender_int, math_gender_ses_int)
```

But the output shows this:

```
Analysis of Variance Table

Model 1: math ~ gender + gender:ses
Model 2: math ~ gender + ses + gender:ses
  Res.Df   RSS Df  Sum of Sq F Pr(>F)
1    194 15858
2    194 15858  0 -1.819e-12
```

What's going on here?
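
One way to investigate is to compare the design matrices R builds for the two formulas. The sketch below uses simulated data rather than `hsb2` (an assumption: a two-level `gender`, a three-level `ses`, every cell observed). The names `d`, `X_reduced`, and `X_full` are illustrative only.

```r
# Simulated stand-in for hsb2 (assumption: 2 x 3 design, 10 rows per cell)
set.seed(1)
d <- expand.grid(gender = c("female", "male"),
                 ses = c("low", "middle", "high"),
                 rep = 1:10)
d$gender <- factor(d$gender)
d$ses <- factor(d$ses)
d$math <- rnorm(nrow(d), mean = 50, sd = 10)
contrasts(d$gender) <- "contr.sum"
contrasts(d$ses) <- "contr.sum"

# Design matrices for the "reduced" and "full" formulas
X_reduced <- model.matrix(math ~ gender + gender:ses, data = d)
X_full <- model.matrix(math ~ gender + ses + gender:ses, data = d)

# Inspect which columns R actually created for each formula
print(colnames(X_reduced))
print(colnames(X_full))

# Both matrices have the same number of columns and the same rank
print(c(ncol(X_reduced), ncol(X_full)))
print(c(qr(X_reduced)$rank, qr(X_full)$rank))

# Stacking them side by side adds no rank, so they span the same column space
rank_both <- qr(cbind(X_reduced, X_full))$rank
print(rank_both)

# Same column space implies the same fitted values and the same RSS
rss_reduced <- deviance(lm(math ~ gender + gender:ses, data = d))
rss_full <- deviance(lm(math ~ gender + ses + gender:ses, data = d))
print(all.equal(rss_reduced, rss_full))
```

If the two matrices turn out to have equal rank and the stacked matrix has that rank too, the formulas describe one model in two parameterizations. That would explain a 0 in the `Df` column and a difference in RSS that is zero up to floating-point error.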

Sean Raleigh
  • If ses and gender are both factor variables, then `lm` turns these into dummy variables in the model. When two categorical variables are interacted, and every interacted set is observed, then the main effects of the categoricals are irrelevant as the interaction "saturates" the model. – lmo Mar 31 '17 at 17:26
  • It's unclear to me why these two models should have the same number of coefficients when one model is clearly nested inside the other model. – Sean Raleigh Mar 31 '17 at 17:27
  • @lmo: I posted at the same time you did. I see what you're saying. Is this an R thing or a stats thing? It seems that I should be able to "manually" check that Type III ANOVA results this way. But if R saturates the model with the interaction term, I'm not sure how else to go about it. – Sean Raleigh Mar 31 '17 at 17:33
  • This is a math/stats thing. Consider two binary variables: color {"blue", "red"} and speed {"fast", "slow"}. The interaction would contain four levels blue:fast, blue:slow, red:fast, red:slow. If these interaction terms are included, then there is no remaining variation to identify color and speed by themselves. The interaction variables cover every possibility. – lmo Mar 31 '17 at 17:40
  • In Fox's notation, I am looking for SS(alpha | beta, gamma) = SS(alpha, beta, gamma) - SS(beta, gamma). From what you're saying, though, there would never be a difference between SS(alpha, beta, gamma) and SS(beta, gamma). – Sean Raleigh Mar 31 '17 at 17:46
  • @李哲源ZheyuanLi: "Don't just judge from question title." I'm not sure what you mean. The question title indicates that I'm trying to check the results of Type III ANOVA using incremental F-tests that compare what I thought were two completely different (but nested) models. I take your point about R not being able to do it directly without manually modifying the model matrix. I'll try that. Thanks. – Sean Raleigh Mar 31 '17 at 17:52
  • @李哲源ZheyuanLi: Dropping the columns from the model matrix worked like a charm. Thanks! – Sean Raleigh Apr 03 '17 at 17:02
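
The column-dropping approach mentioned in the last comment might look roughly like the sketch below. It is not the exact code used in the thread, which was never posted. It runs on simulated data instead of `hsb2` (an assumption), and the names `d`, `X`, `X_no_ses`, `fit_reduced`, and `fit_matrix` are hypothetical. The idea is to build the full design matrix under sum-to-zero contrasts, delete only the `ses` main-effect columns, and refit on what remains, so that R cannot re-code the interaction to fill the gap.

```r
# Simulated stand-in for hsb2 (assumption: 2 x 3 design, 10 rows per cell)
set.seed(1)
d <- expand.grid(gender = c("female", "male"),
                 ses = c("low", "middle", "high"),
                 rep = 1:10)
d$gender <- factor(d$gender)
d$ses <- factor(d$ses)
d$math <- rnorm(nrow(d), mean = 50, sd = 10)
contrasts(d$gender) <- "contr.sum"
contrasts(d$ses) <- "contr.sum"

# Full model fitted from the formula, and its design matrix
fit_full <- lm(math ~ gender + ses + gender:ses, data = d)
X <- model.matrix(fit_full)

# Drop only the ses main-effect columns; names starting with "ses" are the
# main effect, while the interaction columns start with "gender"
X_no_ses <- X[, !grepl("^ses", colnames(X)), drop = FALSE]

# Refit on the matrices directly; "+ 0" avoids adding a second intercept
y <- d$math
fit_matrix <- lm(y ~ X + 0)
fit_reduced <- lm(y ~ X_no_ses + 0)

# Incremental F-test: SS(ses | gender, gender:ses)
cmp <- anova(fit_reduced, fit_matrix)
print(cmp)

# Cross-check: drop1() with scope . ~ . removes each term's columns in turn,
# ignoring marginality, which gives the same kind of comparison
type3 <- drop1(fit_full, scope = . ~ ., test = "F")
print(type3)
```

With these contrasts, the `ses` row of the `drop1()` table should match the manual comparison, and, as far as I can tell, what `car::Anova(fit_full, type = 3)` reports for `ses`. The result depends on the sum-to-zero coding. Under the default treatment contrasts, the same column-dropping step tests a different hypothesis.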
