15

I'm dealing with an unbalanced design/sample and originally learned aov(). I know now that for my ANOVA tests I need to use the Type III Sum of Squares which involves using fitting using lm() rather than using aov().

The problem is getting post-hoc tests (specifically Tukey's HSD) using lm(). All the research I've done has said that using simint in the multcomp package would work, but now that it's updated that command seems to not be available. It also seems to rely upon going through aov() to calculate.

Essentially all of the Tukey HSD tests I've found for R assume that you use aov() for the comparison rather than lm(). To get the Type III Sum of Squares I need for the unbalanced design I have to use:

mod<-lm(Snavg~StudentEthnicity*StudentGender)

Anova(mod, type="III")

How do I use a Tukey HSD test with my mod using lm()? Or conversely, calculate my ANOVA using Type III and still be able to run a Tukey HSD test?

Thanks!

zx8754
  • 52,746
  • 12
  • 114
  • 209
leighadlr
  • 159
  • 1
  • 1
  • 5

4 Answers4

9

Try HSD.test in agricolae

library(agricolae)
data(sweetpotato)
model<-lm(yield~virus, data=sweetpotato)
comparison <- HSD.test(model,"virus", group=TRUE,
main="Yield of sweetpotato\nDealt with different virus")

Output

Study: Yield of sweetpotato
Dealt with different virus

HSD Test for yield 

Mean Square Error:  22.48917 

virus,  means

      yield  std.err replication
cc 24.40000 2.084067           3
fc 12.86667 1.246774           3
ff 36.33333 4.233727           3
oo 36.90000 2.482606           3

alpha: 0.05 ; Df Error: 8 
Critical Value of Studentized Range: 4.52881 

Honestly Significant Difference: 12.39967 

Means with the same letter are not significantly different.

Groups, Treatments and means
a        oo      36.9 
ab       ff      36.33333 
 bc      cc      24.4 
  c      fc      12.86667 
MYaseen208
  • 22,666
  • 37
  • 165
  • 309
  • I'm attempting to use this package/command with my data: `HSD.test(mod, group=TRUE, main= "SN Average by ethnicity & gender")` but I'm still getting an error: `Error in as.character(x) : 'x' is missing`. Looking at the output, though, it doesn't seem to match the reporting of p-values that you get from TukeyHSD. I'll keep trying and see if I can find out what's going wrong. Thanks! – leighadlr Oct 12 '11 at 15:16
  • I found that question and i dont't unterstand which groups are now siginificantly different. Could you explain the example little bit more in detail? – bladepit May 10 '13 at 07:52
  • 1
    weird behavious of hsd.test function since if you don't assign it to a variable, it prints nothing. Can be confusing the first time. – agenis Jul 30 '14 at 18:29
  • @agenis if you set console to true as suggested by Sollano it will print out without saving the data to a variable – Simone Jan 02 '18 at 12:33
  • This solution is for an experimental design. If your design is different have a look at the reference manual -> group needs to be set to F for non-experimental design for example – Simone Jan 02 '18 at 12:38
6

As an initial note, unless it's been changed, to get the correct results for type iii sum of squares, you need to set the contrast coding for the factor variables. This can be done inside the lm call or with options. The example below uses options.

I would be cautious about using HSD.test and similar functions with unbalanced designs unless the documentation addresses their use in these situations. The documentation for TukeyHSD mentions that it adjusts for "mildly unbalanced" designs. I don't know if HSD.test handles things differently. You'd have to check additional documentation for the package or the original reference cited for the function.

As a side note, enclosing the whole HSD.test function in parentheses will cause it to print the results. See example below.

In general, I would recommend using the flexible emmeans (née lsmeans) or multcomp packages for all your post-hoc comparison needs. emmeans is particularly useful for doing mean separations on interactions or for examining contrasts among treatments. [EDIT: Caveat that I am the author of these pages.]

With an unbalanced design, you may want to report the E.M. (or L.S.) means instead of the arithmetic means. See SAEPER: What are least square means?. [EDIT: Caveat that I am the author of this page.] Note in the example below that the marginal means reported by emmeans are different than those reported by HSD.test.

Also note that the "Tukey" in glht has nothing to do with Tukey HSD or Tukey-adjusted comparisons; it just sets up the contrasts for all pairwise tests, as the output says.

However, the adjust="tukey" in emmeans functions does mean to use Tukey-adjusted comparisons, as the output says.

The following example is partially adapted from ARCHBS: One-way Anova.

### EDIT: Some code changed to reflect changes to some functions
###  in the emmeans package

if(!require(car)){install.packages("car")}
library(car)
data(mtcars)
mtcars$cyl.f = factor(mtcars$cyl)
mtcars$carb.f = factor(mtcars$carb)

options(contrasts = c("contr.sum", "contr.poly"))

model = lm(mpg ~ cyl.f + carb.f, data=mtcars)

library(car)
Anova(model, type="III")

if(!require(agricolae)){install.packages("agricolae")}
library(agricolae)
(HSD.test(model, "cyl")$groups)

if(!require(emmeans)){install.packages("emmeans")}
library(emmeans)

marginal = emmeans(model,
                   ~ cyl.f)

pairs(marginal, adjust="tukey")

if(!require(multcomp)){install.packages("multcomp")}
library(multcomp)

cld(marginal, adjust="tukey", Letters=letters)


if(!require(multcomp)){install.packages("multcomp")}
library(multcomp)

mc = glht(model,
          mcp(cyl.f = "Tukey"))

summary(mc, test=adjusted("single-step"))

cld(mc)
Sal Mangiafico
  • 440
  • 3
  • 8
  • Thanks for the pointer Sal! The contrast option still exists in the car package according to the reference manual. Got to work my way through it. – Simone Jan 02 '18 at 11:56
  • @Sal: `emmeans` do consider `aov`, `lm`, etc. but unfortunately not `Anova` with its 2 or 3 `type` of sum of squares. How does `emmeans` understand the correct type of SS? Or one has to run Anova with SS2 or 3 and with significant effects do `emmeans`? – abc Jan 25 '21 at 04:22
  • @stan , I don't know how to answer your question precisely. You might also see emmeans::joint_tests: https://rdrr.io/cran/emmeans/man/joint_tests.html – Sal Mangiafico Jan 25 '21 at 16:15
  • @Sal: my question was how to let `emmeans` know the SS type used in `car::Anova` ? Is it necessary? It's said that `stats::TukeyHSD` by default uses SS1 from `stats::aov` ([Multiple Comparisons](https://www.statmethods.net/stats/anova.html)). Or the correct way is just: aov or lm > Anova with aov or lm > select p<0.05 effects in Anova > emmeans with selected effects ? – abc Jan 26 '21 at 09:49
  • 1
    (Reposting comment due to bad link.) Hi, @stan. I can't give you any kind of technical --- or probably informative --- answer. I suspect that the way individual contrasts are calculated in `emmeans`, that it doesn't make sense to consider them as type I, II, or III SS. It depends on the model (`lm`) and not the anova per se. That being said, it might help a bit to read the section on "Joint Tests" [here](https://cran.r-project.org/web/packages/emmeans/vignettes/confidence-intervals.html) There it mentions how it makes sense to think of the joint tests of multiple contrasts are T-II or T-III. – Sal Mangiafico Jan 26 '21 at 23:19
3

I found HSD.test() also to be very meticulous about the way you have built either the lm() or aov() model that you're using for it.

There was no output from HSD.test() with my data when I had used following idea of coding for lm() :

    model<-lm(sweetpotato$yield ~ sweetpotato$virus)  
    out <- HSD.test(model,"virus", group=TRUE, console=TRUE)

Output was only:

    Name:  virus 
    sweetpotato$virus 

The output was equally bad when using the same logic for aov()

    model<-aov(sweetpotato$yield ~ sweetpotato$virus)

To get the output for HSD.test() the lm() (or also if using aov() for the model ) must be constructed strictly using the logic presented in the MYaseen208 answer:

    model <- lm(yield~virus, data=sweetpotato)

Hope this helps someone who's not getting a proper output from HSD.test().

hakkis
  • 41
  • 5
  • If you use aov to run your ANOVA you can also use TukeyHSD from the R stats (base) pacakge. Works really well! The reason why you want to use HSD.test from the agricolae package is because TukeyHSD doesn't work with the car package, which allows specification of the different types of SS. – Simone Jan 02 '18 at 11:46
  • Exactly, @Simone. The `HSD.test()` from the `agricolae` package has an argument specifically for unbalanced designs (`unbalanced=T`) will produce estimates assuming not equal replication. For those searching for a post-hoc test to conduct after `car`'s type III two-way ANOVA, I would recommend: `m1 <- lm(formula = dv ~ factorA:factorB, data = your_dataset); library(agricolae); out <- HSD.test(m1,c("factorA","factorB"), group=F, console=TRUE, unbalanced =T)` – Sinval Jan 08 '22 at 23:48
1

I was stuck with the same problem of the HSD.test printing out nothing. You need to put console=TRUE inside the function, so it prints out automatically.

For example:

HSD.test(alturacrit.anova, "fator", console=TRUE).
Hope it helps!
Wyetro
  • 8,439
  • 9
  • 46
  • 64
Sollano
  • 11
  • 1