Why is converting emmeans contrasts to a data.frame not reporting correct p-values?

Question

The p-values for the contrasts I am running are not being converted correctly to a data.frame. Why is this and how do I fix it?

Console output for emmeans:

> pairs(emmeans(lmer.mod, ~ Status*Stim*Treatment), simple = "each")
$`simple contrasts for Status`
Stim = 1, Treatment = None:
 contrast               estimate     SE   df t.ratio p.value
 Control - Subclinical  -0.24213 0.0571 57.5 -4.241  0.0002 
 Control - Clinical     -0.16275 0.0571 57.5 -2.851  0.0164 
 Subclinical - Clinical  0.07938 0.0571 57.5  1.390  0.3526

Console output for data.frame of emmeans:

> mod.EMM <- pairs(emmeans(lmer.mod, ~ Status*Stim*Treatment), simple = "each")
> as.data.frame(mod.EMM)
   Stim Treatment      Status               contrast     estimate         SE       df      t.ratio      p.value
1    1      None           .  Control - Subclinical -0.242125000 0.05709000 57.46544  -4.24111052 3.680551e-03
2    1      None           .     Control - Clinical -0.162750000 0.05709000 57.46544  -2.85076195 2.721389e-01
3    1      None           . Subclinical - Clinical  0.079375000 0.05709000 57.46544   1.39034857 1.000000e+00

Reproducible example:

model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data=CO2)

library(emmeans)
pairs(emmeans(model1, ~ Type*Treatment), simple="each")
# $`simple contrasts for Type`
# Treatment = nonchilled:
#   contrast             estimate   SE df t.ratio p.value
# Quebec - Mississippi     9.38 1.85 79 5.068   <.0001 
# 
# Treatment = chilled:
#   contrast             estimate   SE df t.ratio p.value
# Quebec - Mississippi    15.94 1.85 79 8.610   <.0001 
# 
# 
# $`simple contrasts for Treatment`
# Type = Quebec:
#   contrast             estimate   SE df t.ratio p.value
# nonchilled - chilled     3.58 1.85 79 1.934   0.0566 
# 
# Type = Mississippi:
#   contrast             estimate   SE df t.ratio p.value
# nonchilled - chilled    10.14 1.85 79 5.477   <.0001

as.data.frame(pairs(emmeans(model1, ~ Type*Treatment), simple="each"))
#    Treatment        Type             contrast  estimate       SE df  t.ratio      p.value
# 1 nonchilled           . Quebec - Mississippi  9.380952 1.851185 79 5.067538 1.036140e-05
# 2    chilled           . Quebec - Mississippi 15.938095 1.851185 79 8.609670 2.252161e-12
# 3          .      Quebec nonchilled - chilled  3.580952 1.851185 79 1.934410 2.265719e-01
# 4          . Mississippi nonchilled - chilled 10.138095 1.851185 79 5.476542 1.995066e-06

model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data=CO2)
pairs(emmeans(model1, ~ Type*Treatment), simple="each")
# $`simple contrasts for Type`
# Treatment = nonchilled:
#   contrast             estimate   SE df t.ratio p.value
# Quebec - Mississippi     9.38 1.85 79 5.068   <.0001 
# 
# Treatment = chilled:
#   contrast             estimate   SE df t.ratio p.value
# Quebec - Mississippi    15.94 1.85 79 8.610   <.0001 
# 
# 
# $`simple contrasts for Treatment`
# Type = Quebec:
#   contrast             estimate   SE df t.ratio p.value
# nonchilled - chilled     3.58 1.85 79 1.934   0.0566 
# 
# Type = Mississippi:
#   contrast             estimate   SE df t.ratio p.value
# nonchilled - chilled    10.14 1.85 79 5.477   <.0001

as.data.frame(pairs(emmeans(model1, ~ Type*Treatment), simple="each"))
#    Treatment        Type             contrast  estimate       SE df  t.ratio      p.value
# 1 nonchilled           . Quebec - Mississippi  9.380952 1.851185 79 5.067538 1.036140e-05
# 2    chilled           . Quebec - Mississippi 15.938095 1.851185 79 8.609670 2.252161e-12
# 3          .      Quebec nonchilled - chilled  3.580952 1.851185 79 1.934410 2.265719e-01
# 4          . Mississippi nonchilled - chilled 10.138095 1.851185 79 5.476542 1.995066e-06

Update from outside help:

"It seems like the result of pairs() is not itself an emmGrid object that can be turned into a data frame, but a list containing two emmGrid objects. If you extract either of those objects by position from the list, using [[]], like so,

pairs(emmeans(model1, ~ Type*Treatment), simple = "each")[[2]]

then you can data.frame() each result and it will be correct. You end up with two different dataframes to hold the contrasts involving the two different variables, but each of these dataframes has the correct p-values in it."

I'm hoping someone has a better work-around for this issue so I can combine all the contrasts into one data.frame.

xilliam · Accepted Answer · 2021-03-14T21:46:02.143

The different p-values you are seeing reflect unadjusted p-values vs p-values that were adjusted for multiple comparisons.

The ?emmeans::pairs documentation tells us:

Ordinarily, when simple is a list or "each", the return value is an emm_list object with each entry in correspondence with the entries of simple. However, with combine = TRUE, the elements are all combined into one family of contrasts in a single emmGrid object using rbind.emmGrid.. In that case, the adjust argument sets the adjustment method for the combined set of contrasts.

So, with your reproducible example, you can combine all the simple main effects into one data frame with the combine argument set to TRUE. And you can choose between unadjusted vs adjusted p-values by setting the adjust argument.

model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data=CO2)

> pairs(emmeans(model1, ~ Type*Treatment), simple = "each", combine = TRUE,
+               adjust = "none")
 Treatment  Type        contrast             estimate   SE df t.ratio p.value
 nonchilled .           Quebec - Mississippi     9.38 1.85 79 5.068   <.0001 
 chilled    .           Quebec - Mississippi    15.94 1.85 79 8.610   <.0001 
 .          Quebec      nonchilled - chilled     3.58 1.85 79 1.934   0.0566 
 .          Mississippi nonchilled - chilled    10.14 1.85 79 5.477   <.0001

Here's one with Bonferroni adjustment:

> pairs(emmeans(model1, ~ Type*Treatment), simple = "each", combine = TRUE,
+               adjust = "bonferroni")
 Treatment  Type        contrast             estimate   SE df t.ratio p.value
 nonchilled .           Quebec - Mississippi     9.38 1.85 79 5.068   <.0001 
 chilled    .           Quebec - Mississippi    15.94 1.85 79 8.610   <.0001 
 .          Quebec      nonchilled - chilled     3.58 1.85 79 1.934   0.2266 
 .          Mississippi nonchilled - chilled    10.14 1.85 79 5.477   <.0001 

P value adjustment: bonferroni method for 4 tests

So in my actual data, I have a three-way contrast and using combine = TRUE with adjust = "Tukey" does not give the same console output as the pairs() contrast (which states it uses Tukey) without the combine = TRUE. Does this mean it's making a different adjustment due to counting more contrasts? — iastatecy, Mar 14 '21 at 21:25
To answer my own question - yes, it is. Combine = TRUE results in "tukey method for comparing family of 10 estimates" whereas without it it adjusts each contrast group by 3 estimates each. — iastatecy, Mar 14 '21 at 21:27

Russ Lenth · Answer 2 · 2021-03-16T04:56:20.400

This can be done pretty easily, but what you have to do is get the basic output and then plug in the right P values. To illustrate, I'm going to show a different example where one factor has more than two levels.

require(emmeans)
#> Loading required package: emmeans

warp.lm = lm(breaks ~ wool * tension, data = warpbreaks)
(cons = pairs(emmeans(warp.lm, ~ wool * tension), simple = "each"))
#> $`simple contrasts for wool`
#> tension = L:
#>  contrast estimate   SE df t.ratio p.value
#>  A - B       16.33 5.16 48  3.167  0.0027 
#> 
#> tension = M:
#>  contrast estimate   SE df t.ratio p.value
#>  A - B       -4.78 5.16 48 -0.926  0.3589 
#> 
#> tension = H:
#>  contrast estimate   SE df t.ratio p.value
#>  A - B        5.78 5.16 48  1.120  0.2682 
#> 
#> 
#> $`simple contrasts for tension`
#> wool = A:
#>  contrast estimate   SE df t.ratio p.value
#>  L - M      20.556 5.16 48  3.986  0.0007 
#>  L - H      20.000 5.16 48  3.878  0.0009 
#>  M - H      -0.556 5.16 48 -0.108  0.9936 
#> 
#> wool = B:
#>  contrast estimate   SE df t.ratio p.value
#>  L - M      -0.556 5.16 48 -0.108  0.9936 
#>  L - H       9.444 5.16 48  1.831  0.1704 
#>  M - H      10.000 5.16 48  1.939  0.1389 
#> 
#> P value adjustment: tukey method for comparing a family of 3 estimates

# get the estimates, etc. into a data frame:
df = as.data.frame(cons)

# get the Tukey-adjusted P values:
pv = unlist(lapply(unlist(cons), function(x) as.data.frame(x)$p.value))

# replace the p values and display
df$p.value = pv
df
#>   tension wool contrast   estimate       SE df    t.ratio      p.value
#> 1       L    .    A - B 16.3333333 5.157299 48  3.1670322 0.0026768025
#> 2       M    .    A - B -4.7777778 5.157299 48 -0.9264108 0.3588672592
#> 3       H    .    A - B  5.7777778 5.157299 48  1.1203107 0.2681556374
#> 4       .    A    L - M 20.5555556 5.157299 48  3.9857208 0.0006572745
#> 5       .    A    L - H 20.0000000 5.157299 48  3.8779987 0.0009185485
#> 6       .    A    M - H -0.5555556 5.157299 48 -0.1077222 0.9936237722
#> 7       .    B    L - M -0.5555556 5.157299 48 -0.1077222 0.9936237722
#> 8       .    B    L - H  9.4444444 5.157299 48  1.8312771 0.1703517915
#> 9       .    B    M - H 10.0000000 5.157299 48  1.9389993 0.1388570254

^{Created on 2021-03-15 by the reprex package (v1.0.0)}

The method with combine = TRUE will not work for anything except adjust = "none", because the family size is that of all the contrasts combined. Moreover, the Tukey method can only be applied to a single set of pairwise comparisons. Two or more sets of pairwise comparisons combined do not comprise a set of pairwise comparisons, so cannot be adjusted using the Tukey method.

I still don't recommend doing this if the goal is to present the results to someone else; because looking at this one data frame makes it extremely unclear how the P-value adjustments were done, and to which families. We have six families of comparisons in this example; the original, annotated display of cons makes this clear, and the listing of df does not.

Why is converting emmeans contrasts to a data.frame not reporting correct p-values?

Reproducible example:

2 Answers2

Linked