4

Im interested in calculating the SE for a mix model. For that, first I have play around with one of the dataset that the package include, in a simpler model.

pigs$percent <- as.factor(pigs$percent)
Doc_lm_1 <- lm(conc~percent, pigs) 
summary(Doc_lm_1)
emmeans(Doc_lm_1, pairwise~percent)$emmeans

The output:

percent emmean   SE df lower.CL upper.CL
9         32.7 2.92 25     26.7     38.7
12        38.0 2.76 25     32.3     43.7
15        40.1 3.12 25     33.7     46.6
18        39.9 3.70 25     32.3     47.6

When I have try with balanced datasets, the SE is the same for all the groups and doesnt match a handmade SE. I guess that in that case is not ponderate it for any factor, but it still should match the handmade SE

Could be that SE is the SE of the parameter? As we can see in the table, the SE variate between groups when the data is unbalanced. I base my hypothesis in the fact that the cran project website of the package indicate (https://cran.r-project.org/web/packages/emmeans/vignettes/basics.html#backstory):

Estimated marginal means are based on a model – not directly on data"

So I was asking me, How are the SE calculated? and how will the addition of a random factor change this calculation? Thanks in advance.

Chino Chiang
  • 91
  • 2
  • 6

1 Answers1

6

To answer the specific question, look at the summary results:

> summary(Doc_lm_1)
... several lines skipped ...
Residual standard error: 8.267 on 25 degrees of freedom
Multiple R-squared:  0.134, Adjusted R-squared:  0.03011 
F-statistic:  1.29 on 3 and 25 DF,  p-value: 0.2997

... from which we can tell that the residual SD from the model is 8.267. In addition, we need the number of observations at each factor level:

> with(pigs, tapply(conc, percent, length))
 9 12 15 18 
 8  9  7  5 

Inasmuch as the SE of a mean is the SD divided by the square root of the sample size, calculate:

> 8.267 / sqrt(c(8,9,7,5))
[1] 2.922826 2.755667 3.124632 3.697115

Lo and behold, these match the SEs shown in the emmeans() output. As is quoted in the question, emmeans() uses the model, and the model shown is based on an assumption that all four samples have the same SD, and the estimate of that common SD is 8.267 with 25 degrees of freedom. Hand calculations based on one sample at a time use separate SDs, and that is a different model than the one that was handed to emmeans(); that's why the results are different.

As to the general question of how emmeans() calculates SEs, it does not use hand-calculation formulas. It uses the fact that the EMMs are linear combinations of the regression coefficients. It finds out what combinations are needed, then it uses matrix calculations involving the regression coefficients from coef(), and the variance-covariance matrix of those coefficients, vcov(), to obtain the EMMs and their standard errors. For models involving more than one factor, random effects, etc., those calculations are nearly impossible to reproduce by hand.

Russ Lenth
  • 5,922
  • 2
  • 13
  • 21
  • PS -- the link given in the question goes to the wrong subsection of the vignette. It should be https://cran.r-project.org/web/packages/emmeans/vignettes/basics.html#EMMdef. – Russ Lenth Jan 27 '19 at 22:06
  • Thanks for your quickly anser rvl! to be honest I was waiting for your answer. – Chino Chiang Jan 28 '19 at 07:55