Here are some data:

dat = data.frame(y = c(9,7,7,7,5,6,4,6,3,5,1,5), x = c(1,1,2,2,3,3,4,4,5,5,6,6), color = rep(c('a','b'),6))

and a plot of these data, if you wish:

require(ggplot2)
ggplot(dat, aes(x = x, y = y, color = color)) + geom_point() + geom_smooth(method = 'lm')

When running a model with the function MCMCglmm()

require(MCMCglmm)
summary(MCMCglmm(fixed = y~x/color, data=dat))

I get the lower and upper bounds of the 95% interval for each estimate, allowing me to tell whether the two slopes (color = a and color = b) are significantly different.
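
For reference, a minimal sketch of pulling those intervals out of the fitted object programmatically (assuming the `dat` above; `HPDinterval` comes from the coda package, which MCMCglmm loads):

m <- MCMCglmm(fixed = y~x/color, data = dat, verbose = FALSE)
HPDinterval(m$Sol, prob = 0.95)  # 95% highest-posterior-density intervals for the fixed effects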

When looking at this output...

summary(glm(y~x/color, data=dat))

... I can't see the confidence intervals!

My question is:

How can I get these lower and upper 95% confidence intervals for the estimates when using the function glm()?

Remi.b
  • There is a `confint` function in the **MASS** package, I believe. – joran Dec 16 '13 at 20:33
  • ...also, be careful judging statistical significance by comparing confidence intervals. You might miss some significant results. – joran Dec 16 '13 at 20:36
  • In this case, they work: just look at whether the confidence bounds for the "x:colorb" coefficient overlap 0. – alex keil Dec 16 '13 at 20:41
  • `confint` is also in the **stats** package. – alex keil Dec 16 '13 at 20:43
  • @alexkeil Yes, but the glm method is in **MASS**, as noted in `?confint`. – joran Dec 16 '13 at 20:44
  • @joran - Good read. Note that these are "profile likelihoods", which have nice properties, but may not agree with other packages. Using lm, as in `mod = lm(formula = y ~ x/color, data = dat)` and `confint(mod)`, will give results under the normality assumption for the same model (see the sketch after these comments). – alex keil Dec 16 '13 at 20:49
  • edit: "profile likelihood confidence bounds" instead of "profile likelihoods" – alex keil Dec 16 '13 at 21:35
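
As a minimal sketch of the `lm` alternative mentioned in the comments above (assuming the `dat` from the question), fit the same model, take the t-based intervals under the normality assumption, and check whether the bounds for `x:colorb` overlap 0:

mod.lm <- lm(y~x/color, data = dat)
ci <- confint(mod.lm)  # Wald-type intervals under the normality assumption
ci["x:colorb", ]       # the slopes differ at the 5% level if this interval excludes 0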

2 Answers


Use `confint`:


mod = glm(y~x/color, data=dat)
summary(mod)
Call:
glm(formula = y ~ x/color, data = dat)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.11722  -0.40952  -0.04908   0.32674   1.35531  

Coefficients:
            Estimate Std. Error t value     Pr(>|t|)
(Intercept)   8.8667     0.4782  18.540 0.0000000177
x            -1.2220     0.1341  -9.113 0.0000077075
x:colorb      0.4725     0.1077   4.387      0.00175

(Dispersion parameter for gaussian family taken to be 0.5277981)

    Null deviance: 48.9167  on 11  degrees of freedom
Residual deviance:  4.7502  on  9  degrees of freedom
AIC: 30.934

Number of Fisher Scoring iterations: 2

confint(mod)
Waiting for profiling to be done...
                 2.5 %     97.5 %
(Intercept)  7.9293355  9.8039978
x           -1.4847882 -0.9591679
x:colorb     0.2614333  0.6836217
alex keil
  • Why is no one asking whether there is justification for not including colorb as a main effect? – IRTFM Dec 16 '13 at 21:12
  • @DWin I'd like to compare how the levels of `color` interact with the effect of `x` on `y`. I am only interested in comparing slopes. I don't really care about the overall effect of `color`. Does it sound like a good justification? – Remi.b Dec 16 '13 at 21:31
  • Not to me. Unless you offer evidence that color alone has no effect, you are _reducing_ your power to study an interaction by constraining the color main effect to be zero. – IRTFM Dec 16 '13 at 21:34
  • @Remi.b, look up "Interaction contrast ratio" for a better way to address the statistical significance (or, even better, estimation) of the degree to which there is interaction in your data (in certain models). – alex keil Dec 16 '13 at 21:39
  • Hi, can someone please interpret the output of the `confint()` function? Since the default value of the parameter `level` is `0.95`, why does it output a matrix with columns for 2.5% and 97.5%? – Drubio Dec 19 '19 at 10:36
  • @Drubio 1 - 0.95 = 0.05, which corresponds to 5% of the distribution. Because you want a two-tailed confidence limit, you divide the 0.05 in half and look at where it cuts off the bottom 2.5% and the top 2.5% of the distribution. If you look at a picture of the distribution and scan from left to right, you cut off the lowest 2.5%, and when you get to 97.5% you cut again and take everything to the right. – itsMeInMiami Jul 20 '20 at 11:17
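
To make that concrete, a quick sketch reusing the `mod` fitted above; the column labels follow directly from the `level` argument of `confint`, which defaults to 0.95:

confint(mod, level = 0.95)  # columns labelled 2.5 % and 97.5 %
confint(mod, level = 0.90)  # columns labelled 5 % and 95 %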

@alex's approach will get you the confidence limits, but be careful about interpretation. Since glm is fundamentally a non-linear model, the coefficients usually have large covariance. You should at least take a look at the 95% confidence ellipse.

mod <- glm(y~x/color, data = dat)
require(ellipse)
conf.ellipse <- data.frame(ellipse(mod, which = c(2, 3)))
ggplot(conf.ellipse, aes(x = x, y = x.colorb)) +
  geom_path() +
  geom_point(x = coef(mod)[2], y = coef(mod)[3], size = 5, color = "red")

This produces the 95% confidence ellipse for x and the interaction term.

Notice how the confidence limits produced by confint(...) are well within the ellipse. In that sense, the ellipse provides a more conservative estimate of the confidence limits.
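
As a quick numeric check of that claim (a sketch, assuming `mod` and `conf.ellipse` from the snippet above), compare the bounding box of the joint ellipse with the per-coefficient profile intervals:

apply(conf.ellipse, 2, range)  # marginal extent of the 95% joint confidence ellipse
confint(mod)[2:3, ]            # per-coefficient 95% profile intervals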

jlhoward
  • There are a lot of ways to come up with more conservative confidence bounds. What are the methods used in the ggplot package? That's unclear. Profile bounds have really nice properties and are not subject to the shortcoming that you noted. They are often used in non-linear models. Since the original model used glm with an identity link and gaussian distribution, there is no need to worry about the "fundamental non-linearity." – alex keil Dec 16 '13 at 21:33
  • `ggplot` is just used for plotting. `ellipse(...)` comes from the ellipse package. From the plot, it is clear that `x` and `x:colorb` are strongly correlated. – jlhoward Dec 16 '13 at 21:36
  • The coefficients are correlated, but I don't understand why that is an argument against testing whether a single coefficient is 0. – alex keil Dec 16 '13 at 21:43
  • In this case, neither the horizontal ("x") nor the vertical ("x:colorb") axis crosses the ellipse, so you can assert that both coefficients are different from 0 at p=0.05. If one or the other axis *did* cross the ellipse, but the limits from confint(...) did not cross 0, then I would be reluctant to make that assertion. As I understand it, confint(...) produces limits based on the *assumption* that covariance is 0. – jlhoward Dec 16 '13 at 21:51
  • That's a joint test of two coefficients - how does that better answer the OP's question than a single coefficient? – alex keil Dec 16 '13 at 22:03
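
For completeness, the joint question the ellipse answers can also be phrased as a Wald test of both coefficients at once. A minimal sketch under the usual asymptotic-normality assumption, reusing `mod` from the answer above:

b <- coef(mod)[2:3]                     # estimates for x and x:colorb
V <- vcov(mod)[2:3, 2:3]                # their covariance matrix
W <- drop(t(b) %*% solve(V) %*% b)      # Wald chi-square statistic with 2 df
pchisq(W, df = 2, lower.tail = FALSE)   # joint p-value: are both coefficients zero?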