1

I was unsure whether this question would be more appropriate here or on Cross Validated. I hope I made the right choice.

Consider the example:

library(dplyr)
setosa <- iris %>% filter(Species == "setosa") %>% select(Sepal.Length, Sepal.Width, Species)
library(ggplot2)
ggplot(data = setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
geom_point() +
geom_smooth(method ="lm", formula = y ~ poly(x,2))

enter image description here By default, ggplot "displays confidence interval around smooth" (see here), given by the gray area around the regression curve. I've always assumed these are simultaneous confidence bands for the regression curve, not pointwise confidence bands. ggplot2 documentation refers to the predict function for details on how the standard errors are computed. However, reading the doc for predict.lm, it doesn't say explicitly that simultaneous confidence bands are computed. So, what is the correct interpretation here?

DeltaIV
  • 4,773
  • 12
  • 39
  • 86
  • I think this is better on the [stats site](http://stats.stackexchange.com/questions), as it seems little to do with ggplot, and more to do with asking how the confidence interval is calculated for a linear regression: voted to migrate – user20650 Aug 23 '16 at 21:23
  • You may find this answer useful: [How does predict.lm() compute confidence interval and prediction interval?](http://stackoverflow.com/a/38110406/6455166) – Weihuang Wong Aug 23 '16 at 23:28
  • @WeihuangWong, thanks for the help, but that link you posted discusses the difference between confidence intervals and prediction intervals. I'm not concerned with prediction intervals. I'm only interested in knowing if `predict.lm()`, when used to compute **confidence** bands, computes **simultaneous** confidence bands or **pointwise** confidence bands. – DeltaIV Aug 24 '16 at 08:34
  • @user20650, you may be right. I'm specifically asking what type of confidence band `ggplot` is computing (pointwise or simultaneous). That's why in the end I chose to post here. Anyway, it could be that CV is better suited for this question. I'll make a little edit to the title to avoid any confusion, and wait a bit more. If I don't receive any answer, I'll migrate (of course, assuming that moderators haven't already migrated the question by then). – DeltaIV Aug 24 '16 at 08:36
  • Hi Delta: as ggplot just uses the code `predict(.., interval = "confidence")` (you can model this by hand and plot to check), so it seems as if the question reduces to what this interval represents. – user20650 Aug 24 '16 at 11:30

1 Answers1

4

One way to check what predict.lm() computes is to inspect the code (predict multiplies standard errors by qt((1 - level)/2, df), and so does not appear to make adjustments for simultaneous inference). Another way is to construct simultaneous confidence intervals and compare them against predict's intervals.

Fit the model and construct simultaneous confidence intervals:

setosa <- subset(iris, Species == "setosa")
setosa <- setosa[order(setosa$Sepal.Length), ]
fit <- lm(Sepal.Width ~ poly(Sepal.Length, 2), setosa)

K <- cbind(1, poly(setosa$Sepal.Length, 2))
cht <- multcomp::glht(fit, linfct = K)
cci <- confint(cht)

Reshape and plot:

cc <- as.data.frame(cci$confint)
cc$Sepal.Length <- setosa$Sepal.Length
cc <- reshape2::melt(cc[, 2:4], id.var = "Sepal.Length")

library(ggplot2)
ggplot(data = setosa, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method ="lm", formula = y ~ poly(x,2)) +
  geom_line(data = cc, 
            aes(x = Sepal.Length, y = value, group = variable),
            colour = "red")

It appears that predict(.., interval = "confidence") does not produce simultaneous confidence intervals:

enter image description here

Weihuang Wong
  • 12,868
  • 2
  • 27
  • 48
  • Thanks! Could you add a little bit of explanation to the code, though? The `ggplot` part is fairly self-explanatory, but some comments about the lines before may make the answer more generally useful. For example, why do you need to reorder `setosa`? What is the function `glht` from package `multcomp` doing? – DeltaIV Aug 24 '16 at 18:24
  • 1
    Reordering `setosa` is just so that the lines render correctly on the plot (otherwise they would zig-zag back and forth). As for `multcomp::glht`, unfortunately I don't feel confident that I can give you an accurate or useful explanation (beyond that it is a way to get simultaneous confidence bands), so you might be better served posting a follow-up question (either here or on CV). – Weihuang Wong Aug 24 '16 at 23:01
  • Thanks! I was also wondering why you didn't import the package `multcomp`...found out that it masks `dplyr`'s `select`. – DeltaIV Aug 25 '16 at 08:13