Meaning of band width in ggplot geom_smooth lm

Question

With the following code:

library(ggplot2)
ggplot(mtcars, aes(x=wt, y=mpg)) +
    geom_point(aes(colour=factor(cyl))) +
    geom_smooth(method="lm")

I can get this plot:

[Image: scatter plot of mpg against wt, points coloured by cyl, with a fitted line and a grey band around it]

My question is: how is the grey zone defined, and what does it mean? And how can I adjust the parameters that control the width of that band?

score 42 · Accepted Answer · answered Apr 10 '15 at 06:44

By default, it is the 95% confidence interval for the fitted values (the estimated mean of y at each x) from a linear model ("lm"). The documentation from ?geom_smooth states that:

The default stat for this geom is stat_smooth see that documentation for more options to control the underlying statistical transformation.

Digging one level deeper, doc from ?stat_smooth tells us about the methods used to calculate the smoother's area.

For quick results, one can adjust the level argument of stat_smooth, which sets the level of confidence interval to use (0.95 by default).

By passing that parameter to geom_smooth, it is passed in turn to stat_smooth, so that if you wish to have a narrower region, you could use for instance .90 as a confidence level:

ggplot(mtcars, aes(x=wt, y=mpg)) +
    geom_point(aes(colour=factor(cyl))) +
    geom_smooth(method="lm", level=0.90)

[Image: the same plot with a narrower grey band]
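To make the meaning of the band concrete, here is a minimal sketch that computes a 90% interval by hand with predict() on the same data. It assumes that the default formula for method="lm" is y ~ x, and that the band is the confidence interval for the mean rather than a prediction interval for new observations; the names fit, grid, conf_band and pred_band are only illustrative.

```r
# Same model that geom_smooth(method = "lm") is assumed to fit here: mpg ~ wt
fit <- lm(mpg ~ wt, data = mtcars)

# Grid of wt values spanning the observed range
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 80))

# Interval for the mean of mpg at each wt (what the grey band is taken to show)
conf_band <- predict(fit, newdata = grid, interval = "confidence", level = 0.90)

# Interval for a single new observation at each wt (wider, not what is drawn)
pred_band <- predict(fit, newdata = grid, interval = "prediction", level = 0.90)

# Both are matrices with columns "fit", "lwr" and "upr"
conf_width <- conf_band[, "upr"] - conf_band[, "lwr"]
pred_width <- pred_band[, "upr"] - pred_band[, "lwr"]

# The prediction interval is wider than the confidence interval everywhere
all(pred_width > conf_width)
```

Plotting conf_band against grid$wt should trace the edges of the grey zone, up to the grid that ggplot2 uses internally.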

Dominic Comtois

Thanks. What does the confidence interval (CI) tell us here? How did you choose which is the 'ideal' level for the CI? – neversaint Apr 11 '15 at 02:27
There's no "ideal" level, only more or less conservative (prudent) ones. For what it tells us, I'd suggest looking into `?predict` and `?predict.lm`. Basically, it indicates the range in which the fitted line is expected to fall once sampling variability is taken into account. One sample leads to a single straight line of predictions; a different sample would give a slightly different line, and the band reflects that uncertainty. By setting level at .9, we say "if we were to repeat the sampling over and over, about 90% of the bands built this way would contain the true regression line at any given value of x". – Dominic Comtois Apr 11 '15 at 02:51
Is it possible to show something other than se? For example, the 10th and 90th quantiles of the data? – Simon Woodward Aug 22 '17 at 02:55
Why is it narrower the lower the chosen level is? – Ben Dec 10 '18 at 09:19
@Ben, it is narrower the lower the confidence level, because the more one restricts the band, the higher the chance that it was a fluke and that the real regression curve falls outside. – gciriani Nov 22 '19 at 19:53
@Ben It's always a trade-off between precision and certitude (or confidence). If you want to be 99% confident of capturing the population value (parameter), then your interval needs to accommodate quite a bit of departure from the estimate obtained with your current sample. Using a low confidence level = getting more precision at the cost of a high risk of missing the target. – Dominic Comtois Nov 16 '21 at 00:42
@SimonWoodward Maybe look into [quantile regression](https://ggplot2.tidyverse.org/reference/geom_quantile.html) – Dominic Comtois Nov 16 '21 at 00:45
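
On the question of why a lower level gives a narrower band: for a linear model, the half-width of the interval at each x is a critical value from the t distribution multiplied by the standard error of the fitted mean, and only the critical value depends on the level. A small sketch under that assumption (the names levels and crit are illustrative):

```r
# Same model as in the question
fit <- lm(mpg ~ wt, data = mtcars)

# Residual degrees of freedom: 32 observations minus 2 coefficients
df <- df.residual(fit)

# Two-sided critical values for several confidence levels
levels <- c(0.80, 0.90, 0.95, 0.99)
crit <- qt(1 - (1 - levels) / 2, df = df)
names(crit) <- levels

# The multiplier grows with the level, so the band widens accordingly
crit
```

The standard error is the same in every case, so moving from 0.95 to 0.90 shrinks the band by the ratio of the two critical values.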

score 9 · Answer 2 · answered Apr 10 '15 at 06:41

It's the confidence interval. You can use se=FALSE if you do not want to display it. You can also use level = 0.99 if you want to have a 99% CI instead of a 95% CI. See ?stat_smooth for all the details.
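
As a short illustration of the two options mentioned above, reusing the plot from the question (the object names base, p_noband and p_wide are only for this sketch):

```r
library(ggplot2)

# Scatter plot shared by both variants
base <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point(aes(colour = factor(cyl)))

# Regression line only, no grey band
p_noband <- base + geom_smooth(method = "lm", se = FALSE)

# Regression line with a 99% band, wider than the default 95% one
p_wide <- base + geom_smooth(method = "lm", level = 0.99)
```

Printing p_noband or p_wide draws the corresponding plot.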

shadow
