geom_ribbon with confidence intervals

Question

I would expect the following snippet to plot the 95% confidence intervals of the sepal length:

ggplot(iris,aes(x=Species,y=Sepal.Length)) +
  stat_summary(geom='ribbon',
               fun=mean_cl_normal, 
               fun.args=list(conf.int=0.95))

Which additional diagnostics could I run to elucidate why the plot stays empty?

Edit: I was using the 'ribbon' geometry, because it would be important for me to indicate the confidence intervals as a shaded area.
For a categorical x variable, the 'ribbon' geometry doesn't make too much sense, as suggested in the helpful answers.
Indeed, my variable on the x axis is actually continuous and I had been a bit unfortunate in choosing the iris dataset as a minimal reproducible example.
It would therefore make more sense to choose a minimal example like the following:

  ggplot(data.frame(x=rep(1:3,each=3),y=c(1:3,4:6,7:9))) +
    stat_summary(aes(x=x,y=y),
                 geom='ribbon',
                 fun=mean_cl_normal, 
                 fun.args=list(conf.int=0.95))

Perhaps you want `fun.data` instead of `fun`? – Z.Lin Apr 09 '23 at 12:32

tjebo · Accepted Answer · 2023-04-10T07:50:39.017

3

What you're trying to visualise doesn't really make sense. You have a categorical variable x for which you have measurements y with a different variance for each value of x. What exactly is a ribbon between those x values supposed to signify?

Users Z.Lin and IRTFM made a very valid point about using `fun.data` (+1), and this is the correct way to show your data.

However, it is technically feasible to draw a ribbon, for which you then need to additionally specify group = 1, so that geom_ribbon draws between the categorical values. (Plot 1)

But I guess what you really want is to draw the mean as a line and the confidence interval as a ribbon. For this, geom_ribbon alone will not be enough. You might use geom_smooth instead, which draws both a line and a ribbon and can therefore handle the three values (y, ymin, ymax) that the mean_cl_normal function produces. (Plot 2)

library(tidyverse)
library(patchwork) ## loading just for demonstration 

## Plot 1 - using geom_ribbon
p1 <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(
    geom = "ribbon",
    fun.data = mean_cl_normal,
    fun.args = list(conf.int = 0.95), group = 1
  ) +
  ggtitle("Plot 1")

## Plot 2 - using geom_smooth
p2 <-
  ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(
    geom = "smooth",
    fun.data = mean_cl_normal,
    fun.args = list(conf.int = 0.95),
    group = 1,
    alpha = .5,
    color = "black",
    se = TRUE
  ) +
  ggtitle("Plot 2")

p1 + p2

Created on 2023-04-09 with reprex v2.0.2
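For the numeric x case raised in the question's edit and in the comment below, here is a minimal sketch along the same lines. As far as `stat_summary` is concerned, any function can stand in for `mean_cl_normal` under `fun.data`, provided it takes a numeric vector and returns a one-row data frame with the columns `y`, `ymin` and `ymax`. The helper name `mean_min_max` is hypothetical and not part of ggplot2; the data frame is the toy example from the question.

```r
library(ggplot2)

# Toy data from the question: numeric x with three y values per x
df <- data.frame(x = rep(1:3, each = 3), y = c(1:3, 4:6, 7:9))

# Hypothetical helper (not part of ggplot2): returns the three columns
# that a ribbon layer can use when supplied via fun.data
mean_min_max <- function(v) {
  data.frame(y = mean(v), ymin = min(v), ymax = max(v))
}

p <- ggplot(df, aes(x = x, y = y)) +
  # shaded band between the per-x minimum and maximum
  stat_summary(geom = "ribbon", fun.data = mean_min_max, alpha = 0.3) +
  # mean line drawn on top of the band
  stat_summary(geom = "line", fun = mean)

p
```

Because x is continuous here, no `group = 1` is needed. Swapping `mean_min_max` back to `mean_cl_normal` should give the confidence band instead of the range.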

edited Apr 10 '23 at 07:50

answered Apr 09 '23 at 16:50

tjebo


Thanks, I have a numeric x variable and this snippet works very well. I'm wondering whether there is a way to change the `mean_cl_normal` function to other functions returning a mean, min and max value. – NicolasBourbaki Apr 09 '23 at 18:55

I am not going to accept that I am halfway to this goal. That is a horrible way to display a categorical-by-continuous dataset. The lines and shading that occupy the space between the categories do not represent anything that is arguably real. It's a distortion of the data. (Congratulations on the mind-reading, but when someone asks for a method to distort the meaning of the data I think you should act more responsibly.) – IRTFM Apr 09 '23 at 21:24
@IRTFM I apologise for the phrasing - I will change this right away. I have learnt in my few years on Stack Overflow that people sometimes have reasons for asking for things that go beyond our understanding and are not necessarily connected to something bad. What I can do, as well as I can, is show them a solution to their problem. If their idea for the visualisation seems wrong, I usually try to comment on it, as I did here. And then they can do with this piece of information as they please. – tjebo Apr 10 '23 at 05:42
I agree that it doesn't make sense to use `geom_ribbon` and added a hint to my question. I would like to thank everyone for their help in finding a way towards an optimal graphical representation of the data. – NicolasBourbaki Apr 10 '23 at 18:15

IRTFM · Answer 2 · 2023-04-10T13:11:07.050

2

This is probably what you should have wanted:

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun.data = mean_cl_normal)

It follows the pattern of the first example on the help page.

edited Apr 10 '23 at 13:11

answered Apr 09 '23 at 12:38

IRTFM


Your solution works as described, but it would be important for me to indicate the confidence intervals as a shaded area. – NicolasBourbaki Apr 09 '23 at 14:51

Quinten · Answer 3 · answered Apr 09 '23 at 12:31

1

You could also use two stat_summary layers: first the mean point for each species, then the mean_cl_normal confidence limits as error bars, like this:

library(ggplot2)
ggplot(iris,aes(x=Species,y=Sepal.Length)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_cl_normal,
               geom = "errorbar")

Created on 2023-04-09 with reprex v2.0.2
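The split between `fun` and `fun.data` in the snippet above also suggests a quick diagnostic for the empty plot in the question: call the summary function on a plain vector and look at what it returns. `fun` is meant for functions that return a single number (used as `y`), whereas `fun.data` is meant for functions that return a data frame with `y`, `ymin` and `ymax`. The sketch below uses a hand-written t-based interval, `mean_ci_t`, which is a hypothetical stand-in so that the example does not depend on the Hmisc package behind `mean_cl_normal`. That it mirrors what `mean_cl_normal` computes is an assumption, not something verified here.

```r
library(ggplot2)

# Hypothetical stand-in for mean_cl_normal: mean with a t-based confidence interval
mean_ci_t <- function(v, conf.int = 0.95) {
  n <- length(v)
  half_width <- qt((1 + conf.int) / 2, df = n - 1) * sd(v) / sqrt(n)
  data.frame(y = mean(v), ymin = mean(v) - half_width, ymax = mean(v) + half_width)
}

# Diagnostic: inspect the return value for a single group
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
res <- mean_ci_t(setosa)
str(res)           # one-row data frame with y, ymin, ymax: belongs in fun.data
str(mean(setosa))  # a single number: belongs in fun

# Same layering as above, with the stand-in supplying the error bars
p <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_ci_t,
               fun.args = list(conf.int = 0.95),
               geom = "errorbar")
```

A ribbon needs `ymin` and `ymax`, and `fun` alone only supplies `y`, so a three-value summary function has to be passed through `fun.data`.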

answered Apr 09 '23 at 12:31

Quinten


The snippet works as you described it, but indicating the confidence interval as a shaded area would be important for me. – NicolasBourbaki Apr 09 '23 at 14:50
