16

I have this dataframe:

set.seed(1)
x <- c(rnorm(50, mean = 1), rnorm(50, mean = 3))
y <- c(rep("site1", 50), rep("site2", 50))
xy <- data.frame(x, y)

And I have made this density plot:

library(ggplot2)
ggplot(xy, aes(x, color = y)) + geom_density()

enter image description here

For site1 I need to shade the area under the curve that > 1% of the data. For site2 I need to shade the area under the curve that < 75% of the data.

I'm expecting the plot to look something like this (photoshopped). Having been through stack overflow, I'm aware that others have asked how to shade part of the area under a curve, but I cannot figure out how to shade the area under a curve by group.

enter image description here

luciano
  • 13,158
  • 36
  • 90
  • 130
  • possible duplicate of [Shading a kernel density plot between two points.](http://stackoverflow.com/questions/3494593/shading-a-kernel-density-plot-between-two-points) – joran Dec 03 '13 at 16:24
  • I've looked at that question, but I can't figure out how to shade different areas by group – luciano Dec 03 '13 at 16:42

3 Answers3

12

Here is one way (and, as @joran says, this is an extension of the response here):

#  same data, just renaming columns for clarity later on
#  also, use data tables
library(data.table)
set.seed(1)
value <- c(rnorm(50, mean = 1), rnorm(50, mean = 3))
site  <- c(rep("site1", 50), rep("site2", 50))
dt    <- data.table(site,value)
#  generate kdf
gg <- dt[,list(x=density(value)$x, y=density(value)$y),by="site"]
#  calculate quantiles
q1 <- quantile(dt[site=="site1",value],0.01)
q2 <- quantile(dt[site=="site2",value],0.75)
# generate the plot
ggplot(dt) + stat_density(aes(x=value,color=site),geom="line",position="dodge")+
  geom_ribbon(data=subset(gg,site=="site1" & x>q1),
              aes(x=x,ymax=y),ymin=0,fill="red", alpha=0.5)+
  geom_ribbon(data=subset(gg,site=="site2" & x<q2),
              aes(x=x,ymax=y),ymin=0,fill="blue", alpha=0.5)

Produces this:

Community
  • 1
  • 1
jlhoward
  • 58,004
  • 7
  • 97
  • 140
3

The problem with @jlhoward's solution is that you need to manually add goem_ribbon for each group you have. I wrote my own ggplot stat wrapper following this vignette. The benefit of this is that it automatically works with group_by and facet and you don't need to manually add geoms for each group.

StatAreaUnderDensity <- ggproto(
  "StatAreaUnderDensity", Stat,
  required_aes = "x",
  compute_group = function(data, scales, xlim = NULL, n = 50) {
    fun <- approxfun(density(data$x))
    StatFunction$compute_group(data, scales, fun = fun, xlim = xlim, n = n)
  }
)

stat_aud <- function(mapping = NULL, data = NULL, geom = "area",
                    position = "identity", na.rm = FALSE, show.legend = NA, 
                    inherit.aes = TRUE, n = 50, xlim=NULL,  
                    ...) {
  layer(
    stat = StatAreaUnderDensity, data = data, mapping = mapping, geom = geom, 
    position = position, show.legend = show.legend, inherit.aes = inherit.aes,
    params = list(xlim = xlim, n = n, ...))
}

Now you can use stat_aud function just like other ggplot geoms.

set.seed(1)
x <- c(rnorm(500, mean = 1), rnorm(500, mean = 3))
y <- c(rep("group 1", 500), rep("group 2", 500))
t_critical = 1.5

tibble(x=x, y=y)%>%ggplot(aes(x=x,color=y))+
  geom_density()+
  geom_vline(xintercept = t_critical)+
  stat_aud(geom="area",
           aes(fill=y),
           xlim = c(0, t_critical), 
              alpha = .2)

enter image description here

tibble(x=x, y=y)%>%ggplot(aes(x=x))+
  geom_density()+
  geom_vline(xintercept = t_critical)+
  stat_aud(geom="area",
           fill = "orange",
           xlim = c(0, t_critical), 
              alpha = .2)+
  facet_grid(~y)

enter image description here

Lala La
  • 1,352
  • 9
  • 18
  • 1
    How could I extend this to use different `x_lim` per group based on a variable in my `data.frame`? – Simon Feb 02 '21 at 11:22
  • You can read [here](https://ggplot2.tidyverse.org/reference/ggplot2-ggproto.html) and try to extend the custom layer to accept aesthetics. – Lala La Feb 03 '21 at 05:48
-1

You need to use fill. color controls the outline of the density plot, which is necessary if you want non-black outlines.

ggplot(xy, aes(x, color=y, fill = y, alpha=0.4)) + geom_density()

To get something like that. Then you can remove the alpha part of the legend by using

ggplot(xy, aes(x, color = y, fill = y, alpha=0.4)) + geom_density()+ guides(alpha='none')
Max Candocia
  • 4,294
  • 35
  • 58
  • Sorry that doesn't get me the plot I need. Note that different areas need to be filled for each group, not the full area under the curve – luciano Dec 03 '13 at 16:44