
I would like to add the total number of observations per group to a density plot, and I would like to know whether stat_summary can be used for this. I have searched for an example of this case but couldn't find one; there are only examples for box plots. For example, I followed the post "Use stat_summary to annotate plot with number of observations", adapting the code to my case, which is plotting a density graph.

n_fun <- function(x){
  return(data.frame(y = median(x), label = paste0("n = ", length(x))))
}

library(ggplot2)

ggplot(mtcars, aes(x=mpg, colour=factor(cyl))) +
  geom_line(stat="density", aes(linetype=factor(cyl)), size=0.8) +
  stat_summary(fun.data = n_fun, geom = "text")

The error I get is:

Error: stat_summary requires the following missing aesthetics: y

Plotting the density alone works fine; the error appears only when I add stat_summary.

Help will be greatly appreciated.

Asked by ruthy_gg

2 Answers


I think @jlhoward's answer is exactly what you wanted. If you need to plot many densities in the same graph, I'd suggest putting the additional information (the number of observations) in the legend rather than in the plot, like this:

library(ggplot2)

df        <- mtcars
df$median <- ave(df$mpg, df$cyl, FUN=median)
df$label  <- ave(df$mpg, df$cyl, FUN=function(x)paste0("n = ",length(x)))
df$cyl_group <- paste0(df$cyl, "  (", df$label, ")")

ggplot(df, aes(x=mpg, colour=cyl_group)) +
  geom_line(stat="density", aes(linetype=cyl_group), size=0.8) 
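A variant of the same legend idea, offered as a sketch rather than as part of this answer: keep factor(cyl) as the grouping and pass count-annotated labels to the colour and linetype scales, so the data frame needs no extra columns. The name `lbls` is illustrative, and the labels are assigned by position, which assumes the factor levels and the names returned by table() are both in the sorted order 4, 6, 8.

```r
library(ggplot2)

# group sizes per cylinder count, named "4", "6", "8" in sorted order
counts <- table(mtcars$cyl)
# illustrative legend labels such as "4  (n = 11)"
lbls <- paste0(names(counts), "  (n = ", as.vector(counts), ")")

p <- ggplot(mtcars, aes(x = mpg, colour = factor(cyl), linetype = factor(cyl))) +
  geom_line(stat = "density") +
  # both scales get identical labels and the same default title,
  # so ggplot2 merges them into a single legend
  scale_colour_discrete(labels = lbls) +
  scale_linetype_discrete(labels = lbls)
print(p)
```

Because the labels live in the scales, the original `cyl` column is untouched and can still be used for faceting or other layers.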


Answered by AntoniosK

The short answer is no, you can't use stat_summary(...) for this (although now that I've said it, I'm sure someone will come along and show you how to do it that way).

stat_summary(...) requires both an x and a y aesthetic. Usually there is more than one y value for a given x; stat_summary(...) uses fun.data to summarize the y values at each x and then plots one result per x.

So first, you never specified the y aesthetic. Second, even with y=mpg, since x=mpg there is only one y for each x. In the post you cite, x=factor(cyl) and y=mpg, which is why it works there and not here.
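For contrast, here is a minimal sketch of the setup where stat_summary does fit: a discrete x and a continuous y, as in the cited box-plot post. n_fun is the helper from the question; the boxplot layer is an assumption chosen to mirror that post. Each cylinder group contributes several mpg values, so fun.data receives one vector per x and returns one label per x.

```r
library(ggplot2)

# summary function from the question: one row with a y position and a text label
n_fun <- function(x) {
  data.frame(y = median(x), label = paste0("n = ", length(x)))
}

# x is discrete and y = mpg, so stat_summary sees all mpg values of each cyl group at once
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  stat_summary(fun.data = n_fun, geom = "text")

# inspect the computed summary layer: one row per cylinder group
print(layer_data(p, 2)[, c("x", "y", "label")])
```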

Third, it's not clear what you are trying to accomplish, as you seem to want the labels located at y=median(mpg). But the y axis of a density plot shows densities (well below 1 here), while median(mpg) is in the tens, so the labels all end up far off-scale:

ggplot(mtcars, aes(x=mpg, colour=factor(cyl))) +
  geom_line(stat="density", aes(linetype=factor(cyl)), size=0.8) +
  stat_summary(aes(y=mpg),fun.data = n_fun, geom = "text")

Note that there is one label for each value of x=mpg. Since there is only one y for each x, the median is just that value and label="n = 1" in (almost) all cases; the exceptions are mpg values that occur more than once within a group. Not very useful.
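To put numbers on the off-scale point, the sketch below compares the peak of each group's kernel density estimate with the group median of mpg. It uses base R's density(), whose default bandwidth rule ("nrd0") is also the default of stat_density, so the peaks should be close to the drawn curves; the object names are illustrative.

```r
# peak height of each group's density curve, via base R's density()
peaks <- sapply(split(mtcars$mpg, mtcars$cyl), function(x) max(density(x)$y))
# median mpg per group: the y positions n_fun would have used
medians <- tapply(mtcars$mpg, mtcars$cyl, median)

# densities are per mpg unit and stay well below 1; medians are in miles per gallon
print(round(rbind(peak_density = peaks, median_mpg = medians), 3))
```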

Here is a way to do more or less what you seem to want:

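library(ggplot2)
# one row per cyl: median mpg (the label's x position) plus the group size
df.lbl       <- aggregate(mpg~cyl,mtcars, median)
df.lbl$label <- aggregate(mpg~cyl,mtcars, function(x) paste0("n = ",length(x)))[,2]
ggplot(mtcars, aes(x=mpg, colour=factor(cyl))) +
  geom_line(stat="density", aes(linetype=factor(cyl)), size=0.8) +
  # inherits x=mpg and colour; show.legend replaces the removed show_guide argument
  geom_text(data=df.lbl, aes(label=label, y=0.05), show.legend=FALSE)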
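The y=0.05 height works for this data but is hard-coded. A possible refinement, offered as a sketch rather than as part of the answer: place each label at the peak of its own curve, computed with base R's density() and its default bandwidth (the same "nrd0" default stat_density uses, so the peaks should line up closely with the drawn curves). The helper peak_label and the data frame df.peak are hypothetical names.

```r
library(ggplot2)

# hypothetical helper: location and height of a group's density peak, plus the group size
peak_label <- function(x) {
  d <- density(x)      # bandwidth rule "nrd0", the same default as stat_density
  i <- which.max(d$y)  # index of the highest point of the curve
  data.frame(mpg = d$x[i], dens = d$y[i], label = paste0("n = ", length(x)))
}

groups  <- split(mtcars$mpg, mtcars$cyl)
df.peak <- do.call(rbind, lapply(groups, peak_label))
df.peak$cyl <- as.numeric(names(groups))  # so colour=factor(cyl) is inherited correctly

p <- ggplot(mtcars, aes(x = mpg, colour = factor(cyl))) +
  geom_line(stat = "density", aes(linetype = factor(cyl))) +
  # x=mpg and colour come from the global aes; vjust < 0 nudges each label above its peak
  geom_text(data = df.peak, aes(y = dens, label = label), vjust = -0.5, show.legend = FALSE)
print(p)
```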

Answered by jlhoward
  • I think your `geom_text` command is using `df` so it's plotting labels at the same point multiple times. What about using a `df2` table with unique values? `df <- mtcars; df$median <- ave(df$mpg, df$cyl, FUN=median); df$label <- ave(df$mpg, df$cyl, FUN=function(x)paste0("n = ",length(x))); df2 = unique(df[,c("cyl","median","label")]); ggplot(df, aes(x=mpg, colour=factor(cyl))) + geom_line(stat="density", aes(linetype=factor(cyl)), size=0.8) + geom_text(data=df2, aes(x=median,label=label, y=0.05), show.legend=FALSE)`. Might make things faster when you have a much bigger `df`. – AntoniosK Oct 01 '15 at 22:03
  • You're right that this plots the labels multiple times. Changed the answer to fix that. – jlhoward Oct 02 '15 at 02:18