
How would I add a text annotation (e.g. sd = sd_value) showing the standard deviation in each panel of the following plot using ggplot2 in R?

library(ggplot2)
library(datasets)
data(mtcars)
ggplot(data = mtcars, aes(x = hp)) + 
        geom_dotplot(binwidth = 1) + 
        geom_density() + 
        facet_grid(. ~ cyl) + 
        theme_bw()

I'd post an image of the plot, but I don't have enough rep.

I think "geom_text" or "annotate" might be useful, but I'm not quite sure how.

adatum
  • possible duplicate of [Annotate ggplot2 facets with number of observations per facet](http://stackoverflow.com/questions/13239843/annotate-ggplot2-facets-with-number-of-observations-per-facet) – user20650 May 28 '15 at 02:01

2 Answers


If you want to vary the text label in each facet, you will want to use geom_text. If you want the same text to appear in each facet, you can use annotate.

p <- ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl)

mylabels <- data.frame(
  cyl = c(4, 6, 8), 
  label = c("first label", "second label different", "and another")
)

p + geom_text(x = 200, y = 0.75, aes(label = label), data = mylabels)

### compare that to this way with annotate

p + annotate("text", x = 200, y = 0.75, label = "same label everywhere")

Now, if you really want standard deviation by cyl in this example, I'd probably use dplyr to do the calculation first and then complete this with geom_text like so:

library(ggplot2)
library(dplyr)
    
df.sd.hp <- mtcars %>%
  group_by(cyl) %>%
  summarise(hp.sd = round(sd(hp), 2))
    
ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl) +
  geom_text(
    data = df.sd.hp, 
    aes(label = paste0("SD: ", hp.sd)),
    x = 200, y = 0.75
  ) 
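
If dplyr is not available, the same per-cylinder summary can probably be done in base R with aggregate(). This is only a sketch: the names sd_by_cyl, hp_sd, label and p_sd are made up for illustration, and the x/y position is the same hand-picked spot used above.

```r
library(ggplot2)

# Per-cyl standard deviation of hp using base R only.
# `sd_by_cyl`, `hp_sd` and `label` are illustrative names, not from the answer above.
sd_by_cyl <- aggregate(hp ~ cyl, data = mtcars, FUN = sd)
names(sd_by_cyl)[names(sd_by_cyl) == "hp"] <- "hp_sd"
sd_by_cyl$label <- paste0("SD: ", round(sd_by_cyl$hp_sd, 2))

# One row per facet in `sd_by_cyl`, so each panel gets its own label.
p_sd <- ggplot(data = mtcars, aes(x = hp)) +
  geom_dotplot(binwidth = 1) +
  geom_density() +
  facet_grid(. ~ cyl) +
  geom_text(data = sd_by_cyl, aes(label = label), x = 200, y = 0.75)

p_sd
```

The label data frame needs a cyl column with the same values as the plotted data; that is how ggplot2 decides which facet each row belongs to.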
JasonAizkalns
  • Thanks. The last example with geom_text is exactly what I was looking for. I'm still getting used to ggplot2; is there a way to do the same thing with the stat_ family of functions in ggplot2 for common statistical calculations without having to use dplyr first? – adatum May 28 '15 at 04:51
  • Also, how could I include greek letters (eg. sigma) and/or latex (eg. \sigma^2) in the text part of the label? – adatum May 28 '15 at 06:05
  • You can use expression() for math notation. – Axeman May 28 '15 at 09:19
  • Using parse=T in geom_text solved it for me. So for the above example the following works: geom_text(x = 200, y = 0.75, aes(label = paste0("sigma ==", hp.sd)), data = df.sd.hp, parse=T) – adatum May 28 '15 at 17:44
  • Now, how could multiple lines be included in the label to include, say, mean and standard deviation? – adatum May 28 '15 at 18:07
  • @adatum For multiple lines you could follow the `df.sd.hp` with multiple variables such as `...summarise(hp.sd = round(sd(hp), 2), hp.mean = round(mean(hp), 2))` and then do something like `paste0("SD: ", hp.sd, "\nMean: ", hp.mean, ...` – JasonAizkalns May 28 '15 at 18:13
  • The `\n` doesn't seem to work here. For an easy example without extra calculation: `paste0("SD: ", hp.sd, "\nVar: ", hp.sd^2)` doesn't print anything after `hp.sd` if parse=T. It works if parse=F. – adatum May 28 '15 at 18:21
  • Please remove the space in `my labels` – Julien Apr 19 '23 at 15:46
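
On the multi-line question in the comments: with parse = TRUE the label is read as a plotmath expression, so a "\n" inside it is not treated as a line break. One possible workaround, sketched here and not taken from the thread, is plotmath's atop(), which stacks one expression over another. The names stats_by_cyl, hp_mean, hp_sd, label and p_math are illustrative.

```r
library(ggplot2)

# Mean and SD of hp per cyl, computed with base R.
# `stats_by_cyl` and its column names are illustrative, not from the answer above.
stats_by_cyl <- data.frame(
  cyl     = sort(unique(mtcars$cyl)),
  hp_mean = as.vector(tapply(mtcars$hp, mtcars$cyl, mean)),
  hp_sd   = as.vector(tapply(mtcars$hp, mtcars$cyl, sd))
)

# Build one plotmath string per facet; atop(a, b) draws a above b.
stats_by_cyl$label <- sprintf(
  "atop(mu == %.2f, sigma == %.2f)",
  stats_by_cyl$hp_mean, stats_by_cyl$hp_sd
)

p_math <- ggplot(data = mtcars, aes(x = hp)) +
  geom_dotplot(binwidth = 1) +
  geom_density() +
  facet_grid(. ~ cyl) +
  geom_text(data = stats_by_cyl, aes(label = label),
            x = 200, y = 0.75, parse = TRUE)

p_math
```

atop() only stacks two items; more lines would need nested atop() calls or a separate geom_text layer per line.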

I prefer the appearance of the graph when the statistic appears within the facet label itself. I made the following script, which allows the choice of displaying the standard deviation, mean or count. Essentially it calculates the summary statistic then merges this with the name so that you have the format CATEGORY (SUMMARY STAT = VALUE).

#' Update the category names with the statistic of your choice
AddNameStat <- function(df, category, count_col, stat = c("sd","mean","count"), dp = 0){

  # Resolve stat to a single value (defaults to "sd" if not supplied)
  stat <- match.arg(stat)

  # Create temporary data frame for analysis
  temp <- data.frame(ref = df[[category]], comp = df[[count_col]])

  # Aggregate the variables and calculate statistics
  # (plyr functions are namespaced, so plyr does not need to be attached)
  agg_stats <- plyr::ddply(temp, "ref", plyr::summarize,
                           sd = sd(comp),
                           mean = mean(comp),
                           count = length(comp))

  # Dictionary used to replace stat name with correct symbol for plot
  labelName <- plyr::mapvalues(stat, from = c("sd","mean","count"),
                               to = c("\u03C3", "x", "n"), warn_missing = FALSE)

  # Updates the name based on the selected variable
  agg_stats$join <- paste0(agg_stats$ref, " \n (", labelName," = ",
                           round(agg_stats[[stat]], dp), ")")

  # Map the names
  name_map <- setNames(agg_stats$join, as.factor(agg_stats$ref))
  return(name_map[as.character(df[[category]])])
}

Using this script with your original question:

library(ggplot2)
library(datasets)
data(mtcars)

# Update the variable name
mtcars$cyl  <- AddNameStat(mtcars, "cyl", "hp", stat = "sd")

ggplot(data = mtcars, aes(x = hp)) + 
  geom_dotplot(binwidth = 1) + 
  geom_density() + 
  facet_grid(. ~ cyl) + 
  theme_bw()


The script should be easy to alter to include other summary statistics. I am also sure it could be rewritten in parts to make it a bit cleaner!
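
A variation on the same idea, sketched under my own assumptions and not taken from the script above: leave mtcars$cyl untouched and hand the relabelled names to facet_grid() through ggplot2's labeller(). The names hp_sd, sd_labels and p_strip are illustrative.

```r
library(ggplot2)

# SD of hp per cyl; tapply() names the result by the cyl values ("4", "6", "8").
hp_sd <- tapply(mtcars$hp, mtcars$cyl, sd)

# Named character vector used as a lookup table:
# names are the original cyl values, values are the strip text to display.
# `sd_labels` is an illustrative name.
sd_labels <- setNames(
  paste0(names(hp_sd), "\n(\u03C3 = ", round(hp_sd, 0), ")"),
  names(hp_sd)
)

p_strip <- ggplot(data = mtcars, aes(x = hp)) +
  geom_dotplot(binwidth = 1) +
  geom_density() +
  facet_grid(. ~ cyl, labeller = labeller(cyl = sd_labels)) +
  theme_bw()

p_strip
```

Because the data column is not overwritten, cyl stays numeric and the plot can be rebuilt with a different statistic by changing only the lookup vector.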

Michael Harper
  • Nice formatting; I like it. Improvements could include adding units, and allowing multiple statistics. – adatum Oct 27 '17 at 21:20