
I'm trying to create some boxplots in R. I've been using both ggboxplot and ggplot. This is my code and output so far:

ggboxplot:

ggboxplot(shp_PA@data, x = "hei_1998to2007_cat", y = "adjrate.2008to2017", 
          xlab = "Hazardous Exposure Index Jenks", 
          ylab = "Lung Cancer Incidence Rate",
          color = "red",
          add = c("jitter", "mean"), 
          add.params = list(color = "black", shape=20)) 

ggboxplot boxplot

ggplot:

shp_PA@data %>%
  ggplot(aes(x=hei_1998to2007_cat, y=adjrate.2008to2017)) +
  geom_boxplot(colour = "red") + 
  geom_jitter(color="black", size=0.75) +
  stat_summary(fun=mean, geom="point", shape=4, size=3, color="black") +
  xlab("Hazardous Exposure Index Jenks") + 
  ylab("Lung Cancer Incidence Rate")

ggplot boxplot

My main interest right now is in putting a legend on each boxplot that shows the symbol used to depict the mean, with the word "Mean" next to it. In base R, it's as simple as adding something like

legend("topright", legend=c("Mean"),pch=5, col="red")

but I can't figure it out in ggboxplot or ggplot. Most of the things I've seen online discuss modifying a legend that is already present.

One other thing I'm wondering how to do is specific to ggboxplot. I want to be able to make the color and shape of the jitter points different from the symbol for the mean. I've tried changing the add.params code to

add.params = list(color = c("black", "blue"), shape=c(20,4))

but I get the error

Error: Aesthetics must be either length 1 or the same as the data (213): shape and colour

Any help is greatly appreciated!

Edit: Added a reproducible example using the iris dataset in R

ggboxplot:

ggboxplot(iris, x = "Species", y = "Sepal.Length", 
          color = "red",
          add = c("jitter", "mean"), 
          add.params = list(color = "black", shape=20)) 

ggplot:

ggplot(data=iris, aes(x=Species, y=Sepal.Length)) +
  geom_boxplot(colour = "red") + 
  geom_jitter(color="black", size=0.75) +
  stat_summary(fun=mean, geom="point", shape=4, size=3, color="black")

Again, I'd like to add a legend with the symbol used to depict the mean and the word "Mean", and, in ggboxplot, to give the jitter points and the mean different colors and shapes.

L. Scott

2 Answers


It's a bit of a non-standard way to use ggplot, but you can do something like this.

add a legend with the symbol used to depict the mean and the word "Mean"

Map different shapes to geom_jitter and stat_summary using aes, then control those shapes using scale_shape_manual.

have the color and shape of the jitter and mean to be different

Use color to change the colors for the jitter points and mean point, and use override.aes to change the colors in the legend.

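library(ggplot2)

ggplot(data=iris, aes(x=Species, y=Sepal.Length)) +
    geom_boxplot(colour = "red") + 
    # Mapping shape to a constant string inside aes() creates one legend entry per layer
    geom_jitter(size=1, color = 'green', aes(shape = 'all data')) +
    stat_summary(fun=mean, geom="point", size=3, color = 'black', aes(shape = 'mean')) +
    # Name the values so each shape is tied to its label rather than to alphabetical order
    scale_shape_manual(values = c('all data' = 20, 'mean' = 4)) +
    # Legend keys are sorted alphabetically ('all data', then 'mean'), so the colors follow that order
    guides(shape = guide_legend(override.aes = list(color = c('green', 'black'))))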


Another similar answer here: https://stackoverflow.com/a/5179731/12400385
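
For the ggboxplot part of the question, here is one possible sketch, not tested against the original shp_PA data. Since ggboxplot() returns an ordinary ggplot object, the mean can be left out of add and drawn with a separate stat_summary layer. That layer can then have its own color and shape, and mapping shape inside aes() also gives the legend entry. The label "Mean", the blue color, and the legend position are illustrative choices of mine.

```r
library(ggplot2)
library(ggpubr)

# Let ggboxplot draw only the boxes and the jittered points
p <- ggboxplot(iris, x = "Species", y = "Sepal.Length",
               color = "red",
               add = "jitter",
               add.params = list(color = "black", shape = 20)) +
  # Draw the mean in its own layer; the constant "Mean" inside aes() creates the legend entry
  stat_summary(aes(shape = "Mean"), fun = mean, geom = "point",
               color = "blue", size = 3) +
  # Pick the symbol for the mean and drop the legend title
  scale_shape_manual(name = NULL, values = c(Mean = 4)) +
  # Illustrative choice: place the legend to the right of the panel
  theme(legend.position = "right")

p
```

This sidesteps the add.params error, because add.params is only passed single values and the mean no longer goes through it.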

nniloc

Welcome to SO!

Adding custom legend labels in ggplot2 is notoriously difficult, and I believe this is by design: all legends are controlled by the mappings placed in aes and by the scale_*_[continuous|discrete|manual] functions. If we don't want to start learning how to work with grobs (likely spending several hours), we can still achieve the desired output by:

  1. Adding the statistic to the data itself
  2. Creating a column indicating which rows are the statistic and which are data points
  3. Exploiting the fact that we can subset the data directly in each geom_* call to create separate layers for the jittered and non-jittered points, and setting the shape in the aesthetics of these layers
  4. Customizing the marks using scale_shape_manual (or scale_shape_discrete).
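
Before the full example, steps 1 and 2 can be sketched on their own. This is a minimal illustration using the iris data from the question; the object names means, points, and combined are placeholders I chose, not part of any package.

```r
library(dplyr)

# Step 1: compute the statistic (one mean per species) as ordinary rows
means <- iris %>%
  group_by(Species) %>%
  summarize(Sepal.Length = mean(Sepal.Length)) %>%
  # Step 2: flag these rows as the statistic
  mutate(stat = "mean")

# Step 2 (continued): flag the original observations as raw data
points <- iris %>%
  select(Species, Sepal.Length) %>%
  mutate(stat = "data")

# Stack both sets of rows into one data frame that a single ggplot call can use
combined <- bind_rows(means, points)

# Check how many rows carry each flag
count(combined, stat)
```

Each geom_* layer can then filter on the stat column, and mapping shape to stat is what produces the legend.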

Using the mtcars dataset as an example (and dplyr for piping), we can obtain something very similar to ggboxplot:

library(ggplot2)
library(dplyr)
data(mtcars)
# Set up the data with the group means as extra rows instead of using stat_summary
mtcars %>% 
  select(cyl, hp) %>%
  group_by(cyl) %>%
  summarize(hp = mean(hp)) %>% 
  # Flag the summary rows as the statistic
  mutate(stat = 'mean') %>% 
  bind_rows(mtcars %>%
              select(cyl, hp) %>% 
              # Flag the original observations as raw data
              mutate(stat = 'data')) %>%

  # Create ggplot
  ggplot(aes(x = factor(cyl), y = hp)) + 
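  # Draw the boxes from the raw observations only, so the appended mean rows do not affect the quartiles
  geom_boxplot(data = function(.data) filter(.data, stat == 'data'), colour = 'red') + 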

  # Jitter only the raw observations; the means get their own geom_point layer below.
  ## To plot a subset, pass a function to `data` that filters the plot data.
  geom_jitter(data = function(.data) filter(.data, stat == 'data'), 
              aes(shape = stat), color = 'black') +
  # Add the group means as non-jittered points.
  geom_point(data = function(.data) filter(.data, stat == 'mean'), 
             aes(shape = stat), size = 2.5) +
  ## Use scale_shape_manual to pick the shape for each legend entry.
  scale_shape_manual(values = c('mean' = 8, 'data' = 16)) +

  # Fix the plot theme to be similar to ggboxplot
  theme(panel.grid = element_line(colour = NA),
        panel.background = element_rect(fill = "#00000000"),
        axis.line.x = element_line(colour = 'black'), 
        axis.line.y = element_line(colour = 'black'), 
        axis.text = element_text(size = 11),
        legend.position = 'bottom'
        ) +
  # Remove label from the legend if wanted
  labs(shape = NULL)

ggboxplot-clone

Oliver