ggplot2: How to split legend by color into a customized legend

Question

I am struggling quite a bit with making the labels of a plot look a certain way. I am using ggplot2 and tidyverse.

This is what I have:

I would like to have two headlines (=name) for the legend, one for the cell type HCT, and one for the cell type RKO. Then for HCT and RKO each, I want to have the legend for the Reagent with the respective color, linetype and shape. So basically, I want to break up the color legend into two separate legends. I just can't wrap my head around how to code it. Here is a drawing of what I would like to have instead (for the figure legend; please imagine the orange square is filled in):

Do I need to change my geom_line and geom_point code in order to achieve the legend style I'd like? Or is there another way to do it? I tried searching for a way to do it but couldn't find anything (maybe I am just not using the correct terms). I already tried following what was done here: How to merge color, line style and shape legends in ggplot and Combine legends for color and shape into a single legend but I couldn't get it to work. (In other words, I tried changing scale_shape_manual etc. to accommodate my wishes with no success. I also attempted to use interaction())

Note: I decided not to use facet_wrap since I want to show both of the cell types on the same plot. The plot of the real data looks a little different and it's not as overwhelming. I was able to successfully plot a "facet_wrap" plot with ggpubr.

Note2: I also did not use stat_summary() because I need to take the mean of the same reagent concentration, reagent and cell type. With my data, I did not find a way to make stat_summary work.

Here is the code that I currently have:

mean_mutated <- mutated %>% group_by(Reagent, Reagent.Conc, Cell.type) %>%
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE))
mutated_0 = mutated %>% group_by(Reagent, Reagent.Conc, Cell.type) %>% filter(Reagent=="0") %>% 
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE))
mutated_1 = mutated %>% group_by(Reagent, Reagent.Conc, Cell.type) %>% filter(Reagent=="1") %>% 
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE))
mutated_2 = mutated %>% group_by(Reagent, Reagent.Conc, Cell.type) %>% filter(Reagent=="2") %>% 
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE))

#linetype by reagent
ggplot() +  
  #the scatter plot per cell type -> that way I can color them the way I want to, I believe
  #the mean/average line plot 
  geom_point(mean_mutated, mapping= aes(x = as.factor(Reagent.Conc), y = Avg.Viable.Cells, shape=as.factor(Reagent), color=Cell.type)) +
  geom_line(mutated_1, mapping= aes(x = as.factor(Reagent.Conc),y = Avg.Viable.Cells, group=Cell.type, color=Cell.type, linetype = "1"))+
  geom_line(mutated_2, mapping= aes(x = as.factor(Reagent.Conc),y = Avg.Viable.Cells, group=Cell.type, color=Cell.type, linetype = "2"))+
  geom_line(mutated_0, mapping= aes(x = as.factor(Reagent.Conc),y = Avg.Viable.Cells, group=Cell.type, color=Cell.type, linetype = "0"))+
  
  #making the plot look prettier
  scale_colour_manual(values = c("#999999", "#E69F00")) +
  #scale_linetype_manual(values = c("solid", "dashed", "dotted")) + #for whatever reason, when I add this, the dash in the legend is removed...?
  labs(shape = "Reagent", linetype = "Reagent", color="Cell type")+
  scale_shape_manual(values=c(15,16,4), labels=c("0", "1", "2"))+
  #guides(shape = FALSE)+ #this removes the label that you don't want
  
  #Change the look of the plot and change the axes
  xlab("[Reagent] (nM/ml)")+ #change name of x-axis
  ylab("Relative viability")+ #change name of y-axis
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10))+ #adjust the y-axis so that it has more ticks
  expand_limits(y = 0)+
  theme_bw() + #this and the next line are to remove the background grid and make it look more publication-like
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))

And a snapshot of my data frame "mutated" produced by dput(df[9:32, c(1,2,3,4,5)]):

    structure(list(Biological.Replicate = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), Reagent.Conc = c(10000, 2500, 625, 156.3, 39.1, 9.8, 
2.4, 0.6, 10000, 2500, 625, 156.3, 39.1, 9.8, 2.4, 0.6, 10000, 
2500, 625, 156.3, 39.1, 9.8, 2.4, 0.6), Reagent = c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L), Cell.type = c("HCT", "HCT", "HCT", "HCT", 
"HCT", "HCT", "HCT", "HCT", "HCT", "HCT", "HCT", "HCT", "HCT", 
"HCT", "HCT", "HCT", "RKO", "RKO", "RKO", "RKO", "RKO", "RKO", 
"RKO", "RKO"), Mean.Viable.Cells.1 = c(1.014923966, 1.022279854, 
1.00926559, 0.936979842, 0.935565248, 0.966403395, 1.00007073, 
0.978144524, 1.019673384, 0.991595836, 0.977270557, 1.007353643, 
1.111928183, 0.963518289, 0.993028364, 1.027409034, 1.055452733, 
0.953801253, 0.956577449, 0.792568337, 0.797052961, 0.755623576, 
0.838482346, 0.836773918)), row.names = 9:32, class = "data.frame")

Note3: Even though one column name is "Mean.Viable.Cells.1", this is not the mean I am plotting, but rather the mean of a technical replicate, calculated previously. I am taking the mean of the biological replicates in mutated_0, mutated_1 and mutated_2 to plot it.

i would request dput(head(df)) your dataframe as that would be helpful for users to recreate the dataframe and your code can be used..intead of putting pic of your dataframe — PesKchan, Mar 25 '21 at 17:28
Thank you for the suggestion @PesKchan! I added a snippet of the data and made sure it had all three of the reagents and both of the cell types. — Fio, Mar 25 '21 at 17:46

stefan · Accepted Answer · 2021-03-26T07:02:32.537

Making use of the ggnewscale package this could be achieved like so:

Convert Cell.Type and Reagent to factors before manipulating the dataset
There is no need for the datasets mutate_0, ... You only need one summary dataset, which I split by Cell.type to simplify the code later on.
To get your desired result plot your data separately for each cell type. That's why I splitted the data by cell type
To get separated legend make use of ggnewscale::new_scale to add a second scale and legend for linetype and shape. Moreover, remove color from the aesthetics and set it as an argument
At least for the snippet of your data your have to add drop=FALSE to both scales to keep unused factor levels.
Finally, to reduce code duplication I make use of a helper function to add the geoms and scales for each Cell.type.

library(ggplot2)
library(dplyr)

mutated <- mutated %>% 
  mutate(Cell.type = factor(Cell.type, levels = c("HCT", "RKO")),
         Reagent = factor(Reagent, levels = c("0", "1", "2"))
  )

mean_mutated <- mutated %>%
  group_by(Reagent, Reagent.Conc, Cell.type) %>%
  summarise(Avg.Viable.Cells = mean(Mean.Viable.Cells.1, na.rm = TRUE)) %>% 
  split(.$Cell.type)
#> `summarise()` has grouped output by 'Reagent', 'Reagent.Conc'. You can override using the `.groups` argument.

layer_geom_scale <- function(cell_type, color) {
  list(
    geom_point(mean_mutated[[cell_type]], mapping = aes(shape = Reagent), color = color),
    geom_line(mean_mutated[[cell_type]], mapping = aes(group = Reagent, linetype = Reagent), color = color),
    scale_linetype_manual(name = cell_type, values = c("solid", "dashed", "dotted"), drop=FALSE),
    scale_shape_manual(name = cell_type, values = c(15, 16, 4), labels = c("0", "1", "2"), drop=FALSE) 
  )
}

# linetype by reagent
ggplot(mapping = aes(
  x = as.factor(Reagent.Conc),
  y = Avg.Viable.Cells
)) +
  layer_geom_scale("HCT", "#999999") +
  ggnewscale::new_scale("linetype") +
  ggnewscale::new_scale("shape") +
  layer_geom_scale("RKO", "#E69F00") +
  scale_y_continuous(breaks = scales::pretty_breaks(n = 10), limits = c(0, NA)) +
  theme_bw() +
  theme(
    panel.border = element_blank(), panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(), axis.line = element_line(colour = "black")
  ) +
  labs(shape = "Reagent", 
       linetype = "Reagent", 
       color = "Cell type",
       x = "[Reagent] (nM/ml)",
       y = "Relative viability")

Hi Stefan, thank you very much for your detailed explanation and answer. I hope it's ok if I ask you some questions about the code you wrote, as I am trying to understand how everything works. The layer_geom_scale function is brilliant - so each time it's called, it plots the scatter- and lineplots for the respective cell type, correct? I did not know about the ggnewscale package, so this is very useful. I read the documentation on it and wanted to verify: the position of new_scale between the "layers" is important, i.e. if I had a third layer, I'd have to call 2x new_scale, again? — Fio, Mar 26 '21 at 15:11
Also, thank you for cleaning up my code. I did not realize that I called my x&y aes every time, so I could’ve put it in the ggplot() call, for example. — Fio, Mar 26 '21 at 15:12
Hej @FioDa. You are welcome. And yes. You are absolutely right about my helper function and ggnewscale. If you want to plot more cell types you have to call 2 x new_scale for each additional category and add the layers one by one. For two or three different categories this is fine. For more categories ... you could loop over the categories via e.g. purrr::reduce instead of adding them via copy & paste. Best S. — stefan, Mar 26 '21 at 18:27

ggplot2: How to split legend by color into a customized legend

1 Answers1

Linked