I'd like to create a ggplot in which all factor levels are displayed in the plot's legend even if not all of them are in the data. Let me show what I mean.

Here's my data:

set.seed(1)
df <- data.frame(y.tick.lab = letters[1:10],p.value=c(runif(6,0,0.25),runif(4,0.25,1)),
                 group=c(rep("enriched",3), rep("depleted",3), rep("enriched",2), rep("depleted",2)),
                 bar.col=rep("#C8C8C8",10), p.value.cat=rep("I",10),
                 stringsAsFactors = FALSE)
df$log.p.value = -10*log10(df$p.value)

Now I update bar.col and p.value.cat according to p.value:

p.val2col.df = data.frame(col = c("#2f3b61","#0000B2","#436CE8","#9494FF","#E0E0FF","#7d4343","#B20000","#C74747","#E09898","#EBCCD6"),
                          p.value = rep(c(0.05,0.1,0.15,0.2,0.25),2),
                          cat = c("D(0-0.05]","D(0.05-0.1]","D(0.1-0.15]","D(0.15-0.2]","D(0.2-0.25]","E(0-0.05]","E(0.05-0.1]","E(0.1-0.15]","E(0.15-0.2]","E(0.2-0.25]"),
                          group = c(rep("depleted",5),rep("enriched",5)), stringsAsFactors = F)
idx = which(df$p.value < 0.25)
df$bar.col[idx] = sapply(1:length(idx), function(x) {
  p.val2col.df$col[min(which(p.val2col.df$group == df$group[idx[x]] & p.val2col.df$p.value > df$p.value[idx[x]]))]
})
df$p.value.cat[idx] = sapply(1:length(idx), function(x) {
  p.val2col.df$cat[min(which(p.val2col.df$group == df$group[idx[x]] & p.val2col.df$p.value > df$p.value[idx[x]]))]
})

Here's where I set the levels of bar.col to include everything that's in p.val2col.df:

color.order = c(p.val2col.df$col[1:5],"#C8C8C8", p.val2col.df$col[10:6])
color.labels = c(p.val2col.df$cat[1:5],"I(0.25-1]", p.val2col.df$cat[10:6])
names(color.order) = color.labels
df$bar.col = factor(df$bar.col, levels=color.order, labels=color.labels)

And here's my ggplot code:

 library(ggplot2)
 pl = ggplot(df, aes(y=log.p.value, x=y.tick.lab,fill=bar.col)) +
 scale_fill_manual(values=color.order, name="E/D(P-value range)") +
 geom_bar(stat="identity", width=0.2) + 
 scale_y_continuous(limits=c(0,30), labels = c(seq(0,20,10),"   >30"),expand=c(0,0)) +
 theme(axis.text=element_text(size=8), axis.title=element_text(size=8,face="bold")) +  
 coord_flip() + 
 theme(plot.margin=unit(c(0.1,1,0.1,0.1),"cm"), axis.title.y = element_text(size=8), axis.title.x = element_text(size=8)) +  
 labs(y="-10log10(P-value)", x="Group")

But the resulting figure legend only includes the bar.col levels that actually occur in df, rather than everything in levels(df$bar.col).

So my question is: how do I get all of levels(df$bar.col) into the legend?

Dave2e
dan
  • You need `drop = FALSE` in `scale_fill_manual`. See [here](http://stackoverflow.com/questions/10002627/ggplot2-0-9-0-automatically-dropping-unused-factor-levels-from-plot-legend) and [here](http://stackoverflow.com/questions/10834382/ggplot2-keep-unused-levels-barplot) for examples – aosmith Jun 08 '16 at 18:15
  • Does anyone know how to show all factors in ggpubr, which is based on ggplot? Thank you! – James Jun 24 '20 at 00:52
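
To illustrate the `drop = FALSE` suggestion in the first comment, here is a minimal sketch on toy data. The names `toy` and `toy.cols` are made up for this example and are not from the question. The factor has three levels, of which only two occur in the data. As far as I know, recent ggplot2 releases (3.5.0 onward) may draw an empty key for an unused level unless the layer also sets `show.legend = TRUE`, so the sketch sets both.

```r
library(ggplot2)

# Hypothetical toy data: the factor declares three levels,
# but only "low" and "high" actually occur.
toy <- data.frame(
  x   = c("a", "b"),
  y   = c(1, 2),
  grp = factor(c("low", "high"), levels = c("low", "mid", "high"))
)

# Named colour vector covering every level, including the unused one.
toy.cols <- c(low = "#436CE8", mid = "#C8C8C8", high = "#B20000")

p <- ggplot(toy, aes(x = x, y = y, fill = grp)) +
  # show.legend = TRUE is an assumption for newer ggplot2 versions,
  # where keys for absent levels may otherwise be left blank.
  geom_col(show.legend = TRUE) +
  # drop = FALSE keeps unused factor levels in the scale and its legend.
  scale_fill_manual(values = toy.cols, drop = FALSE)

print(p)
```

Applied to the plot in the question, this would presumably mean changing the scale call to `scale_fill_manual(values=color.order, name="E/D(P-value range)", drop=FALSE)`.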