I'd like to create a ggplot in which all factor levels are displayed in the plot's legend, even if not all of them occur in the data. Let me show what I mean.
Here's my data:
set.seed(1)
df <- data.frame(y.tick.lab = letters[1:10],p.value=c(runif(6,0,0.25),runif(4,0.25,1)),
group=c(rep("enriched",3), rep("depleted",3), rep("enriched",2), rep("depleted",2)),
bar.col=rep("#C8C8C8",10), p.value.cat=rep("I",10),
stringsAsFactors = FALSE)
df$log.p.value = -10*log10(df$p.value)
Now I update bar.col and p.value.cat according to p.value:
p.val2col.df = data.frame(col = c("#2f3b61","#0000B2","#436CE8","#9494FF","#E0E0FF","#7d4343","#B20000","#C74747","#E09898","#EBCCD6"),
p.value = rep(c(0.05,0.1,0.15,0.2,0.25),2),
cat = c("D(0-0.05]","D(0.05-0.1]","D(0.1-0.15]","D(0.15-0.2]","D(0.2-0.25]","E(0-0.05]","E(0.05-0.1]","E(0.1-0.15]","E(0.15-0.2]","E(0.2-0.25]"),
group = c(rep("depleted",5),rep("enriched",5)), stringsAsFactors = FALSE)
# Only p-values below 0.25 get a colored bin; the rest keep the grey default
idx = which(df$p.value < 0.25)
# For each such row, find the first bin of the same group whose upper bound exceeds the p-value
bin.idx = vapply(idx, function(i) {
  min(which(p.val2col.df$group == df$group[i] & p.val2col.df$p.value > df$p.value[i]))
}, integer(1))
df$bar.col[idx] = p.val2col.df$col[bin.idx]
df$p.value.cat[idx] = p.val2col.df$cat[bin.idx]
Here's where I set the levels of bar.col to include everything that's in p.val2col.df:
color.order = c(p.val2col.df$col[1:5],"#C8C8C8", p.val2col.df$col[10:6])
color.labels = c(p.val2col.df$cat[1:5],"I(0.25-1]", p.val2col.df$cat[10:6])
names(color.order) = color.labels
df$bar.col = factor(df$bar.col, levels=color.order, labels=color.labels)
And here is my ggplot code:
library(ggplot2)
pl = ggplot(df, aes(y=log.p.value, x=y.tick.lab,fill=bar.col)) +
scale_fill_manual(values=color.order, name="E/D(P-value range)") +
geom_bar(stat="identity", width=0.2) +
scale_y_continuous(limits=c(0,30), labels = c(seq(0,20,10)," >30"),expand=c(0,0)) +
theme(axis.text=element_text(size=8), axis.title=element_text(size=8,face="bold")) +
coord_flip() +
theme(plot.margin=unit(c(0.1,1,0.1,0.1),"cm"), axis.title.y = element_text(size=8), axis.title.x = element_text(size=8)) +
labs(y="-10log10(P-value)", x="Group")
But the resulting figure legend only includes those bar.col values that are in df, rather than everything in levels(df$bar.col).
So my question is: how do I get all levels(df$bar.col) into the legend?
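To make the mismatch concrete, here is a minimal sketch of how the declared levels of a factor can be compared with the levels that actually occur in the data. The toy factor f and the variable names are hypothetical and only for illustration; they are not part of my real data.

```r
# Toy factor (hypothetical): three declared levels, only two of which occur in the data
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

declared <- levels(f)              # every level the factor knows about
present <- levels(droplevels(f))   # only the levels that actually appear

# Levels that are declared but never used; these are the ones missing from my legend
missing.from.data <- setdiff(declared, present)
print(missing.from.data)
```

Running the same comparison on df$bar.col lists exactly the categories that do not show up in the legend.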