I am trying to write a script that generates stacked barplots showing the percentages of glycoforms on an analysed protein. I am using the "turbo" color scale from Viridis because I need a very wide coverage of colors to be able to distinguish all the glycoforms.

Edit: Here is an example data set (dput output) that generates a single bar:

structure(list(Glycoform = c("NaNaF", "NaAF", "NaGnF", "NaMF", 
"NaNa", "NaA", "NaGn", "NaM", "AAF", "AGnF", "AMF", "AA", "AGn", 
"AM/Man4Gn", "Man4A/Man5Gn", "Man5/A", "GnGnXF", "MGnXF", "MMXF", 
"GnGnF", "MGnF", "MMF", "GnGnX", "MGnX", "MMX", "GnGn", "MGn", 
"MM", "Man9", "Man8", "Man7", "Man6", "Man5", "Man4", "Rest"), 
    Raw = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 81.82, 
    5.35, 0, 0, 0, 0, 10.69, 0, 0, 0, 0, 0, 2.14, 0, 0, 0, 0, 
    0, 2.14), Percentage = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 81.82, 5.35, 0, 0, 0, 0, 10.69, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 2.14), Plant = c("A", "A", "A", "A", 
    "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
    "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", 
    "A", "A", "A", "A", "A", "A", "A")), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -35L))

Because each glycoform must have the same color in every plot, I fix the order of the factor levels like so:

t$Glycoform<-factor(t$Glycoform,levels=c("NaNaF","NaAF","NaGnF","NaMF","NaNa","NaA","NaGn","NaM","AAF","AGnF","AMF","AA","AGn","AM/Man4Gn","Man4A/Man5Gn","Man5/A","GnGnXF","MGnXF","MMXF","GnGnF","MGnF","MMF","GnGnX","MGnX","MMX","GnGn","MGn","MM","Man9","Man8","Man7","Man6","Man5","Man4","Rest"))

But because glycoforms that were not found have a value of 0% in my raw data, they still show up in the legend:

library(ggplot2)
library(viridis)
ggplot(t,aes(fill=Glycoform,y=Percentage,x=Plant,group=Glycoform))+
  geom_bar(stat="identity",color="white",size=1.2, width=0.8)+
  scale_fill_viridis(option="turbo",discrete=TRUE)+
  geom_text(data=subset(t,Percentage>0),position=position_stack(vjust=0.5),color="white",size=4,aes(label=Glycoform),fontface="bold")+
  guides(fill=guide_legend(ncol=2))

[Image: barplot with wrong legend]

When I convert the unused values to NA, the legend is fixed, but the color assignment is wrong:

t[t == 0] <- NA
ggplot(t,aes(fill=Glycoform,y=Percentage,x=Plant,group=Glycoform))+
  geom_bar(stat="identity",color="white",size=1.2, width=0.8)+
  scale_fill_viridis(option="turbo",discrete=TRUE)+
  geom_text(data=t,position=position_stack(vjust=0.5),color="white",size=4,aes(label=Glycoform),fontface="bold")+
  guides(fill=guide_legend(ncol=2))

[Image: barplot with wrong colors]

I have not been able to lock colors onto factor levels in viridis when the unused values are set to NA, nor to hide the legend entries for zero values when I leave them as zeros. I am an R beginner.

halfer
  • Welcome to SO! It would be easier to help you if you provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data. – stefan Jul 14 '22 at 12:21
  • Thanks, @stefan! I have edited the original post with an example dataset that produces the first bar of the three-bar example plots that I show. Thanks for the suggestion. – D.radiodurans Jul 14 '22 at 12:42

1 Answer

You can control which categories show up in the legend via the breaks argument of scale_fill_viridis: make a vector of all non-zero categories and pass it to breaks.

Here is a minimal reproducible example based on the ggplot2::mpg dataset.

To reproduce your issue, the first plot shows all categories in the legend, even those for which I set n = 0.

library(ggplot2)
library(viridis)
library(dplyr)

set.seed(123)

manu <- sample(unique(mpg$manufacturer), 5)
# make example data
mpg2 <- mpg |> 
  count(drv, manufacturer) |> 
  mutate(n = ifelse(manufacturer %in% manu, n, 0))

ggplot(mpg2 , aes(drv, n, fill = manufacturer)) +
  geom_col() +
  scale_fill_viridis(option="turbo",discrete=TRUE)

To drop the zero categories from the legend, I pass the vector of non-zero categories to the breaks argument. As you can see, this does not affect the colors assigned to the categories:

ggplot(mpg2 , aes(drv, n, fill = manufacturer)) +
  geom_col() +
  scale_fill_viridis(option="turbo",discrete=TRUE, breaks = manu)
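
One way to check that breaks leaves the color mapping alone is to compare the computed fill values of two otherwise identical plots. The following is only a sketch on a small made-up data frame (the names d, grp, p_all and p_some are illustrative and not from the question); it relies on ggplot2::layer_data() to extract the hex fill color of each bar.

```r
library(ggplot2)
library(viridis)

# Made-up data: level "b" only has a zero value
d <- data.frame(
  x   = c("p", "p", "p"),
  grp = factor(c("a", "b", "c")),
  n   = c(3, 0, 5)
)

base <- ggplot(d, aes(x, n, fill = grp)) +
  geom_col()

# Same scale twice, once with all legend entries and once with a subset
p_all  <- base + scale_fill_viridis(option = "turbo", discrete = TRUE)
p_some <- base + scale_fill_viridis(option = "turbo", discrete = TRUE,
                                    breaks = c("a", "c"))

# layer_data() returns the computed aesthetics, including one fill per bar
fills_all  <- layer_data(p_all)$fill
fills_some <- layer_data(p_some)$fill

# TRUE if restricting the breaks did not change any bar color
identical(fills_all, fills_some)
```

If the comparison holds, only the legend differs between the two plots, which is the behavior the breaks approach depends on.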

EDIT: For your "real" data you could do:

library(ggplot2)
library(viridis)
#> Loading required package: viridisLite

# Glycoforms with a non-zero percentage; only these get a legend entry
non_zero <- unique(t$Glycoform[t$Percentage > 0])

ggplot(t,aes(fill=Glycoform,y=Percentage,x=Plant,group=Glycoform))+
  geom_bar(stat="identity",color="white",size=1.2, width=0.8)+
  # breaks limits the legend entries; the colors still follow the factor levels
  scale_fill_viridis(option="turbo",discrete=TRUE, breaks = non_zero)+
  geom_text(data=subset(t,Percentage>0),position=position_stack(vjust=0.5),color="white",size=4,aes(label=Glycoform),fontface="bold")+
  guides(fill=guide_legend(ncol=2))
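
As an alternative that is not part of the approach above, the colors could be pinned to the level names explicitly with a named palette and scale_fill_manual, so that the mapping no longer depends on which levels a given data set contains. This is a hedged sketch: all_levels is a shortened, illustrative list of glycoforms, d is made-up data, and viridisLite::turbo() is assumed to be available (viridisLite 0.4.0 or later).

```r
library(ggplot2)
library(viridisLite)

# Every level that can occur, in the fixed order that defines the colors
# (shortened here for illustration)
all_levels <- c("GnGnXF", "MGnXF", "GnGnX", "Man9", "Rest")

# One turbo color per level, named by level
pal <- setNames(turbo(length(all_levels)), all_levels)

# Made-up data that contains only some of the levels
d <- data.frame(
  Plant      = "A",
  Glycoform  = factor(c("GnGnXF", "GnGnX", "Rest"), levels = all_levels),
  Percentage = c(85, 12, 3)
)

p <- ggplot(d, aes(Plant, Percentage, fill = Glycoform)) +
  geom_col() +
  # Named values are matched by level name; breaks keeps the legend
  # to the levels actually present in this data set
  scale_fill_manual(values = pal,
                    breaks = levels(droplevels(d$Glycoform)))
```

Because pal is built once from the full level list, reusing it for every plot should give each glycoform the same color even when a data set lacks some of the levels.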

stefan