
I'm creating a stacked bar plot of relative abundance data, but I only want to display a few selected taxa of interest in the legend. I tried scale_fill_manual(values = sample(col_vector), breaks = legend_list). The legend then shows only the entries in my legend_list, as I wanted, but all the other factor levels are drawn without a color. How can I keep the colors for every level in the stacked bars, but show legend entries only for the levels in legend_list?

My code:

ggplot(df, aes_string(x = x, y = y, fill = fill)) +
        geom_bar(stat="identity", position="stack") +
        scale_fill_manual(values = sample(col_vector), 
                          breaks = legend_list) +
        theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))


stefan
Max Qiu
  • What colors do you want the other taxa to have? Your interesting taxa are going to be very difficult to spot amongst all the other colors. – Allan Cameron Jun 17 '22 at 15:11
  • So I created a `col_vector` that is longer than the number of taxa I have, so that I can sample from it to get a color for each taxon. Yes, they will be hard to spot, but I do want every other taxon to have its own color, with only the selected taxa in my `legend_list` shown in the legend. Here is a link to this [image without legend](https://drive.google.com/file/d/1apwUQ06-foDB3mhmTaiBM2mgF_NRavtL/view?usp=sharing) – Max Qiu Jun 17 '22 at 15:20
  • From a data visualisation perspective, it is honestly completely pointless to have any kind of color legend on that plot. There are just too many colors to make it interpretable. A plot should exist to demonstrate a feature of your data. At the moment, all it shows is the complexity of your data. This is occasionally a useful point to make, but if you want a few taxa to be highlighted in relation to the others, then drowning them in a sea of colors isn't going to achieve that, whatever your legend shows. – Allan Cameron Jun 17 '22 at 15:33
  • I understand your point, but that's not what I am asking. I am just curious, from a technical standpoint, what I am missing here. Why doesn't it show all the other colors? – Max Qiu Jun 17 '22 at 15:48

1 Answer


The most likely cause is that you are passing an unnamed vector of colors. Under the hood, when you pass a vector to the breaks argument, that vector is used as the names for the color vector passed to the values argument. When there are fewer breaks than colors, only the first few colors get a name, while the names of all the other colors are set to NA. As a consequence, only the categories listed in breaks are assigned a fill color, and every other category is assigned the na.value, which defaults to "grey50".
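The naming and lookup described above can be imitated in a few lines of base R. This is only a simplified sketch of the behavior, not ggplot2's actual source; the colors and the `categories` vector are made up for illustration.

```r
# Simplified imitation of the behaviour described above (not ggplot2's code).
values <- c("red", "green", "blue", "orange")   # unnamed colours (illustrative)
breaks <- c("audi", "volkswagen")               # fewer breaks than colours

# The breaks become the names; positions without a break get NA as their name.
names(values) <- breaks[seq_along(values)]

# Colours are looked up by name, so categories without a matching name get NA,
# which the scale then replaces with na.value ("grey50" by default).
categories <- c("audi", "ford", "volkswagen", "toyota")  # illustrative levels
fills <- unname(values[match(categories, names(values))])
fills[is.na(fills)] <- "grey50"
fills
```

Only the two categories listed in `breaks` keep a real color here; "blue" and "orange" are never used because no category name points to them.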

As you provided no minimal reproducible example, I use a basic example based on the ggplot2::mpg dataset, first to reproduce your issue and then to offer a fix:

library(ggplot2)
library(dplyr)

base <- ggplot(mpg, aes(class, fill = manufacturer)) +
  geom_bar() 

legend_list <- c("audi", "volkswagen")

col_vector <- scales::hue_pal()(n_distinct(mpg$manufacturer))

base + scale_fill_manual(values = sample(col_vector), breaks = legend_list)

One option to fix the issue is to use a named vector of colors. That way every category is assigned a color, but only the categories specified via the breaks argument show up in the legend:

names(col_vector) <- sample(unique(mpg$manufacturer))

base + scale_fill_manual(values = col_vector, breaks = legend_list)
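If the color vector is longer than the number of categories, as in the question, one way to build the named vector is to sample exactly as many colors as there are levels and name them in one step. This is a minimal sketch: the `taxa` levels and the palette below are hypothetical stand-ins for the asker's data.

```r
set.seed(1)  # only to make the sketch reproducible

# Hypothetical stand-ins for the real factor levels and the longer palette.
taxa <- c("Bacteroides", "Prevotella", "Roseburia")
col_vector <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E")

# Draw one distinct colour per taxon and attach the taxon names.
named_cols <- setNames(sample(col_vector, length(taxa)), taxa)
named_cols

# Assumed usage with the plot from the question:
# ... + scale_fill_manual(values = named_cols, breaks = legend_list)
```

Because every level now has a name in `values`, no category falls back to the na.value, and `breaks` only controls which entries appear in the legend.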

stefan
  • Thank you so much!! I named the `col_vector` like you suggested, and it worked just as I imagined! And thanks for providing information about reproducible example! Will follow that next time. – Max Qiu Jun 17 '22 at 19:44