I have a data frame in long format that holds the relative abundances of different phyla, grouped by age group of birds. There are 44 phyla, and I want to draw a stacked bar plot of relative abundance per age group while showing legend entries only for the 5 or 10 most abundant taxa.

I've already gone through Remove legend entries for some factors levels and How do I display only selected items in a ggplot2 legend?. The second link describes exactly what I want to do, but its solution of passing the breaks argument to scale_fill_manual() didn't work for me. It gives the following error:

Error: Insufficient values in manual scale. 44 needed but only 0 provided.

The ggplot code I used was as follows:

ggplot(df2, aes(x = variable, y = value, fill = taxa )) + 
  geom_bar(stat = "identity") +
  xlab("\nAge and Nest") +
  ylab("Relative Abund\n") +
  scale_x_discrete(limits=c('Nest','3', '6', '9', '12')) +
  scale_fill_manual(breaks=c("k__Bacteria;p__Proteobacteria",  "k__Bacteria;p__Firmicutes", "k__Bacteria;p__Actinobacteria", "k__Bacteria;p__Bacteroidetes" ,
                             "k__Bacteria;p__Tenericutes", "k__Bacteria;p__Acidobacteria", "k__Bacteria;p__Cyanobacteria", "k__Bacteria;p__Verrucomicrobia",
                             "k__Bacteria;p__Planctomycetes", "k__Bacteria;p__Chlamydia"))+
  theme_bw()

Toy data in the same format as the actual data:

taxa                           variable         value
k__Bacteria;p__Firmicutes           6             0.36
k__Bacteria;p__Acidobacteria        6             0.0025
k__Bacteria;p__Cyanobacteria        6             0.01
k__Bacteria;p__Planctomycetes       6             0.004
...                                 ...           ...
k__Bacteria;p__Acidobacteria        9             0.1025
k__Bacteria;p__Firmicutes           9             0.086
k__Bacteria;p__Planctomycetes       9             0.054
k__Bacteria;p__Cyanobacteria        9             0.017

EDIT: A reproducible example dataset:

df <- data.frame(
  taxa     = c("A", "B", "C", "D", "D", "C", "A", "B", "A", "C", "D", "B"),
  variable = c(rep(3, 4), rep(6, 4), rep(9, 4)),
  values   = c(0.02, 0.08, 0.75, 0.15, 0.08, 0.75, 0.15, 0.02, 0.02, 0.02, 0.06, 0.90)
)
pramesh shakya

1 Answer

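A manual scale needs one value for every level, which is why you get the "Insufficient values" error when only breaks is given. Provide all the values with the values argument, and then choose which ones appear in the legend with breaks. You don't give reproducible data, so I'll use a reproducible example that you should be able to apply to your data: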

ggplot(mtcars, aes(x = mpg, y = wt, colour = as.factor(cyl))) +
    geom_col() +
    scale_colour_manual(values = unique(mtcars$cyl), breaks = c("4","6"))

cyl takes the values 4, 6, or 8; with breaks, only 4 and 6 are shown in the legend.

Your values argument might be something like values = unique(df$taxa).
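
Applied to the small example data frame from the question, the same idea could look like the sketch below. It is only an illustration under a few assumptions: the names top_taxa, all_taxa, and pal are made up here, "most abundant" is taken to mean the largest summed abundance across all age groups, and scales::hue_pal() is used to rebuild ggplot2's default discrete colours so that each taxon gets an explicit, valid colour.

```r
library(ggplot2)

# Example data from the question (the abundance column is called "values" here)
df <- data.frame(
  taxa     = c("A", "B", "C", "D", "D", "C", "A", "B", "A", "C", "D", "B"),
  variable = c(rep(3, 4), rep(6, 4), rep(9, 4)),
  values   = c(0.02, 0.08, 0.75, 0.15, 0.08, 0.75, 0.15, 0.02, 0.02, 0.02, 0.06, 0.90)
)

# Rank taxa by total abundance and keep the top 2 (use 5 or 10 for real data)
totals   <- tapply(df$values, df$taxa, sum)
top_taxa <- names(sort(totals, decreasing = TRUE))[1:2]  # here: "C" and "B"

# One named colour per taxon, so the manual scale has a value for every level
all_taxa <- sort(unique(df$taxa))
pal      <- setNames(scales::hue_pal()(length(all_taxa)), all_taxa)

# values covers every taxon; breaks limits the legend to the top taxa
p <- ggplot(df, aes(x = factor(variable), y = values, fill = taxa)) +
  geom_col() +
  scale_fill_manual(values = pal, breaks = top_taxa)

print(p)
```

Because pal is a named vector, each colour is matched to its taxon by name, so the colours stay the same even if the order of the factor levels changes.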

heds1
  • Thank you for the reply. I found the reason why it was not working: the "taxa" column was not a factor in my data frame. Appreciate it. – pramesh shakya Oct 08 '19 at 15:03
  • I changed your solution so that the stacked bars are filled with different colors, and for some reason the color scheme is now completely different from what it was before. Any insight on that? The code I used was the same as yours except for `fill=as.factor(taxa)`. – pramesh shakya Oct 11 '19 at 20:11