I have some code that displays the abundance of phyla, and of the genera within each phylum, as a stacked bar graph. I edited the code so that all the NA elements appear at the top of each bar and the more abundant elements appear at the bottom. However, this threw off my color palette, which assigns colors by phylum and, within each phylum, by alphabetical order of genus. For example, the Bacteroidetes phylum is assigned blue, and each genus within it is assigned a shade of blue in alphabetical order.
I believe I can change the `levs` variable so that the elements are grouped by phylum and sorted alphabetically within each phylum, but I haven't figured out how to do that. At the same time, `levs` currently sorts the elements by abundance, which is something I want to keep.
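A minimal base-R sketch of one way to get that ordering, on hypothetical toy data (not the GlobalPatterns values): sum the abundance per Phylum/Genus pair, then order by phylum first and by abundance within each phylum. Since each bar is a single phylum, the stacking order inside a bar should stay the same as with the global abundance order, assuming each genus belongs to only one phylum.

```r
# Hypothetical toy data standing in for psmelt(gp.ch)
mdf <- data.frame(
  Phylum    = c("Bacteroidetes", "Bacteroidetes", "Firmicutes",
                "Firmicutes", "Firmicutes", "Bacteroidetes"),
  Genus     = c("Prevotella", "Bacteroides", "Clostridium",
                NA, "Ruminococcus", "Bacteroides"),
  Abundance = c(5, 20, 12, 7, 3, 10),
  stringsAsFactors = FALSE
)

# Total abundance per Phylum/Genus pair (the formula method drops NA genera)
totals <- aggregate(Abundance ~ Phylum + Genus, data = mdf, FUN = sum)

# Group by phylum, then order by ascending abundance within each phylum
totals <- totals[order(totals$Phylum, totals$Abundance), ]
levs_grouped <- as.character(totals$Genus)
print(levs_grouped)
# "Prevotella" "Bacteroides" "Ruminococcus" "Clostridium"
```

A vector built this way could stand in for `levs` in `factor(Genus, levels = c("NA", levs))`. Because `aggregate()` drops the NA genera, `"NA"` would still need to be prepended by hand.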
```r
# Make a colour palette: one hue per group, shaded across its subgroups
ColourPalleteMulti <- function(df, group, subgroup){
  # Find how many colour categories to create and the number of colours in each
  # (the formula method of aggregate() drops rows where subgroup is NA)
  categories <- aggregate(as.formula(paste(subgroup, group, sep = "~")), df, function(x) length(unique(x)))
  category.start <- scales::hue_pal(l = 100)(nrow(categories)) # Set the top (light end) of the colour palette
  category.end <- scales::hue_pal(l = 40)(nrow(categories))    # Set the bottom (dark end)
  # Build the colour palette: one light-to-dark ramp per category
  colours <- unlist(lapply(1:nrow(categories),
                           function(i){
                             colorRampPalette(colors = c(category.start[i], category.end[i]))(categories[i, 2])}))
  return(colours)
}
```
```r
library(tidyverse)
library("phyloseq"); packageVersion("phyloseq")
library(ggplot2)
library(scales)
library(RColorBrewer)
data("GlobalPatterns")

# Filter the phyloseq data to the 100 most abundant OTUs
TopNOTUs <- names(sort(taxa_sums(GlobalPatterns), TRUE)[1:100])
gp.ch <- prune_taxa(TopNOTUs, GlobalPatterns) # prune_species() is deprecated in favour of prune_taxa()

# Create a long-format data frame
mdf <- psmelt(gp.ch)
mdf$group <- paste(mdf$Phylum, mdf$Genus, sep = "-")

# Order genus levels by total abundance (ascending)
levs <- names(sort(tapply(mdf$Abundance, mdf$Genus, sum)))

# Build the colours: one ramp per phylum, in alphabetical phylum order
colours <- ColourPalleteMulti(mdf, "Phylum", "Genus")

# Make NA an explicit level and put it first so it stacks at the top of each bar
mdf %>%
  mutate(Genus = fct_explicit_na(Genus, "NA"),
         Genus = factor(Genus, levels = c("NA", levs))) %>%
  # Graph
  ggplot(aes(Phylum)) +
  geom_bar(aes(fill = Genus), colour = "grey", position = "stack") +
  scale_fill_manual("Genus", values = c("#FFFFFF", colours)) +
  ggtitle("Phylum and Genus Frequency") +
  ylab("Frequency") +
  theme(plot.title = element_text(hjust = 0.5))
```
Running this code produces a bar graph with colors in odd places. Ideally, each bar in the graph would be a single base color, with each stacked segment a different shade of that color. The color palette itself is created correctly, but the colors are assigned to the wrong genera because of the ordering issues described above. Any help is appreciated!
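One likely cause worth checking: when `values` in `scale_fill_manual()` is an unnamed vector, the colours are handed out in the order of the factor levels, so reordering `levs` reshuffles them. A minimal base-R sketch of a workaround, using hypothetical toy genera and placeholder colours (not the `hue_pal()` output from the question): build the palette in phylum-then-alphabetical order, name each colour after its genus, then look the colours up in the abundance-based level order.

```r
# Hypothetical toy genera and phyla (stand-ins for the GlobalPatterns taxa)
taxa <- data.frame(
  Phylum = c("Firmicutes", "Bacteroidetes", "Firmicutes", "Bacteroidetes"),
  Genus  = c("Ruminococcus", "Prevotella", "Clostridium", "Bacteroides"),
  stringsAsFactors = FALSE
)

# Sort by phylum, then alphabetically by genus (the order the palette is built in)
taxa <- taxa[order(taxa$Phylum, taxa$Genus), ]

# Placeholder light/dark end points per phylum (assumed values)
light <- c(Bacteroidetes = "#9ECAE1", Firmicutes = "#FDAE6B")
dark  <- c(Bacteroidetes = "#08519C", Firmicutes = "#A63603")

# One light-to-dark ramp per phylum, concatenated in phylum order
colours <- unlist(lapply(unique(taxa$Phylum), function(p) {
  n <- sum(taxa$Phylum == p)
  colorRampPalette(c(light[[p]], dark[[p]]))(n)
}))

# Tie each colour to its genus instead of to a position
names(colours) <- taxa$Genus

# Abundance-based level order (hypothetical), with "NA" first as in the question
fill_levels <- c("NA", "Prevotella", "Ruminococcus", "Clostridium", "Bacteroides")

# Look the colours up by name in level order; "NA" gets white
fill_values <- c("NA" = "#FFFFFF", colours)[fill_levels]
print(fill_values)
```

When `values` is a named vector, `scale_fill_manual()` matches colours to data values by name, so passing `c("NA" = "#FFFFFF", colours)` should work even without the reordering step. Naming by genus assumes each genus belongs to only one phylum; otherwise a `Phylum-Genus` key such as the `group` column would be needed for both the fill aesthetic and the names.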