I created my first function and I'm super proud of myself, but I'm trying to make it better. It goes through an abundance table, identifies the most abundant column in each row, and then looks up the name of that column in a second table (the taxonomy table). The objects were created with the phyloseq package.
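find.top.phyla <- function(x){
  require(phyloseq)
  otu <- otu_table(x)
  tax <- tax_table(x)
  # apply(..., 1, ...) below treats rows as samples, so flip the table if taxa are rows
  if (taxa_are_rows(otu)) otu <- t(otu)
  # Column index of the most abundant taxon in each sample (row)
  j <- apply(otu, 1, which.max)
  # Keep each winning taxon only once
  k <- j[!duplicated(j)]
  # Taxonomy and abundances of the winning taxa, in the same order
  l <- data.frame(tax[k,])
  m <- data.frame(otu[,k])
  # Name the abundance columns after their phylum
  colnames(m) <- l$Phylum
  # Most abundant of the winning phyla in each sample
  n <- colnames(m)[apply(m, 1, which.max)]
  m$TopPhyla <- n
  return(m)
}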
find.top.phyla(top.pdy.phyl)
This gives me
   Proteobacteria Actinobacteria Bacteroidetes       TopPhyla
S1             45             25            10 Proteobacteria
S2             14             35             5 Actinobacteria
S3             88             19           400  Bacteroidetes
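The core of the function is a per-row `which.max` followed by a lookup in the taxonomy table. As a minimal base-R sketch (no phyloseq), assuming samples are rows of the abundance matrix and the taxonomy matrix has one row per taxon in the same order as the abundance columns, the top phylum per sample can be looked up directly by index. The objects `otu` and `tax` here are hypothetical toy data (the numbers are borrowed from the output above), not the real `top.pdy.phyl`.

```r
# Hypothetical toy abundance matrix: samples are rows, taxa (OTUs) are columns
otu <- matrix(c(45, 14, 88,
                25, 35, 19,
                10,  5, 400),
              nrow = 3,
              dimnames = list(c("S1", "S2", "S3"), c("OTU1", "OTU2", "OTU3")))

# Hypothetical toy taxonomy matrix: one row per OTU, same order as the columns of otu
tax <- matrix(c("Bacteria", "Bacteria", "Bacteria",
                "Proteobacteria", "Actinobacteria", "Bacteroidetes"),
              ncol = 2,
              dimnames = list(c("OTU1", "OTU2", "OTU3"), c("Kingdom", "Phylum")))

# Column index of the most abundant OTU in each sample
top_idx <- apply(otu, 1, which.max)

# Use those indices to pick the matching rows of the taxonomy table
top_phylum <- tax[top_idx, "Phylum"]
names(top_phylum) <- rownames(otu)
print(top_phylum)
```

This returns only the names; building the wider table with the abundance columns, as the function does, is still needed if you want the counts alongside.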
To make it more useful, I would like to tell it exactly which taxonomic rank I want and get back the same kind of table as above: one column per top taxon at that rank, named after the taxon and holding its abundance, plus a final column identifying the most abundant taxon in each row.
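The table described above can be sketched in base R without phyloseq. This is a minimal sketch under some assumptions: `otu` is a plain matrix with samples as rows, `tax` is a plain character matrix with one row per taxon (in the same order as the columns of `otu`) and one column per rank, and the data have already been agglomerated to the requested rank (which is what `tax_glom()` does in phyloseq), so each taxon name appears only once. The function name `top_taxa_table` and the toy data are hypothetical.

```r
# Hypothetical helper: build the "top taxa" table for any rank given as a string.
# Assumes otu has samples as rows and taxa as columns, and tax has one row per
# taxon (same order as the columns of otu) with rank names as column names.
top_taxa_table <- function(otu, tax, rank) {
  if (!rank %in% colnames(tax)) {
    stop("rank '", rank, "' is not a column of the taxonomy table")
  }
  # Column index of the most abundant taxon in each sample
  j <- apply(otu, 1, which.max)
  # Keep each winning taxon once, in order of first appearance
  k <- unique(j)
  # Abundances of the winning taxa only
  m <- as.data.frame(otu[, k, drop = FALSE])
  # rank is a character string, so it is used as an index, not with $
  colnames(m) <- tax[k, rank]
  # Name of the top taxon per sample, in a column such as "TopClass"
  m[[paste0("Top", rank)]] <- colnames(m)[apply(m, 1, which.max)]
  m
}

# Hypothetical toy data, already agglomerated to Class level
otu <- matrix(c(45, 14, 88, 25, 35, 19, 10, 5, 400), nrow = 3,
              dimnames = list(c("S1", "S2", "S3"), c("t1", "t2", "t3")))
tax <- matrix(c("Proteobacteria", "Actinobacteria", "Bacteroidetes",
                "Gammaproteobacteria", "Actinobacteria", "Bacteroidia"),
              ncol = 2,
              dimnames = list(c("t1", "t2", "t3"), c("Phylum", "Class")))

print(top_taxa_table(otu, tax, "Class"))
```

Passing the rank as a string and using it as an index is what lets the same function handle "Phylum", "Class" or "Genus".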
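find.top.taxa <- function(x, taxa){
  require(phyloseq)
  # Agglomerate to the requested rank, e.g. taxa = "Class" or "Genus"
  top.taxa <- tax_glom(x, taxa)
  otu <- otu_table(top.taxa)
  tax <- tax_table(top.taxa)
  # apply(..., 1, ...) below treats rows as samples, so flip the table if taxa are rows
  if (taxa_are_rows(otu)) otu <- t(otu)
  j <- apply(otu, 1, which.max)
  k <- j[!duplicated(j)]
  l <- data.frame(tax[k,])
  m <- data.frame(otu[,k])
  # This is where the issue was occurring: `$` only takes a literal name, so
  # l$make.names(taxa) is read as a call to a nonexistent function l$make.names.
  # `[[` evaluates its argument, so it accepts the string stored in `taxa`.
  colnames(m) <- l[[taxa]]
  n <- colnames(m)[apply(m, 1, which.max)]
  # Same fix on the left-hand side; the new column is named e.g. "TopClass"
  m[[paste0("Top", taxa)]] <- n
  return(m)
}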
I've identified where the issues are coming up. I've tried "is.name", "as.name", "taxa" (which it really doesn't like), and a few other variations. Essentially, I would like to pass the rank as a character string through the "taxa" argument and use it to select the column with that same name in the taxonomy table, e.g. find.top.taxa(top.pdy, "Class") or find.top.taxa(top.pdy, "Genus").
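The root of the problem is that `$` never evaluates what comes after it, whereas `[[` does. In the same way, `l$make.names(taxa)` is read by R as a call to a function stored in `l$make.names`, which does not exist. A minimal base-R illustration, using a hypothetical toy data frame `tax_df` that stands in for the `l` data frame in the function:

```r
# Hypothetical toy taxonomy table with one column per rank
tax_df <- data.frame(Phylum = c("Proteobacteria", "Bacteroidetes"),
                     Class  = c("Gammaproteobacteria", "Bacteroidia"),
                     stringsAsFactors = FALSE)

taxa <- "Class"   # the rank arrives as a character string

# `$` does not evaluate its right-hand side: tax_df$taxa looks for a column
# literally called "taxa", which does not exist, so the result is NULL
print(is.null(tax_df$taxa))

# `[[` evaluates its argument, so the string stored in `taxa` is used
print(tax_df[[taxa]])

# The same works on the left-hand side of an assignment
tax_df[[paste0("Top", taxa)]] <- tax_df[[taxa]]
print(names(tax_df))
```

`as.name(taxa)` does not help here because `$` still would not evaluate the symbol; indexing with the string via `[[` is the usual approach.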