I have some biological (microbiome) data with a number of OTUs, each labeled with a single name whose taxonomic resolution varies between the genus and phylum level. I am trying to get a table of the full lineage for each name, i.e. every higher-level rank above the name I have been given.
testnames <- c("Prevotella", "Bacteroides", "Enterobacteriaceae")
I've found taxize is a useful package for extracting the information I am looking for.
library("taxize")
reclass <- classification(testnames, db = 'ncbi')
This gets me a list of data frames, which can be entered into R like so:
structure(list(Prevotella = structure(list(name = c("cellular organisms",
"Bacteria", "FCB group", "Bacteroidetes/Chlorobi group", "Bacteroidetes",
"Bacteroidia", "Bacteroidales", "Prevotellaceae", "Prevotella"
), rank = c("no rank", "superkingdom", "no rank", "no rank",
"phylum", "class", "order", "family", "genus"), id = c("131567",
"2", "1783270", "68336", "976", "200643", "171549", "171552",
"838")), .Names = c("name", "rank", "id"), row.names = c(NA,
-9L), class = "data.frame"), Bacteroides = structure(list(name = c("cellular organisms",
"Bacteria", "FCB group", "Bacteroidetes/Chlorobi group", "Bacteroidetes",
"Bacteroidia", "Bacteroidales", "Bacteroidaceae", "Bacteroides"
), rank = c("no rank", "superkingdom", "no rank", "no rank",
"phylum", "class", "order", "family", "genus"), id = c("131567",
"2", "1783270", "68336", "976", "200643", "171549", "815", "816"
)), .Names = c("name", "rank", "id"), row.names = c(NA, -9L), class = "data.frame"),
Enterobacteriaceae = structure(list(name = c("cellular organisms",
"Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
"Enterobacteriaceae"), rank = c("no rank", "superkingdom",
"phylum", "class", "order", "family"), id = c("131567", "2",
"1224", "1236", "91347", "543")), .Names = c("name", "rank",
"id"), row.names = c(NA, -6L), class = "data.frame")), .Names = c("Prevotella",
"Bacteroides", "Enterobacteriaceae"))
I'd really like to turn this into a data frame that I can import into, say, phyloseq as a taxonomy table, e.g. something that looks like:
name                Phylum          Class                Order             Family              Genus
Prevotella          Bacteroidetes   Bacteroidia          Bacteroidales     Prevotellaceae      Prevotella
Bacteroides         Bacteroidetes   Bacteroidia          Bacteroidales     Bacteroidaceae      Bacteroides
Enterobacteriaceae  Proteobacteria  Gammaproteobacteria  Enterobacterales  Enterobacteriaceae
One way to do this, of course, would be to write a loop that goes through each element of the list, finds the row whose rank is, say, phylum, and puts the corresponding name into a new data frame. That said, I feel like there should be a faster way to apply such a transformation, using something like plyr or dplyr.
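For concreteness, the loop described above could be sketched roughly as follows in base R. This is only an illustration under some assumptions: the `reclass` list here is a trimmed-down stand-in for the real one (the `id` column and the "no rank" rows are left out), and the names `ranks` and `taxtab` are made up for the example.

```r
# Trimmed-down stand-in for the `reclass` list (assumption: only name and rank are needed)
reclass <- list(
  Prevotella = data.frame(
    name = c("Bacteria", "Bacteroidetes", "Bacteroidia",
             "Bacteroidales", "Prevotellaceae", "Prevotella"),
    rank = c("superkingdom", "phylum", "class", "order", "family", "genus"),
    stringsAsFactors = FALSE
  ),
  Enterobacteriaceae = data.frame(
    name = c("Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae"),
    rank = c("superkingdom", "phylum", "class", "order", "family"),
    stringsAsFactors = FALSE
  )
)

# Ranks wanted as columns of the final table (hypothetical choice)
ranks <- c("phylum", "class", "order", "family", "genus")

# Pre-allocate a character matrix: one row per queried name, one column per rank
taxtab <- matrix(NA_character_,
                 nrow = length(reclass), ncol = length(ranks),
                 dimnames = list(names(reclass), ranks))

for (nm in names(reclass)) {
  df <- reclass[[nm]]
  # match() gives NA for ranks that are absent, so those cells stay NA
  taxtab[nm, ] <- df$name[match(ranks, df$rank)]
}

print(taxtab)
```

Because `match()` returns `NA` for a rank that is not present, data frames of different lengths (such as the family-level Enterobacteriaceae entry) are handled without special cases; the missing genus simply ends up as `NA`.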
I've seen some things that seem close:
Converting nested list to dataframe
Turn a list of lists with unnamed entries into a data frame or a tibble
but they seem to assume that there is less unwanted data to discard and that the data frame in each element is the same size. Any suggestions?