I have a data set containing the populations of 6 different cell types in liver samples. I want to compare these populations against another set of properties: genetic features of the patient each sample came from. For example, there might be three variants of gene 1, and I need a boxplot with cell number on the y axis and cell type on the x axis, where each cell type has three boxes, one for each gene variant. I have a script that produces this, but if I want to check how the cell populations compare against the variants of gene 2, I have to rewrite the script to replace "gene1" with "gene2".
I would like the script to make graphs for all the genes automatically, without me having to rewrite it for each gene. I thought a good way to do this would be to make a list of genes and loop over it with a for loop. The loop would contain my earlier, working script and would produce one graph for every item in the list.
Here's what currently works, making them one at a time:
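# packages used below
library(readxl)   # read_excel
library(tidyr)    # pivot_longer
library(ggplot2)  # ggplot, geom_boxplot
library(ggpubr)   # stat_compare_means
# fetches all data from excel file
Alldata.Table = read_excel("E:/data/datafile.xlsx", sheet = "sheet1")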
#selects cell number data, normalizes cell numbers to a reference cell number
celldata.Table.Normalized <- as.data.frame(apply(Alldata.Table[, c(34:39)], 2, function(x) {x/Alldata.Table[,33]}))
colnames(celldata.Table.Normalized) <- colnames(Alldata.Table[, c(34:39)])
celldata.Table.Long <- pivot_longer(celldata.Table.Normalized, cols = colnames(celldata.Table.Normalized))
#adds anonymous patient number to cell data, there are 6 cell types so each patient number is repeated 6 times.
#This information doesn't appear on the graph but I used it to check the data was being moved around correctly
#and the cell number results match the patient samples as they do in the original table
celldata.Table.Long$patient <- rep(Alldata.Table$`PatientNo`, each = 6)
#adds gene variant information
celldata.Table.Long$Gene1 <- rep(Alldata.Table$`Gene1`, each = 6)
#makes boxplot
q <- ggplot(celldata.Table.Long, aes(x = name, y = value, fill = Gene1)) +
geom_boxplot()+
xlab("Gene 1 variant")+
ylab("Relative proportions")+
theme_bw()
q + stat_compare_means(aes(group = Gene1), label.y = .5, label="p.signif")
Here's how I tried to add gene variant information with the list and for loop:
genelist <- list("Gene1", "Gene2", "Gene3")
for (i in seq_along(genelist)) {
celldata.Table.Long$genelist[i] <- rep(Alldata.Table$genelist[i], each = 6)
q <- ggplot(celldata.Table.Long, aes(x = name, y = value, fill = genelist[i])) +
geom_boxplot()+
xlab("Gene variant")+
ylab("Relative proportions")+
theme_bw()
q + stat_compare_means(aes(group = Gene1), label.y = .5, label="p.signif")
}
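A note on the loop above: `$` treats whatever follows it as a literal column name, so an expression such as `tbl$genelist[i]` asks for a column called `genelist` rather than the column named by `genelist[i]`. Double square brackets evaluate their argument first, so `tbl[[gene]]` with `gene <- "Gene1"` returns the Gene1 column. The following is a minimal sketch of that idea, not a drop-in replacement. The toy data frame, the two cell-type columns `CellA` and `CellB`, the fixed helper column name `variant`, and the choice to write PNGs to `tempdir()` are all assumptions made so that the example runs on its own.

```r
library(ggplot2)
library(tidyr)

# Toy stand-in for the Excel data (hypothetical values): 6 patients, 2 cell types
set.seed(1)
Alldata.Table <- data.frame(
  PatientNo = 1:6,
  CellA = runif(6),
  CellB = runif(6),
  Gene1 = rep(c("AA", "AB", "BB"), times = 2),
  Gene2 = rep(c("CC", "CT", "TT"), each = 2)
)
n.celltypes <- 2  # would be 6 in the real data

# Long format: one row per patient and cell type (columns `name` and `value`)
celldata.Table.Long <- pivot_longer(Alldata.Table[, c("CellA", "CellB")],
                                    cols = c("CellA", "CellB"))

# A plain character vector of column names is enough to loop over
genelist <- c("Gene1", "Gene2")

for (gene in genelist) {
  # [[ ]] looks up the column whose name is stored in `gene`;
  # `$gene` would look for a column literally called "gene"
  celldata.Table.Long$variant <- rep(Alldata.Table[[gene]], each = n.celltypes)

  # The fill column always has the same name, so aes() needs no changes per gene
  q <- ggplot(celldata.Table.Long, aes(x = name, y = value, fill = variant)) +
    geom_boxplot() +
    labs(x = "Cell type", y = "Relative proportions", fill = gene) +
    theme_bw()

  # Plots are not auto-printed inside a for loop, so save (or print) explicitly
  ggsave(file.path(tempdir(), paste0(gene, ".png")), plot = q,
         width = 7, height = 5)
}
```

Overwriting a single helper column on each pass keeps the plotting code identical for every gene, and `labs(fill = gene)` still puts the gene name in the legend. `stat_compare_means(aes(group = variant), ...)` from ggpubr could be added to `q` in the same way as in the single-gene script; it is left out here only to keep the sketch's dependencies small.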
The first problem is that whereas "$Gene1" is recognised as the column for gene 1, "genelist[1]" is not recognised as that, even though genelist[1] corresponds to "Gene1". (I get the error "Unknown or uninitialised column: genelist".) I can't find a way to fix this and make it work. If it can be made to work, I will then try to make the script produce a png of each graph.