
I have a data set containing the populations of 6 different cell types within liver samples. I want to compare it against another set of properties: genetic features of the patient each sample came from. For example, gene 1 has three different variants, and I need a boxplot with cell number on the y axis and cell type on the x axis, where each cell type has three boxes, one for each gene variant. I have a script that produces this, but to check how the cell populations compare against the gene 2 variants, I have to rewrite the script, replacing "Gene1" with "Gene2".

I would like the script to make graphs for all the genes automatically without me having to rewrite it for each gene. I thought a good way to do this is to make a list of genes, and then a for loop. The for loop would contain my earlier script that works, and for every item in the list it would produce a graph.

Here's what currently works, making them one at a time:

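# packages used below: readxl (read_excel), tidyr (pivot_longer),
# ggplot2 (ggplot), ggpubr (stat_compare_means)
library(readxl)
library(tidyr)
library(ggplot2)
library(ggpubr)

# fetches all data from excel file
Alldata.Table <- read_excel("E:/data/datafile.xlsx", sheet = "sheet1")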

#selects cell number data, normalizes cell numbers to a reference cell number
celldata.Table.Normalized <- as.data.frame(apply(Alldata.Table[, c(34:39)], 2, function(x) {x/Alldata.Table[,33]}))
colnames(celldata.Table.Normalized) <- colnames(Alldata.Table[, c(34:39)])

# reshapes to long format: one row per patient and cell type (columns `name` and `value`)
celldata.Table.Long <- pivot_longer(celldata.Table.Normalized, cols = colnames(celldata.Table.Normalized))


#adds anonymous patient number to cell data, there are 6 cell types so each patient number is repeated 6 times. 
#This information doesn't appear on the graph but I used it to check the data was being moved around correctly 
#and the cell number results match the patient samples as they do in the original table

celldata.Table.Long$patient <- rep(Alldata.Table$`PatientNo`, each = 6)

#adds gene variant information
celldata.Table.Long$Gene1 <- rep(Alldata.Table$`Gene1`, each = 6)


#makes boxplot
q <- ggplot(celldata.Table.Long, aes(x = name, y = value, fill = Gene1)) +
  geom_boxplot()+ 
  xlab("Gene 1 variant")+
  ylab("Relative proportions")+
  theme_bw()

 q  + stat_compare_means(aes(group = Gene1), label.y = .5, label="p.signif")

Here's how I tried to add gene variant information with the list and for loop:

genelist <- list("Gene1", "Gene2", "Gene3")

for (i in seq_along(genelist)) { 
  celldata.Table.Long$genelist[i] <- rep(alldata.Table$genelist[i], each = 6)
  
q <- ggplot(celldata.Table.Long, aes(x = name, y = value, fill = genelist[i])) +
  geom_boxplot()+ 
  xlab("Gene variant")+
  ylab("Relative proportions")+
  theme_bw()

 q  + stat_compare_means(aes(group = Gene1), label.y = .5, label="p.signif")
}

The first problem is that, whereas "$Gene1" is recognised as the column for gene 1, "$genelist[i]" is not, even though genelist[1] corresponds to "Gene1". (I get the error "Unknown or uninitialised column: genelist".) I can't find a way to fix this. If it can be made to work, I will try to make the script produce a PNG of each graph.
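One way the loop could be made to work, sketched on a small made-up data set rather than the real spreadsheet: index columns with `[[ ]]` instead of `$`, and use `.data[[gene]]` inside `aes()` so that ggplot2 looks the column up by the string stored in the loop variable. The tables `alldata` and `long`, the cell-type names, the variant labels and the output file names are all assumptions for illustration, and `stat_compare_means()` is left out to keep the sketch short.

```r
library(ggplot2)

# Made-up stand-in for the spreadsheet: 12 patients, 3 gene columns (assumption)
set.seed(1)
alldata <- data.frame(
  PatientNo = 1:12,
  Gene1 = rep(c("AA", "AB", "BB"), times = 4),
  Gene2 = rep(c("CC", "CT", "TT"), each = 4),
  Gene3 = rep(c("GG", "GA", "AA"), times = 4)
)

# Long-format cell data: 6 cell types per patient, in patient-major order,
# mirroring the layout that pivot_longer() produces in the question
cell_types <- paste0("Cell", 1:6)
long <- data.frame(
  patient = rep(alldata$PatientNo, each = 6),
  name    = rep(cell_types, times = nrow(alldata)),
  value   = runif(6 * nrow(alldata))
)

genes <- c("Gene1", "Gene2", "Gene3")  # a character vector rather than a list
plots <- list()

for (gene in genes) {
  # [[ ]] looks the column up by the string stored in `gene`
  long[[gene]] <- rep(alldata[[gene]], each = 6)

  # .data[[gene]] does the same lookup inside aes()
  p <- ggplot(long, aes(x = name, y = value, fill = .data[[gene]])) +
    geom_boxplot() +
    labs(x = "Cell type", y = "Relative proportions", fill = gene) +
    theme_bw()

  plots[[gene]] <- p

  # One PNG per gene, written to a temporary directory (hypothetical file names)
  ggsave(file.path(tempdir(), paste0(gene, "_boxplot.png")),
         plot = p, width = 8, height = 5)
}
```

Two details are worth keeping in mind when adapting this to the original script. Inside a `for` loop a plot is not printed automatically, so a line such as `q + stat_compare_means(...)` would need to be wrapped in `print()` or saved with `ggsave()`. The hard-coded `aes(group = Gene1)` in `stat_compare_means()` would presumably also need to become `aes(group = .data[[gene]])` so that the comparison follows the gene being plotted.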

  • Can you post sample data? Please edit the question with the output of `dput(head(celldata.Table.Long, 30))`. Also, are you looking for [grouped boxplot r ggplot2](https://stackoverflow.com/a/60921716/8245406)? – Rui Barradas Aug 16 '20 at 14:27
  • Take a look at https://www.reed.edu/data-at-reed/resources/R/loops_with_ggplot2.html – Seyma Kalay Aug 16 '20 at 14:49
  • Instead of `celldata.Table.Long$genelist[i]`, you will need to use `celldata.Table.Long[, genelist[i]]`. In general, using `$` to select a column works interactively, or when your column name never changes. For looping over a vector of column names, use `[, ]` or `[[]]`. For example: `col_name <- "mpg"; mtcars$col_name; mtcars[, col_name]; mtcars[[col_name]]`. The first attempt `mtcars$col_name` fails. The other two work as expected. – bdemarest Aug 16 '20 at 16:56
  • @StuckwithR: these might help https://stackoverflow.com/a/50522928/ & https://stackoverflow.com/a/50930640/ – Tung Aug 17 '20 at 00:40
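
The difference between `$` and `[[ ]]` described in the comments can be checked on the built-in `mtcars` data set. This is a minimal base-R sketch; the variable names `by_dollar`, `by_double_bracket` and `by_single_bracket` are made up for illustration, and `genelist` here holds `mtcars` column names rather than the gene columns from the question.

```r
col_name <- "mpg"

# `$` takes the name literally: it looks for a column called "col_name"
by_dollar <- mtcars$col_name

# `[[ ]]` and `[, ]` evaluate the variable first and look up "mpg"
by_double_bracket <- mtcars[[col_name]]
by_single_bracket <- mtcars[, col_name]

is.null(by_dollar)                        # TRUE: no such column
identical(by_double_bracket, mtcars$mpg)  # TRUE
identical(by_single_bracket, mtcars$mpg)  # TRUE

# With a list of names, [[i]] extracts the string itself,
# while [i] returns a length-one list that still wraps it
genelist <- list("mpg", "cyl")
is.character(genelist[[1]])  # TRUE
is.list(genelist[1])         # TRUE
```

On a tibble, which is what `read_excel()` returns, the behaviour differs slightly: `$` with a missing name gives the "Unknown or uninitialised column" warning instead of silently returning `NULL`, and `[, col_name]` returns a one-column tibble rather than a vector, so `[[col_name]]` is the safer choice when a plain vector is needed.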
