Reading multiple data files and passing them into a function to plot

Question

I have multiple files that I want to plot as volcano plots. All of the files are in one folder.

Objective: I would like to read them in as a list of files and then pass each data set into the function to get one plot per file.

The function which I would like to use is this

EnhancedVolcano(res1, lab = rownames(res1), x = "log2FoldChange", y = "padj",
                #selectLab = c("APOBEC3B","CHD7","AURKB","EYA1","UHRF1","SFMBT1"),
                xlim = c(-8, 8),
                xlab = bquote(~Log[2]~ "fold change"),
                ylab = bquote(~-Log[10]~adjusted~italic(P)),
                transcriptPointSize = 10,
                transcriptLabSize = 10,
                border = "full",
                pCutoff = 0.05,
                #legendPosition = "bottom",
                borderWidth = 1.5,
                legend = c('NS', 'Log2 FC', 'Adjusted p-value',
                           'Adjusted p-value & Log2 FC'),
                legendPosition = 'bottom',
                legendLabSize = 20,
                legendIconSize = 20,
                borderColour = "blue",
                #drawConnectors = FALSE,
                #widthConnectors = 0.01,
                colConnectors = 'grey30',
                gridlines.major = FALSE,
                gridlines.minor = FALSE)

This is the list of files that I intend to use

M0_vs_M1_TCGA_stages.txt  M0_vs_M4_TCGA_stages.txt  M1_vs_M3_TCGA_stages.txt  M2_vs_M3_TCGA_stages.txt  M3_vs_M4_TCGA_stages.txt
M0_vs_M2_TCGA_stages.txt  M0_vs_M5_TCGA_stages.txt  M1_vs_M4_TCGA_stages.txt  M2_vs_M4_TCGA_stages.txt  M3_vs_M5_TCGA_stages.txt
M0_vs_M3_TCGA_stages.txt  M1_vs_M2_TCGA_stages.txt  M1_vs_M5_TCGA_stages.txt  M2_vs_M5_TCGA_stages.txt  M4_vs_M5_TCGA_stages.txt

The general structure of each of my data frames is like this

a <- dput(head(M0_vs_M1_TCGA_stages))
structure(list(gene = c("ENSG00000000003", "ENSG00000000971", 
"ENSG00000002726", "ENSG00000003989", "ENSG00000005381", "ENSG00000006534"
), Symbol = c("TSPAN6", "CFH", "AOC1", "SLC7A2", "MPO", "ALDH3B1"
), baseMean = c(18.692748982067, 464.265236194545, 109.22179823167, 
85.504528879087, 225281.306485184, 3135.38237206618), log2FoldChange = c(1.72011856334064, 
-1.84102137729838, -1.90294968540377, -2.38723703218791, -4.71693379158602, 
-1.50626419101949), lfcSE = c(0.521825206121688, 0.528072294508922, 
0.539428712863011, 0.661673608593429, 0.523148071429431, 0.26205630469554
), stat = c(3.29635008650678, -3.48630556164743, -3.52771300456717, 
-3.60787705778782, -9.0164411362497, -5.74786472994606), pvalue = c(0.00097949874464195, 
0.00048974125782849, 0.00041916635977159, 0.00030871270363637, 
1.94298755192739e-19, 9.03774951656819e-09), padj = c(0.0133044251543343, 
0.00833058768185816, 0.00750903801425802, 0.00609902023132708, 
3.7330619835181e-15, 3.94641548776874e-06), UP_DOWN = c("UP", 
"DOWN", "DOWN", "DOWN", "Low", "DOWN")), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

So I would like to pass each file (data set) to the above function, print each one as an individual plot, and keep the name of the file in the plot.

I would really appreciate any suggestions or help.

My attempt so far

make_volcano <- function(df) {
  ggmaplot(df, main = expression("Group 1" %->% "Group 2"),
           fdr = 0.05, fc = 1, size = 0.4,
           palette = c("#B31B21", "#1465AC", "darkgray"),
           genenames = as.vector(df$Symbol),
           legend = "top", top = 0,
           font.label = c("bold", 11),
           font.legend = "bold",
           font.main = "bold",
           ggtheme = ggplot2::theme_minimal())
}

plots <- lapply(all_csv, make_volcano)

This does what I need; it was not so complicated. I still need to figure out how to save each plot under its respective file name.

Improved version of my answer

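bb <- all_csv

## build the list of plots, one per data set
plot_list <- vector("list", length(bb))
for (i in seq_along(bb)) {
  plot_list[[i]] <- make_volcano(bb[[i]])
}

## write all plots into one PDF, one page per plot
pdf("MAPLOT.pdf", height = 10, width = 15)
for (i in seq_along(bb)) {
  print(plot_list[[i]])
}
dev.off()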

The only thing I still need to add is each list element's name to its plot, so that the plots can be identified, although they are already plotted in order.

very related, if not duplicate https://stackoverflow.com/questions/9564489/read-all-files-in-a-folder-and-apply-a-function-to-each-data-frame — tjebo, Jun 07 '22 at 06:44
i saw that post but I was not sure about the passing the list of files to my above function and print them as different plots — PesKchan, Jun 07 '22 at 06:48
https://stackoverflow.com/questions/66038622/using-apply-function-on-a-list-to-get-plots https://stackoverflow.com/questions/67647284/how-to-use-list-elements-as-plot-title-in-r https://stackoverflow.com/questions/64632681/creating-a-list-of-plots-with-map https://stackoverflow.com/questions/62457314/creating-a-list-of-plots-using-a-for-loop — tjebo, Jun 07 '22 at 06:51
these are just four examples on a quick google. There is a lot of material on that question here. — tjebo, Jun 07 '22 at 06:52
thank you for the links I was not exactly able to make the query to get those you now.. — PesKchan, Jun 07 '22 at 06:52
Sometimes one is stuck. I've removed the downvote. You'll learn much more trying to come to the solution yourself. However, if those threads don't help, please give us a shout. — tjebo, Jun 07 '22 at 06:54
"threads don't help, please give us a shout" i would say most of my PhD analysis work i learnt from stack given that i a biologist, the main issue is implementation although I might have seen them but not sure how to use them in my case...thanks for the links+query — PesKchan, Jun 07 '22 at 06:57
@tjebo wow you are an `ophthalmologist` ..................and you have a package in cran — PesKchan, Jun 07 '22 at 06:58

tjebo · Accepted Answer · 2022-06-07T12:11:01.130

I am not on my computer and don't have R available, thus this answer is more general and should just give an idea of the principle.

You seem to have solved the problem to read in the list of files and already have the list of data sets. And you have your plotting function. Well done.
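For readers who are not yet at that point, here is a minimal sketch of one way to get a named list of data sets from a folder. It assumes tab-delimited text files with a header row, which the question does not confirm; `read_stage_files` and the demo folder are hypothetical names made up for illustration.

```r
# Assumption: tab-delimited files with a header row. Swap read.delim()
# for whichever reader matches your files.
read_stage_files <- function(dir) {
  paths <- list.files(dir, pattern = "_TCGA_stages\\.txt$", full.names = TRUE)
  # Name each element after its file, without the extension
  names(paths) <- tools::file_path_sans_ext(basename(paths))
  # lapply keeps the names, so the result is a named list of data frames
  lapply(paths, read.delim, stringsAsFactors = FALSE)
}

# Self-contained demonstration: write two toy files into a temporary folder
demo_dir <- file.path(tempdir(), "volcano_demo")
dir.create(demo_dir, showWarnings = FALSE)
toy <- data.frame(Symbol = c("TSPAN6", "CFH"),
                  log2FoldChange = c(1.72, -1.84),
                  padj = c(0.013, 0.008))
for (nm in c("M0_vs_M1_TCGA_stages", "M0_vs_M2_TCGA_stages")) {
  write.table(toy, file.path(demo_dir, paste0(nm, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

all_csv <- read_stage_files(demo_dir)
names(all_csv)
```

Naming the list at read-in time is what makes the later steps (titles, file names) straightforward, because `lapply` carries the names through to the list of plots.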

I personally prefer the "apply" family for looping, because the code is slightly shorter, I find it easier to read, and it comes with less (i.e., no) danger of "growing your vectors" (see also Burns' famous R Inferno, chapter 2).
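To make the "growing" point concrete, here is a small base R sketch with toy numbers rather than plots. It compares a loop that extends its result on every iteration, a loop that fills a preallocated vector, and the `vapply` equivalent.

```r
n <- 5

# Growing: the result is copied and extended on every iteration
grown <- c()
for (i in seq_len(n)) grown <- c(grown, i^2)

# Preallocated: the result has its final length from the start
prealloc <- numeric(n)
for (i in seq_len(n)) prealloc[i] <- i^2

# apply family: no manual bookkeeping of the result at all
applied <- vapply(seq_len(n), function(i) i^2, numeric(1))
```

All three give the same values; the difference is only in how the result is built up, which starts to matter as the number of iterations grows.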

In your case, you could therefore simply write

## lapply returns a list
lapply(all_csv, make_volcano)

This will create the list of plots. You now have several options to save them. You could combine them into one figure, easiest with the patchwork package:

plots <- lapply(all_csv, make_volcano)
patchwork::wrap_plots(plots)

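If you want to create separate files, your approach is perfectly fine. Another option might be to use ggsave, here again with lapply. Note that ggsave needs a file name for every plot, so the loop has to supply one. Additional arguments given to lapply itself are passed on to the function.

## the "plot_1.pdf", "plot_2.pdf", ... file names are just placeholders
lapply(seq_along(plots), function(i, ...) {
  ggsave(filename = paste0("plot_", i, ".pdf"), plot = plots[[i]], ...)
}, width = 15, device = "pdf")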

Naming is a bit trickier and certainly depends largely on the structure of your data set list. Is it a named list? What do you get when calling names(all_csv)?

You can use the names for the titles, as shown in this thread. This is also not the only thread on that topic; it is actually a fairly common problem here on Stack Overflow. The general idea is to loop over both the list and its names and assign the respective name to each plot. This can be achieved via indexing or with parallel looping functions such as mapply or purrr::map2. I generally like looping over indexes for those cases. You could for example do:

lapply(seq_along(all_csv), function(i) {
  ## I am here assuming that ggmaplot returns a ggplot object to which you can add a
  ## ggtitle layer - not sure if this really works. But hopefully you get the idea
  make_volcano(all_csv[[i]]) +
    ggtitle(names(all_csv)[i])
})
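The parallel-looping alternative mentioned above can be sketched with base R's `Map`, which is a thin wrapper around `mapply`. The data sets and the simplified `make_volcano` below are toy stand-ins so that the example runs on its own; they are assumptions for illustration, not the ggmaplot-based function from the question.

```r
library(ggplot2)

# Toy stand-ins (assumptions): two tiny data sets in a named list
all_csv <- list(
  M0_vs_M1_TCGA_stages = data.frame(log2FoldChange = c(1.7, -1.8),
                                    padj = c(0.013, 0.008)),
  M0_vs_M2_TCGA_stages = data.frame(log2FoldChange = c(-2.4, 0.3),
                                    padj = c(0.006, 0.4))
)

# Simplified plotting function returning a ggplot object
make_volcano <- function(df) {
  ggplot(df, aes(x = log2FoldChange, y = -log10(padj))) +
    geom_point()
}

# Map() walks over the data sets and their names in parallel
titled_plots <- Map(
  function(df, nm) make_volcano(df) + ggtitle(nm),
  all_csv, names(all_csv)
)
names(titled_plots)
```

Because the first argument to `Map` is a named list, the result keeps those names, so the later saving step can still rely on `names(titled_plots)`.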

The same idea of looping over indexes of your names should work with ggsave, and you will get filenames that are like the read-in data files.

lapply(seq_along(plots), function(i) {
  ## paste0 avoids the space that paste would put before ".pdf"
  ggsave(plot = plots[[i]],
         filename = paste0(names(plots)[i], ".pdf"),
         width = 15)
})

thank you for the elaborate answer now I can automate lots of stuff for plotting which i have lots figures to make in one go. — PesKchan, Jun 07 '22 at 12:53
This i get when I try `names(all_csv)` ` names(all_csv) [1] "M0_vs_M1_TCGA_stages" "M0_vs_M2_TCGA_stages" "M0_vs_M3_TCGA_stages" "M0_vs_M4_TCGA_stages" "M0_vs_M5_TCGA_stages" "M1_vs_M2_TCGA_stages" [7] "M1_vs_M3_TCGA_stages" "M1_vs_M4_TCGA_stages" "M1_vs_M5_TCGA_stages" "M2_vs_M3_TCGA_stages" "M2_vs_M4_TCGA_stages" "M2_vs_M5_TCGA_stages" [13] "M3_vs_M4_TCGA_stages" "M3_vs_M5_TCGA_stages" "M4_vs_M5_TCGA_stages"` — PesKchan, Jun 07 '22 at 12:57
`lapply(1: length(plots), function(i){ ggsave(plot = plots[[i]], filename = paste(names(plots)[i], ".pdf"), width = 15) })` this one is better or I would say best since its saves separate file which with their name into it..now I dont have to worry about putting title on each plot which I thought of doing when I was printing all of them into a single pdf. — PesKchan, Jun 07 '22 at 13:06
