Applying an R script to multiple files

Question

I have an R script that reads a certain type of file (nexus files of phylogenetic trees), whose name ends in *.trees.txt. It then applies a number of functions from an R package called bGMYC, available here and creates 3 pdf files. I would like to know what I should do to make the script loop through the files for each of 14 species.

The input files are in a separate folder for each species, but I can put them all in one folder if that facilitates the task. Ideally, I would like to output the pdf files to a folder for each species, different from the one containing the input file.

Here's the script

# Call Tree file
trees <- read.nexus("L_boscai_1411_test2.trees.txt")

# To use with different species, substitute "L_boscai_1411_test2.trees.txt" by the path to each species tree

#Store the number of tips of the tree
ntips <- length(trees$tip.label[[1]])

#Apply bgmyc.single
results.single <- bgmyc.singlephy(trees[[1]], mcmc=150000, burnin=40000, thinning=100, t1=2, t2=ntips, start=c(1,1,ntips/2))

#Create the 1st pdf
pdf('results_single_boscai.pdf')
plot(results.single)
dev.off()

#Sample 50 trees
n <- sample(1:length(trees), 50)
trees.sample <- trees[n]

#Apply  bgmyc.multiphylo  
results.multi <- bgmyc.multiphylo(trees.sample, mcmc=150000, burnin=40000, thinning=100, t1=2, t2=ntips, start=c(1,1,ntips/2))

#Create 2nd pdf
pdf('results_boscai.pdf')     # Substitute 'results_boscai.pdf' by "*speciesname.pdf"
plot(results.multi)
dev.off()

#Apply bgmyc.spec and spec.probmat 
results.spec <- bgmyc.spec(results.multi)
results.probmat <- spec.probmat(results.multi)

#Create 3rd pdf
pdf('trees_boscai.pdf')     # Substitute 'trees_boscai.pdf' by "trees_speciesname.pdf"
for (i in 1:50) plot(results.probmat, trees.sample[[i]])
dev.off()

I've read several posts with a similar question, but they almost always involve .csv files, refer to multiple files in a single folder, have a simpler script or do not need to output files to separate folders, so I couldn't find a solution to my specific problem.

Shsould I use a for loop or could I create a function out of this script and use lapply or another sort of apply? Could you provide me with sample code for your proposed solution or point me to a tutorial or another reference?

Thanks for your help.

This can be done using `list.dirs`/`list.files`, which are usually used in combination with `lapply`. For outputting to different directories you should know that you can inclute paths (absolute or relative) in file names and that those are of course character strings which you can construct from several character strings using `paste`. A **minimal** example would make it easier to write a full answer. Most of your code is irrelevant for your problem, — Roland, Aug 13 '14 at 14:45
You're right, I just found the guidelines on creating a minimal example. The reason I posted almost the entire script was because, in a [similar question](http://stackoverflow.com/questions/9982269/applying-r-script-prepared-for-single-file-to-multiple-files-in-the-directory), the OP posted quite a large script as well, so I thought it was ok. I'll make sure to not to do this mistake in the future. — RicardoC, Aug 13 '14 at 15:30

score 1 · Accepted Answer · edited May 23 '17 at 11:53

It really depends on the way you want to run it. If you are using linux / command line job submission, it might be best to look at

How can I read command line parameters from an R script?

If you are using GUI (Rstudio...) you might not be familiar with this, so I would solve the problem as a function or a loop.

First, get all your file names.

files = list.files(path = "your/folder")
# Now you have list of your file name as files. Just call each name one at a time
# and use for loop or apply (anything of your choice)

And since you would need to name pdf files, you can use your file name or index (e.g loop counter) and append to the desired file name. (e.g. paste("single_boscai", "i"))

In your case,

 files = list.files(path = "your/folder")
    # Use pattern = "" if you want to do string matching, and extract
    # only matching files from the source folder.

    genPDF = function(input) {
      # Read the file
      trees <- read.nexus(input)
      # Store the index (numeric)
      index = which(files == input)

      #Store the number of tips of the tree
      ntips <- length(trees$tip.label[[1]])

      #Apply bgmyc.single
      results.single <- bgmyc.singlephy(trees[[1]], mcmc=150000, burnin=40000, thinning=100, t1=2, t2=ntips, start=c(1,1,ntips/2))

      #Create the 1st pdf
      outname = paste('results_single_boscai', index, '.pdf', sep = "")
      pdf(outnam)
      plot(results.single)
      dev.off()

      #Sample 50 trees
      n <- sample(1:length(trees), 50)
      trees.sample <- trees[n]

      #Apply  bgmyc.multiphylo  
      results.multi <- bgmyc.multiphylo(trees.sample, mcmc=150000, burnin=40000, thinning=100, t1=2, t2=ntips, start=c(1,1,ntips/2))

      #Create 2nd pdf
      outname = paste('results_boscai', index, '.pdf', sep = "")
      pdf(outname)     # Substitute 'results_boscai.pdf' by "*speciesname.pdf"
      plot(results.multi)
      dev.off()

      #Apply bgmyc.spec and spec.probmat 
      results.spec <- bgmyc.spec(results.multi)
      results.probmat <- spec.probmat(results.multi)

      #Create 3rd pdf
      outname = paste('trees_boscai', index, '.pdf', sep = "")
      pdf(outname)     # Substitute 'trees_boscai.pdf' by "trees_speciesname.pdf"
      for (i in 1:50) plot(results.probmat, trees.sample[[i]])
      dev.off()
    }


    for (i in 1:length(files)) {
      genPDF(files[i])
    }

Thank you very much for your careful answer. I have just one question regarding the naming of pdf files. The "boscai" in the name refers to a particular species, so I would like to replace that by whatever species is being analyzed. Every tree file has a long name, but they all start with something like "Species.name_*". How can I extract this string, and tell the paste function to use it? — RicardoC, Aug 13 '14 at 15:44
It really depends on the full file name. You would need to do the string processing, and there are multiple ways to tackle one problem. For instance, if file name only contained one "_", I would split it by "_" and use remainder of the string without file extension (remove after "."). — won782, Aug 13 '14 at 15:47
I think something went wrong with the formatting of your comment, I can't understand your example. A full file name looks like this `C.lusitanica_haps_logrelax_Coconst_100mill.trees.txt`. The "C.lusitanica" is the species name. I would like the pdf's to look like `results_single_C.lusitanica.pdf`. So basically I want to extract the part before the underscore, and paste it to the pdf filename. What do you think would be the quickest/simplest way of doing this? — RicardoC, Aug 13 '14 at 15:59
`a = "C.lusitanica_haps_logrelax_Coconst_100mill.trees.txt"` `strsplit(a, "[_]")[[1]][1]` This returns your desired result. — won782, Aug 13 '14 at 16:06
That did the job. I now have my script up and running. Thanks! — RicardoC, Aug 13 '14 at 16:43

Applying an R script to multiple files

1 Answers1