I have written an R script to be used as part of a shell-script-based pipeline, which will feed it dozens of files containing genetic sequence data, one after the other (using args[]).

I am having trouble finding a way to write the results of each run of this script to a single results file. I thought the easiest way might be to create an empty results.csv table and have the script write to the next row of this file each time it is run (which avoids the problem of the script overwriting the file on each run). In this vein, a friend helped me out with the following code:

x<-readLines("results.csv")

if(x[[1]]==""){x[[1]]<-paste("meancoscore", "meanboot", "CIres", "RIres", "RC",  "nodecount", sep= ",")}

x[[length(x)+1]]<-paste(meancoscore, meanboot, CIres, RIres, RC, nodecount, sep = ",")
x<-data.frame(x)
write.table(x,"results.csv", row.names = F, col.names = F, sep = ",")

In the above code "meancoscore", "meanboot", "CIres", "RIres", "RC", and "nodecount" are first used as a header if the data frame has nothing on the first row.

Following this, the results (objects: meancoscore, meanboot, CIres, RIres, RC and nodecount) are written in the columns corresponding to their headers. The idea here is that if you run the R script again with different source files, it should simply write the results to the next line in the results.csv file.

However, the following is seen in the results.csv file after three runs of this code with different input files:

"\""\\""meancoscore,meanboot,CIres,RIres,RC,nodecount\\""\""
""\""\\""0.000,76.3247863247863,0.721002252252252,0.983235214508053,0.708914804154032,117\\""\""
""\""0.845,77.6923076923077,0.723259762308998,0.983410513459875,0.711261254217159,117\""
""0.85,77.4358974358974,0.728886344116805,0.983878381369061,0.717135516451654,117"

Where my desired result would be the following:

meancoscore,meanboot,CIres,RIres,RC,nodecount
0.000,76.3247863247863,0.721002252252252,0.983235214508053,0.708914804154032,117
0.845,77.6923076923077,0.723259762308998,0.983410513459875,0.711261254217159,117
0.85,77.4358974358974,0.728886344116805,0.983878381369061,0.717135516451654,117

It is worth noting that each successive run seems to add more backslashes and more quotation marks to the results.csv file.

Ideally I would like to be able to simply read in the results.csv file when it is done and analyse the data by accessing the columns with results$meanboot, or summary(results$meanboot) for example.

Could anyone offer some advice on how to modify the above code or offer an alternative solution?

I should add here that I purposefully did not go for the option of writing into the R script a loop that will run through the input files of interest and simply assemble a full table of results as an object (I am aware that this would be very simple to write out). This was because the work being done by this script will be farmed out to multiple machines in a cluster.

Thank you for your time and any help you might be able to offer.

user2439887

1 Answer

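The problem was solved by adding quote = FALSE to the write.table() call, as per voidHead's suspicion. By default, write.table() wraps character values in double quotes and escapes any quotes already inside them, so every read-and-rewrite cycle added another layer of quotes and backslashes to the existing lines.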
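For anyone following along, here is a minimal sketch of the corrected approach. The helper name `append_result`, the temporary file path, and the numeric values are all made up for illustration; only `quote = FALSE` is the actual fix. It also guards against a completely empty file, where `x[[1]]` would otherwise fail.

```r
# Demo path (an assumption); the real pipeline would point at its own results.csv.
results_file <- file.path(tempdir(), "results_demo.csv")
if (file.exists(results_file)) invisible(file.remove(results_file))
invisible(file.create(results_file))  # start from an empty file

# Hypothetical helper: add one comma-separated line to the file,
# writing the header first if the file is empty.
append_result <- function(path, values) {
  x <- readLines(path)
  if (length(x) == 0 || x[[1]] == "") {
    x[1] <- "meancoscore,meanboot,CIres,RIres,RC,nodecount"
  }
  x <- c(x, paste(values, collapse = ","))
  # quote = FALSE stops write.table() from wrapping each line in quotes
  write.table(data.frame(x), path,
              row.names = FALSE, col.names = FALSE, quote = FALSE)
}

# Three "runs" with illustrative (not real) values.
append_result(results_file, c(0.000, 76.32, 0.721, 0.983, 0.709, 117))
append_result(results_file, c(0.845, 77.69, 0.723, 0.983, 0.711, 117))
append_result(results_file, c(0.850, 77.44, 0.729, 0.984, 0.717, 117))

# The file now reads back as an ordinary table.
results <- read.csv(results_file)
summary(results$meanboot)
```

One caveat: reading the whole file and rewriting it is probably not safe if several cluster jobs write to it at the same time. Having each job write its own small file and combining them afterwards may be more robust.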

user2439887