1

I am using R to calculate the mean values of a column in a file like so:

R
file1 = read.table("x01")
mean(file1$V4)

However, I have no experience writing loops in R, only in bash. How would I convert this into a loop that does this for every file in a folder and saves the output to one file, with the file name and mean value as the two columns of each row? E.g.:

x01(or file1 if that is simpler) 23.4
x02 25.4
x03 10.4

etc

(I don't mind whether the solution uses bash and R or only R.) Many thanks for your help!

Current error from one of the solutions using bash and R:

Error in `[.data.frame`(read.table("PercentWindowConservedRanked_Lowest_cleanfor1000genomes_1000regions_x013",  : 
  undefined columns selected
Calls: mean -> [ -> [.data.frame
Execution halted
user964689

3 Answers

4

This is similar to what @jmsigner has done, but with minor changes. For instance, writing to a file is done at the end. The code has not been tested.

files <- list.files()
out <- lapply(files, FUN = function(x) {
    # header = FALSE (the default) names the columns V1, V2, ... as in the question
    m <- mean(read.table(x)$V4)
    return(m)
  })
result <- do.call("cbind", out) # merge a list column-wise
# before writing, you can make column names pretty with colnames()
colnames(result) <- files
write.table(result, file = "means.txt")
Roman Luštrik
  • I'd be tempted to split the file reading and the calculation of means into two steps. If the asker wants to calculate other statistics, they need the datasets, which at the moment are being discarded. – Richie Cotton Jul 02 '12 at 09:39
  • @RichieCotton I wanted to give a general idea of how to use lapply. Of course, one could, or should, tweak the function according to his or her needs. – Roman Luštrik Jul 02 '12 at 10:11
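
The two-step idea from the comments above could be sketched as follows. This is only a rough illustration, assuming the files have no header row (so `read.table` names the fourth column `V4`, as in the question) and that every file in the folder is a data file. The helper names `read_all` and `column_means`, and the demo files, are made up for the example.

```r
# Hypothetical helper: read every file in `dir` into a named list of data frames.
read_all <- function(dir = ".") {
  files <- list.files(dir, full.names = TRUE)
  datasets <- lapply(files, read.table)   # assumes no header row
  names(datasets) <- basename(files)
  datasets
}

# Hypothetical helper: mean of one column per dataset, as a two-column data frame.
column_means <- function(datasets, column = "V4") {
  means <- vapply(datasets, function(d) mean(d[[column]]), numeric(1))
  data.frame(file = names(datasets), mean = unname(means))
}

# Demo on two small made-up files in a temporary folder.
demo_dir <- file.path(tempdir(), "means_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(1, 2, 3, c(10, 20)), file.path(demo_dir, "x01"),
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(1, 2, 3, c(5, 7)), file.path(demo_dir, "x02"),
            row.names = FALSE, col.names = FALSE)

datasets <- read_all(demo_dir)      # step 1: keep the data around
result <- column_means(datasets)    # step 2: compute whatever statistic is needed

# One row per file: file name, then mean.
write.table(result, file.path(tempdir(), "means.txt"),
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Because `datasets` is kept, other statistics can be computed later without re-reading the files, which seems to be the point of the suggestion.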
3

Assuming the columns are always named the same, you could do the following in R:

out.file <- 'means.txt'
for (i in list.files()) {
    tmp.file <- read.table(i, header=TRUE)  # Not sure if you have headers or not
    tmp.mean <- mean(tmp.file$V4)
    write(paste0(i, ",", tmp.mean), out.file, append=TRUE)
}

Or the same thing with more bash:

for i in *
do
  # cat() prints the bare number instead of R's "[1] ..." output
  mean=$(Rscript -e "cat(mean(read.table('$i', header=TRUE)[, 'V4']))")
  echo "$i,$mean" >> means.txt
done
johannes
  • Thanks, this makes sense to me, although it throws up Error: unexpected '}' in "}" – user964689 Jul 02 '12 at 09:38
  • I believe I missed a `)`; it should be fixed now. – johannes Jul 02 '12 at 09:40
  • Hmm, it still throws an error. The more bash-heavy script runs, but execution is halted with "undefined columns selected". I've put the full error in my question. – user964689 Jul 02 '12 at 09:46
  • You must specify your column. Since you did not provide an example dataset, I used `V4` from your sample code. Replace `[, 'V4']` either with the number of the desired column (without quotes) or the name of this column. – johannes Jul 02 '12 at 09:52
  • Currently, when specifying $V4, I get: Warning message: mean(<data.frame>) is deprecated. Use colMeans() or sapply(*, mean) instead. Thanks for the help so far, I'll play around with it. – user964689 Jul 02 '12 at 09:55
  • OK, if you can't get it to work, consider providing an example dataset. – johannes Jul 02 '12 at 09:58
  • The syntax around the `write(paste0` looks wrong; it should probably be `write(paste0(i, ",", tmp.mean), out.file, append=TRUE)`, i.e. one more comma and the parenthesis closed earlier. – MvG Jul 02 '12 at 10:40
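
One way the "undefined columns selected" error discussed in the comments can arise is sketched below: if a file does have a header row, the column names come from that row and `V4` no longer exists. This is only an illustration; the demo file and its column names (`chr`, `start`, `end`, `score`) are made up and may not match the asker's data.

```r
# Made-up example file with a header row; the column names are hypothetical.
demo_file <- tempfile()
writeLines(c("chr start end score",
             "1 100 200 23.4",
             "1 200 300 25.4"), demo_file)

with_header <- read.table(demo_file, header = TRUE)
names(with_header)   # names are taken from the first line of the file

# Selecting by position works regardless of how the columns are named.
mean_by_position <- mean(with_header[, 4])

# Selecting by name requires the name to exist, so check before indexing.
has_v4 <- "V4" %in% names(with_header)   # FALSE for this file
mean_by_name <- mean(with_header[, "score"])

# Indexing a missing name is an error; capture its message instead of stopping.
err <- tryCatch(with_header[, "V4"], error = function(e) conditionMessage(e))
```

Running `names()` on one of the real files is probably the quickest way to see which name or position to pass inside `[, ...]`.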
2

My solution is also similar to @jmsigner's, but you can specify the path to your files in the code itself and then calculate the mean like this:

filename <- system("ls /dir/", intern=TRUE)

for (i in seq_along(filename)) {

file <- read.table(file.path("/dir", filename[i]), header=TRUE) ## if you have headers in your files ##
file.mean <- mean(file$V4) ## named file.mean so the variable does not mask mean()

write.table(file.mean, file=paste("/dir", paste("mean", filename[i], sep="."), sep="/"))
## if you wish to write the means of all the files in separate files rather than one.
}

Hope this helps.

user1021713