
I'm very new to R and just wrote this to obtain the mean for a number of timeseries in one file:

compiled <- read.table("/Users/Desktop/A/1.txt", header=TRUE)
z <- ncol(compiled)
comp_df <- data.frame(compiled[,2:z])
indmean <- rowMeans(comp_df)

The data in each file looks something like this:

Time A1 A2 A3 A4 A5
1 0.1 0.2 0.1 0.2 0.3
2 0.2 0.3 0.4 0.2 0.3
...

It works fine but I am hoping to apply this to many files of the same nature, with varying numbers of timeseries in each file. If anyone could advise on how I can improve the above to do so, it would be great. Thank you in advance!

joran
Tan
  • This has been addressed on SO a number of times, e.g. http://stackoverflow.com/questions/3764292/loading-many-files-at-once http://stackoverflow.com/questions/5758084/loop-in-r-loading-files http://stackoverflow.com/questions/6420207/manipulating-multiple-files-in-r – Roman Luštrik Jul 15 '11 at 10:09
  • @Roman - just realized I answered that last link, and it was from the same OP... – Chase Jul 15 '11 at 21:57

1 Answer


You can take the steps you've outlined above, roll them up into a function, and then lapply that function over a vector containing the names of the files you want to analyse. Depending on what you need to do, it may or may not make sense to split reading the data from the subsequent analysis so that the data stay in your working environment. For the sake of simplicity, I'm going to assume you don't need the data afterwards.

The general steps will be:

1) Create a vector of your files to be processed. Something like:

filesToProcess <- dir(pattern = "yourPatternHere")

2) Turn your code above into a function

FUN <- function(dat){   
  compiled<-read.table(dat, header=TRUE)
  z<-ncol(compiled)
  comp_df<-data.frame(compiled[,2:z])
  indmean<- rowMeans(comp_df)
  return(indmean)
}

3) lapply the FUNction over your vector of files and assign the result to a new variable:

out <- lapply(filesToProcess, FUN)

4) Give the elements of out some names so you know which result belongs to which file:

names(out) <- filesToProcess

You now have a named list that contains the rowMeans for all files you listed in filesToProcess.
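
Putting the four steps together, a minimal self-contained sketch might look like the following. The temporary directory, the file names 1.txt and 2.txt, the example numbers, and the helper name rowMeansFromFile are all made up for illustration. It also swaps in two small changes that are assumptions on my part rather than part of the steps above: full.names = TRUE, so the files are found whatever the working directory is, and negative indexing instead of 2:z to drop the Time column.

```r
# Build two small example files in a temporary directory (illustrative data only)
exampleDir <- file.path(tempdir(), "rowmeans_example")
dir.create(exampleDir, showWarnings = FALSE)
write.table(data.frame(Time = 1:2, A1 = c(0.1, 0.2), A2 = c(0.3, 0.4)),
            file.path(exampleDir, "1.txt"), row.names = FALSE, quote = FALSE)
write.table(data.frame(Time = 1:2, A1 = c(1, 2), A2 = c(2, 3), A3 = c(3, 4)),
            file.path(exampleDir, "2.txt"), row.names = FALSE, quote = FALSE)

# 1) Collect the files; full.names = TRUE returns paths read.table can open directly
filesToProcess <- list.files(exampleDir, pattern = "\\.txt$", full.names = TRUE)

# 2) Read one file and average every column except the first (Time)
rowMeansFromFile <- function(path) {
  compiled <- read.table(path, header = TRUE)
  rowMeans(compiled[, -1, drop = FALSE])
}

# 3) Apply the function to every file, then 4) name the results by file name
out <- lapply(filesToProcess, rowMeansFromFile)
names(out) <- basename(filesToProcess)
```

Each element of out holds one mean per time step for the corresponding file, so out[["2.txt"]] is the vector of row means across A1 to A3 in the second file.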

Chase
  • Hi Chase, the output file I get is empty. This is what I did: `filesToProcess <- dir("/Users//Desktop/A/.txt") FUN <- function(dat){ compiled<-read.table(dat, header=TRUE) z<-ncol(compiled) comp_df<-data.frame(compiled[,2:z]) indmean<- rowMeans(comp_df) return(indmean) } out <- lapply(filesToProcess, FUN) names(out) <- filesToProcess write.table(out, file="/Users//Desktop/A/compiledmeans.txt", sep="\t", row.names=FALSE)` Did I do the pattern wrongly? Thanks! – Tan Jul 15 '11 at 04:52
  • @Tan: Use `dir("/Users//Desktop/A", pattern = glob2rx("*.txt"))`. – Richie Cotton Jul 15 '11 at 08:13
  • Also note that `2:z` will not be what you want if `z` is 1. Consider swapping it for `seq_len(z)[-1]`. – Richie Cotton Jul 15 '11 at 08:15
  • @Tan - Richie is right. Alternatively, use `setwd()` to navigate to the directory that contains your files. The help page for `?dir` should be illustrative. – Chase Jul 15 '11 at 11:30
  • @Chase- I tried `setwd()` and the following error is shown: `> out <- lapply(filesToProcess, FUN) Error in read.table(dat, header = TRUE) : no lines available in input > names(out) <- filesToProcess Error in names(out) <- filesToProcess : object 'out' not found > write.table(out, file="/Users/tan/Desktop/A/compiledmeans.txt", sep="\t", row.names=FALSE) Error in inherits(x, "data.frame") : object 'out' not found > ` Not sure where I went wrong... Thank you so much for helping to troubleshoot! – Tan Jul 17 '11 at 02:39
  • @Tan - the error indicates that you probably didn't get the directory set properly, which resulted in no files to be read in, which led to the subsequent errors... What is the result of `getwd()`? Is the directory set as you think it should be? Once you get that right, look at `dir()`. Assuming you can navigate to the right directory, you should see the files of interest come up with `dir()`. As an aside, I'd recommend running one line at a time and ensuring that it runs properly before running the next line so you can isolate where the problem is happening. – Chase Jul 17 '11 at 02:44
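
The checks suggested in the comments can be sketched roughly as below. The temporary directory and the file names are invented for illustration. The sketch contrasts passing a file pattern as the path with passing the directory plus a pattern, stops early if nothing matched, and shows why 2:z misbehaves when a file has only one column.

```r
# Illustrative directory with one .txt file and one unrelated file
checkDir <- file.path(tempdir(), "listing_check")
dir.create(checkDir, showWarnings = FALSE)
writeLines(c("Time A1", "1 0.1"), file.path(checkDir, "1.txt"))
writeLines("not data", file.path(checkDir, "notes.csv"))

# A file pattern given as the path matches nothing: the first argument must be a directory
wrongListing <- dir(file.path(checkDir, "*.txt"))

# Give the directory as the path and the glob (converted to a regex) as the pattern
rightListing <- dir(checkDir, pattern = glob2rx("*.txt"))

# Stop early with a clear message if nothing was found
if (length(rightListing) == 0) stop("No files matched; check getwd() and dir()")

# Pitfall: with a single column, 2:z counts down instead of being empty
z <- 1
countsDown <- 2:z             # the sequence 2, 1
staysEmpty <- seq_len(z)[-1]  # an empty integer vector
```

Checking length() of the listing before calling lapply makes an empty match show up as one clear message instead of a chain of follow-on errors.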