I'm learning R and am having trouble with my current task. I'd appreciate any insight or suggestions to help me think through it logically.

I have a directory with multiple CSV files; each file represents a separate day of ecological measurements. The measurements/variables are the same every day, so each CSV has the same headings but contains hundreds of unique observations for each variable.

I'm trying to write a small script that:

reads the list of files in the directory, loads each file one by one, takes the mean of one specific column, and stores that mean and the associated date in a new data frame.

I then want to plot the date and mean, to see how the mean value is changing over time.

Any suggestions on how to best accomplish this?

Here is my attempt so far:

dir <- getwd()
file.ls <- list.files(dir, full.names = T)
count <- length(file.ls)
all.means <- data.frame()
data <- data.frame()
for(i in 1:count){
   data <- read.csv(file.ls[i])
   date <- data[2,1]
   means <- mean(data$total_con)
   all.means[i] <- cbind(all.means, date, means)
}

plot(all.means$date, all.means$means)
peakgeek
    You should start by looking at `?list.files` and `?read.csv` and incorporating them into your code. Then if you get stuck and cannot figure out a solution, you could update your question with a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of the issue, i.e. data, code, and a description of the problem you are having. – nrussell Jul 16 '14 at 20:30

1 Answer

The missing ingredient in your question is "How does each file tell you what date it goes with?" I'll assume you have some naming convention like `mydata_yyyy-mm-dd.csv`.

The following code can be adapted to work:

library(plyr)  # provides ldply
data.file.names <- dir(pattern="^mydata")  # lists just the data files in the working directory
X <- ldply(data.file.names, function(fn) {
  dat <- read.csv(fn)  # read the file
  this.date <- strptime(substring(fn, 8, 17), "%Y-%m-%d")  # parse the date from characters 8-17 of the file name
  this.mean <- mean(dat[,n.col.of.interest])  # calculate the stat of interest (n.col.of.interest is a placeholder: set it to your column's index or name)
  return(data.frame(date.of.experiment=this.date, measurement=this.mean))  # return one row
})

Then you can plot or otherwise use the data.
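
If the date is stored in a column of each file rather than in the file name (the question's attempt reads it from the first column), the same idea might be sketched in base R as below. This is only a sketch under assumptions: the date is in the first column in a format `as.Date()` can parse with the given `date_format`, the column to average is named `total_con` as in the question, and the names `summarise_file`, `daily_means`, and `date_format` are made up for illustration.

```r
# Summarise one CSV file as a single row: its date and the mean of total_con.
# Assumption: every row carries the same date in column 1, so the first row is enough.
summarise_file <- function(path, date_format = "%Y-%m-%d") {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  data.frame(
    date = as.Date(dat[1, 1], format = date_format),
    mean_total_con = mean(dat$total_con, na.rm = TRUE)  # ignore missing values
  )
}

# Build one data frame with a row per CSV file in `dir`, sorted by date.
daily_means <- function(dir = ".", date_format = "%Y-%m-%d") {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)  # CSV files only
  rows <- lapply(files, summarise_file, date_format = date_format)  # one-row data frames
  out <- do.call(rbind, rows)                                       # stack them
  out[order(out$date), ]                                            # chronological order
}
```

Calling `all.means <- daily_means(".")` from the data directory and then `plot(all.means$date, all.means$mean_total_con, type = "b")` would draw the mean over time. Building the rows in a list and binding them once avoids growing a data frame inside a `for` loop, which is where the question's `cbind` line goes wrong.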

polimath
  • Thanks polimath. The date is stored in a column within each file (same date repeated the entire length of the column). I added my 1st attempt at the code so you could see my "logic". – peakgeek Jul 16 '14 at 22:07