2

I have different csv files with different names. I want to make some calculations and after that I want to save the results into one csv file.

My data of two csv files have this format:

File 1:

 day                 price
 2000-12-01 00:00:00 2 
 2000-12-01 06:00:00 3 
 2000-12-01 12:00:00 NA 
 2000-12-01 18:00:00 3 

File 2:

 day                 price
 2000-12-01 00:00:00 12 
 2000-12-01 06:00:00 NA 
 2000-12-01 12:00:00 14 
 2000-12-01 18:00:00 13 

To read the files I use this:

file1 <- read.csv(path_for_file1, header=TRUE, sep=",")
file2 <- read.csv(path_for_file2, header=TRUE, sep=",")

An example of calculation process:

library(xts)
file1 <- na.locf(file1)
file2 <- na.locf(file2)

And save the results into a csv where the timestamp is the same for the csv files:

merg <- merge(x = file1, y = file2, by = "day", all = TRUE)
write.csv(merge,file='path.csv', row.names=FALSE)

To read multiple files I tried this. Any ideas how can make the process from 2 files to be for n files?

Community
  • 1
  • 1
foc
  • 947
  • 1
  • 9
  • 26
  • Put the output of each `read.csv` into a list. Use the `*apply` family to do the calculations on each dataframe. Then follow instructions [here](http://stackoverflow.com/questions/8091303/merge-multiple-data-frames-in-a-list-simultaneously) for merging. – Thomas Jul 27 '13 at 13:07
  • @Thomas: but these aren't data.frames, so merging is much easier: `do.call(merge, xtsList)`. – Joshua Ulrich Jul 27 '13 at 13:45
  • @JoshuaUlrich Nice. I misread that `file1` and `file2` was being used for the both the initial dataframes and then reused for output of the `na.locf`. – Thomas Jul 27 '13 at 13:52
  • Make all of your filenames into a character vector. Then do something like `lapply(thatvector, function(x) read.csv(x, header=TRUE)` to get the list of dataframes. – Thomas Jul 27 '13 at 14:01
  • @Thomas: whoops, I was wrong. They are data.frames. `na.locf` has a default method that can be used on data.frames. – Joshua Ulrich Jul 27 '13 at 14:08

1 Answers1

3

You say that your data are comma-separated, but you show them as space-separated. I'm going to assume that your data are truly comma-separated.

Rather than reading them into separate objects, it's easier to read them into a list. It's also easier to use read.zoo instead of read.csv because merging time-series is a lot easier with xts/zoo objects.

# get list of all files (change pattern to match your actual filenames)
files <- list.files(pattern="file.*csv")
# loop over each file name and read data into an xts object
xtsList <- lapply(files, function(f) {
  d <- as.xts(read.zoo(f, sep=",", header=TRUE, FUN=as.POSIXct))
  d <- align.time(d, 15*60)
  ep <- endpoints(d, "minutes", 15)
  period.apply(d, ep, mean)
})
# set the list names to the file names
names(xtsList) <- files
# merge all the file data into one object, filling in NA with na.locf
x <- do.call(merge, c(xtsList, fill=na.locf))
# write out merged data
write.zoo(x, "path.csv", sep=",")
Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
  • @foxcandy: your pattern is correct. You could simply add your calculations to the function I use in `xtsList <- lapply(files, ...)`. You might want to wait to run `na.locf` until after the merge, since the merge may product `NA`. – Joshua Ulrich Jul 27 '13 at 14:23
  • @foxcandy: my code works fine for me, and I don't use (or convert to) data.frames anywhere, so my code couldn't throw an error from `as.data.frame`. My guess is that your function returns a data.frame, not an xts object like mine does. – Joshua Ulrich Jul 27 '13 at 14:44