3

I have 3 files with 3 variables: date, ID, and price. I would like to merge them by date, so if one my current files is:

date      ID  Price
01/01/10   A   1
01/02/10   A   1.02
01/02/10   A   0.99
...
...

I would like to get a merged file that looks like the one below for IDs A,B and C (Pr for Price):

date       Pr.A   Pr.B  Pr.C     
01/01/10   1      NA    NA
01/02/10   1.02   1.2   NA
01/03/10   0.99   1.3   1
01/04/10   NA     1.23  2
01/05/10   NA     NA    3

Notice that for some dates there are not prices so in that case is an NA.

My current approach works but I feel is a bit clumsy.

setwd('~where you put the files')
library(plyr)
listnames = list.files(pattern='.csv')
pp1 = ldply(listnames,read.csv,header=T) #put all the files in a data.frame

names(pp1)=c('date','ID','price')
pp1$date = as.Date(pp1$date,format='%m/%d/%Y')

# Reshape data frame so it gets organized by date
pp1=reshape(pp1,timevar='ID',idvar='date',direction='wide')

Is there any better approach you could think of?

aatrujillob
  • 4,738
  • 3
  • 19
  • 32
  • 1
    goto http://stackoverflow.com/questions/1562124/merge-many-data-frames-from-csv-files – Hemant Metalia Dec 07 '11 at 07:27
  • One note -- the linked file `"a1.csv"` contains several extra comma-separated lines with no data. I removed them by hand, rather than muck up the R code in the answer by doing it there. – Josh O'Brien Dec 07 '11 at 08:01
  • I actually think what you've done with `reshape` here is a pretty good option. – joran Dec 07 '11 at 15:41

2 Answers2

4

Looks like a job for Reduce():

# Read the files in to a single list, removing unwanted second column from each.
dataDir <- "example"
fNames <- dir(dataDir)
dataList <- lapply(file.path(dataDir, fNames),
                   function(X) {read.csv(X, header=TRUE)[-2]})

# Merge them                   
out <- Reduce(function(x,y) merge(x,y, by=1, all=TRUE), dataList)

# Construct column names
names(out)[-1] <- paste("Pr.", toupper(sub("1.csv", "", fNames)), sep="")
out
#       date Pr.A Pr.B Pr.C
# 1 1/1/2010 1.00   NA   NA
# 2 1/2/2010 1.02 1.20   NA
# 3 1/3/2010 0.99 1.30    1
# 4 1/4/2010   NA 1.23    2
# 5 1/5/2010   NA   NA    3

Actually, your approach looks just fine to me, but I can see preferring the simplicity and transparency of syntax in a call to Reduce.

Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • How does reduce perform in terms of speed? – Paul Hiemstra Dec 07 '11 at 07:40
  • @PaulHiemstra: Not well is my guess (since it will presumably(??) be creating a new data.frame for each merge operation). I don't really know, but I will say that if speed had been at issue in the question, I'd not have suggested `Reduce`. – Josh O'Brien Dec 07 '11 at 08:30
  • interesting choice to use Reduce. Didn't now R had this kind of functional programming methods built in. – LouisChiffre Dec 07 '11 at 12:33
1

I dont have access to the files, I am behind a corporate firewall. I would use the cast method once you have built the data.frame.

    res = cast(pp1,date~ID,value="Price",mean)
LouisChiffre
  • 691
  • 7
  • 15