-1

I have multiple files in a list and I want to merge them based on Year column, so that my new file looks like Merged_file. I could use merge(file1, file2, by="Year") if I had 2 files, but I don't know how to do that for multiple files in a list. I also tried this newlist <- lapply(files, function(t)do.call(rbind.fill, t)) but its not what I want.

file1             file2                Merged_file

Year  Value1      Year  Value2         Year Value1 Value2
2001   1          2000   0.5           2001  1       0.3
2001   2          2000   0.6           2001  2       0.3 
2002   2          2001   0.3           2002  2       0.5
2002   3          2001   0.3           2002  3       0.6
2003   3          2002   0.5           2003  3       0.6       
2003   4          2002   0.6           2003  4       0.6
                  2003   0.6
                  2003   0.6
Geo-sp
  • 1,704
  • 3
  • 19
  • 42
  • I used this `merged.data.frame = Reduce(function(...) merge(..., all=T), list.of.data.frames)` but it seems that doesn't work for data frames with different number of rows – Geo-sp Apr 12 '13 at 16:56
  • On a closer look, it seems that what you are calling a "merge" might not be very well defined. For instance, you only end up with two rows for 2002 in the result. Will there always only be two records per year in each file? – joran Apr 12 '13 at 17:31
  • No these are multiple rows and column. I have multiple csv files with the same header on the other hand I have another set of files that I want to merge with the first sets. I made a list of list for all the files that I want to merge. I need something like this `list <- lapply(tables, function(t)do.call((smartbind), t))` but I need to merge based on `Year` column. – Geo-sp Apr 12 '13 at 17:51
  • You haven't really clarified anything. The details we need are on the _specific_ mechanics of what you are calling a "merge". Because your example merged file doesn't correspond to any sensible definition of merging. See Brian's answer below for what I _think_ you're trying to do. – joran Apr 12 '13 at 18:12

2 Answers2

1

You say there are not the same number of rows in each data set; are there the same number of rows for any single year, though? I get the sense that you want to take subsets of the files with the same year and combine (cbind) them, but I'm not sure. See if this does what you want/mean:

file1 <- read.table(text=
"Year  Value1      
2001   1          
2001   2          
2002   2          
2002   3          
2003   3                
2003   4", header=TRUE)

file2 <- read.table(text=
"Year  Value2         
2000   0.5           
2000   0.6           
2001   0.3           
2001   0.3           
2002   0.5           
2002   0.6           
2003   0.6           
2003   0.6", header=TRUE)

bind.by.var <- function(file1, file2, var = intersect(names(file1), names(file2))) {
    do.call(rbind, lapply(intersect(file1[[var]], file2[[var]]), function(y) {
        cbind(file1[file1[[var]]==y,],
              file2[file2[[var]]==y,setdiff(names(file2),var),drop=FALSE])
    }))
}

The function bind.by.var figures out which column the two files have in common (Year), then what years appear in both files. Then, year by year, combines (binds) the years together. I don't know if this is in general what you want, but it does match your Merged_file example

> bind.by.var(file1, file2)
  Year Value1 Value2
1 2001      1    0.3
2 2001      2    0.3
3 2002      2    0.5
4 2002      3    0.6
5 2003      3    0.6
6 2003      4    0.6

Given this and a list of files, you can use the Reduce technique on it.

Reduce(bind.by.var, list(file1, file2))

where you replace the explicit list with your list of data.frame which were read in from files.

The assumption here is that there are the same number of rows for any one year in each file. If that is not the case, you need to explain how you want data from a year to be combined/merged.

Brian Diggs
  • 57,757
  • 13
  • 166
  • 188
  • Thanks Brian! this works in this case, but I have list of lists for all the files that I want to rbind based on the Year column. And I have the same number of rows for any single year as you said. – Geo-sp Apr 12 '13 at 20:48
0

Consider using the "plyr" package, with the command "ldply".

### Create a vector of file names to be read in
files <- list.files()

### Iterate over each file, reading in data each time
data <- lapply(files, read.csv)

### Use "ldply" to merge files from a list into a data.frame
data <- ldply(data)
Andreas
  • 1,923
  • 19
  • 24
  • Yes, I have created a list of lists containing all 2 tables that I want to merge. Each of these 2 tables have different number of rows and I need to match them based on one column. – Geo-sp Apr 12 '13 at 17:02
  • The "ldply" command will take a list of data.frames and merge them into one data.frame. The merge is defaulted to merge on columns with the same name between data sets. – Andreas Apr 12 '13 at 17:06