
I am trying to create one data frame from 87 .csv files that share 3 columns: Date, CowID, and Time. The files also contain several columns that are not common to all of them but still need to be merged. Each file has a different length as well, so I expect the final data frame to contain many NAs. I have no problems reading the files. I found code to do this in Merge several data.frames into one data.frame with a loop.

filenames <- list.files(path = "...\\data\\dat_merge", full.names = TRUE)
library(plyr)
import.list <- llply(filenames, read.csv)
data <- Reduce(function(x, y) merge(x, y, all = TRUE,
                                    by.x = c("Date", "CowID", "Time"),
                                    by.y = c("Date", "CowID", "Time")),
               import.list, accumulate = FALSE)

However, when I tried to merge the files using Reduce I got the error:

Error in fix.by(by.y, y): 'by' must specify a uniquely valid column

I don't know if the problem is that one of the common columns is Date, which is a character like 11/24/2018.

I also tried x <- multimerge(my_data, all = TRUE, by = c("Date", "CowID", "Time")), but it does not work either.
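(For what it's worth, this particular error usually means at least one of the data frames is missing one of the by columns, e.g. because of a misspelled or differently cased header. A small diagnostic sketch, using toy data frames in place of the 87 imported files; the helper name missing_keys is made up for illustration:)

```r
# Report which data frames lack any of the key merge columns.
key_cols <- c("Date", "CowID", "Time")

missing_keys <- function(df_list) {
  vapply(df_list, function(df) !all(key_cols %in% names(df)), logical(1))
}

# Example: the second "file" has CowID misspelled as Cow_ID.
ok  <- data.frame(Date = "11/24/2018", CowID = 1, Time = "08:00", Milk = 12.3)
bad <- data.frame(Date = "11/24/2018", Cow_ID = 1, Time = "08:00")

missing_keys(list(ok, bad))  # FALSE TRUE -> second file's header needs fixing
```

Running this over import.list (e.g. `filenames[missing_keys(import.list)]`) would point at the offending files before attempting the merge.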

Sotos

1 Answer


Here's how I would solve it: use lapply to read in all the files and extract only the wanted columns, then use do.call("rbind", ...) to stack the data frames.

data <- do.call("rbind", lapply(filenames, function(f) {
    temp <- read.csv(f)
    temp <- temp[, c("Date", "CowID", "Time")]
    return(temp)
}))
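(Note that this rbind approach keeps only the three shared columns. If, as the question asks, the non-common columns should survive with NAs where a file lacks them, the original Reduce/merge full-join idea does work once every file actually contains the three key columns. A minimal sketch, with three toy data frames standing in for the 87 files:)

```r
# Full outer join on the shared keys; absent columns become NA.
key_cols <- c("Date", "CowID", "Time")

f1 <- data.frame(Date = "11/24/2018", CowID = 1, Time = "08:00", Milk = 12.3)
f2 <- data.frame(Date = "11/24/2018", CowID = 1, Time = "08:00", Weight = 540)
f3 <- data.frame(Date = "11/25/2018", CowID = 2, Time = "09:00", Temp = 38.5)

merged <- Reduce(function(x, y) merge(x, y, by = key_cols, all = TRUE),
                 list(f1, f2, f3))
# merged has one row per key combination; Milk/Weight/Temp are NA
# wherever the corresponding "file" had no matching row.
```

One caveat: if two files share a non-key column name (say both have Milk), merge() will keep both copies with .x/.y suffixes rather than combining them, so non-key column names should be unique across files.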
Tyler