I am currently using the following code to merge more than 130 data frames, and it takes too many hours to run (I have never actually reached completion on the full dataset, only on subsets). Each table has two columns: unit (string) and counts (integer). I am merging by unit.
tables <- lapply(files, function(x) read.table(x, col.names = c("unit", x)))
MyMerge <- function(x, y) {
  # full outer join on unit, so units present in only one table are kept
  merge(x, y, by = "unit", all.x = TRUE, all.y = TRUE)
}
data <- Reduce(MyMerge, tables)
Is there any way to speed this up easily? Each table/data frame has around 500,000 rows on its own, and many of those units are unique to that table, so merging multiple tables quickly blows the merged data frame up to many millions of rows.
At the end, I will drop rows with too low summary counts from the big merged table, but I don't want to do that during merging, because then the order of my files would matter.
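In case it helps, here is a small self-contained sketch of the data layout and the merge as I run it (the unit names, counts, and number of tables are made up; the real tables have ~500,000 rows each and there are >130 of them):

# toy stand-ins for the real files: two columns, unit (string) and counts (integer)
set.seed(1)
make_table <- function(i) {
  df <- data.frame(unit = sample(sprintf("unit_%05d", 1:20), 10),
                   counts = sample.int(100, 10),
                   stringsAsFactors = FALSE)
  names(df)[2] <- paste0("counts_", i)  # one counts column per file, as with col.names above
  df
}
tables <- lapply(1:5, make_table)
MyMerge <- function(x, y) merge(x, y, by = "unit", all.x = TRUE, all.y = TRUE)
merged <- Reduce(MyMerge, tables)
dim(merged)  # the union of units grows with every merge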