Here are some other options you may have, depending on your data (other options apart from the obvious path of doing a ton of merges, I mean): in a loop with `Reduce`, or with Hadley's `join_all`/`merge_all`/`wrap_em_all_up`.
These are all methods that I have used and found to be faster in my own work, but I don't intend to attempt a general benchmark. First, some setup:
DFlist = list(df1,df2,df3,df4)
bycols = key(DFlist[[1]])
I'll assume the tables are all keyed by the `bycols`.
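For concreteness, here is a hypothetical setup matching those assumptions; the table names, column names, and values are all invented for illustration:

```r
library(data.table)

# four toy tables, each keyed by the shared column "id" (names hypothetical)
df1 = data.table(id = 1:3, v1 = c(10, 20, 30), key = "id")
df2 = data.table(id = 1:3, v2 = c("a", "b", "c"), key = "id")
df3 = data.table(id = 1:3, v3 = c(TRUE, FALSE, TRUE), key = "id")
df4 = data.table(id = 1:3, v4 = c(1.5, 2.5, 3.5), key = "id")

DFlist = list(df1, df2, df3, df4)
bycols = key(DFlist[[1]])   # "id"
```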
Stack. If the new cols from each table are somehow related to each other and appear in the same positions in every table, then consider just stacking the data:
DFlong = rbindlist(DFlist, use.names = FALSE, idcol = TRUE)
If for some reason you really want the data in wide format, you can `dcast`:
dcast(DFlong,
formula = sprintf("%s ~ .id", paste(bycols, collapse = "+")),
value.var = setdiff(names(DFlong), c(bycols, ".id"))
)
Data.table and R work best with long-format data, though.
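A minimal, self-contained run of the stack-then-cast route, using two toy tables that each carry one value column in the same position (all names hypothetical); the string formula is wrapped in `as.formula` here just to be explicit:

```r
library(data.table)

# toy tables: same key, value column in the same position (names hypothetical)
df1 = data.table(id = 1:2, v = c(10, 20), key = "id")
df2 = data.table(id = 1:2, v = c(30, 40), key = "id")
DFlist = list(df1, df2)
bycols = key(DFlist[[1]])

# stack: .id records which table each row came from
DFlong = rbindlist(DFlist, use.names = FALSE, idcol = TRUE)

# cast back to wide: one column per source table, named "1" and "2" here
DFwide = dcast(DFlong,
  formula = as.formula(sprintf("%s ~ .id", paste(bycols, collapse = "+"))),
  value.var = setdiff(names(DFlong), c(bycols, ".id"))
)
```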
Copy cols. If you know that the `bycols` take all the same values in all of the tables, then just copy columns over (since the tables are keyed, identical key values imply identical row order):
DF = DFlist[[1]][, bycols, with = FALSE]
for (k in seq_along(DFlist)) {
  newcols = setdiff(names(DFlist[[k]]), bycols)
  DF[, (newcols) := DFlist[[k]][, newcols, with = FALSE]]
}
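A toy run of the copy loop (names hypothetical): the first table's key columns seed `DF`, and each table's non-key columns are copied over positionally:

```r
library(data.table)

# two toy tables with identical, keyed id values (names hypothetical)
df1 = data.table(id = 1:3, v1 = c(10, 20, 30), key = "id")
df2 = data.table(id = 1:3, v2 = c("a", "b", "c"), key = "id")
DFlist = list(df1, df2)
bycols = key(DFlist[[1]])

# seed with the key columns, then copy each table's non-key columns over
DF = DFlist[[1]][, bycols, with = FALSE]
for (k in seq_along(DFlist)) {
  newcols = setdiff(names(DFlist[[k]]), bycols)
  DF[, (newcols) := DFlist[[k]][, newcols, with = FALSE]]
}
```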
Merge assign. If some levels of the `bycols` may be missing from certain tables, then make a master table with all combos and do a sequence of merge-assigns:
DF = unique(rbindlist(lapply(DFlist, `[`, j = bycols, with = FALSE)))
setkeyv(DF, bycols)  # rbindlist drops the key; re-key before join-assigning
for (k in seq_along(DFlist)) {
  newcols = setdiff(names(DFlist[[k]]), bycols)
  DF[DFlist[[k]], (newcols) := mget(newcols)]
}
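And a toy run of the merge-assign route, where one table is missing a key level; unmatched rows in the master simply come out `NA` (names hypothetical). The `setkeyv` call matters: `rbindlist` drops keys, and the join-assign needs `DF` keyed (or an `on=` argument):

```r
library(data.table)

# df2 is missing id == 3, so its column will be NA there in the result
df1 = data.table(id = 1:3, v1 = c(10, 20, 30), key = "id")
df2 = data.table(id = 1:2, v2 = c("a", "b"), key = "id")
DFlist = list(df1, df2)
bycols = key(DFlist[[1]])

# master table of all key combinations seen in any table
DF = unique(rbindlist(lapply(DFlist, `[`, j = bycols, with = FALSE)))
setkeyv(DF, bycols)  # rbindlist drops the key; re-key before join-assigning

# merge-assign each table's non-key columns into the master
for (k in seq_along(DFlist)) {
  newcols = setdiff(names(DFlist[[k]]), bycols)
  DF[DFlist[[k]], (newcols) := mget(newcols)]
}
```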