I have a large list of data.frames that need to be bound pairwise by columns, and then by rows, before being fed into a predictive model. Since no values will be modified, I would like the final data.frame to point to the original data.frames in my list (share their memory) rather than hold a copy of the data.
For example:
library(pryr)
#individual dataframes
df1 <- data.frame(a=1:1e6+0, b=1:1e6+1)
df2 <- data.frame(a=1:1e6+2, b=1:1e6+3)
df3 <- data.frame(a=1:1e6+4, b=1:1e6+5)
#each occupies 16 MB
object_size(df1) # 16 MB
object_size(df2) # 16 MB
object_size(df3) # 16 MB
object_size(df1, df2, df3) # 48 MB
#will be in a named list
dfs <- list(df1=df1, df2=df2, df3=df3)
#putting them into a list doesn't create a copy
object_size(df1, df2, df3, dfs) # 48 MB
The final data.frame will have this layout (every unique pair of data.frames bound by columns, then the pairs bound by rows):
df1, df2
df1, df3
df2, df3
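To make the layout concrete, here is a minimal sketch with tiny two-row data.frames. The names `small_dfs`, `small_pairs`, and `small_combo` are hypothetical and only used for illustration; the real data has 1e6 rows per data.frame.

```r
# Toy version of the target layout (hypothetical small_* names, base R only)
small_dfs <- list(
  df1 = data.frame(a = 1:2,  b = 3:4),
  df2 = data.frame(a = 5:6,  b = 7:8),
  df3 = data.frame(a = 9:10, b = 11:12)
)

# every unique pair of names: (df1, df2), (df1, df3), (df2, df3)
small_pairs <- combn(names(small_dfs), 2, simplify = FALSE)

# bind each pair by columns, then stack the pairs by rows
small_combo <- do.call(rbind, lapply(small_pairs, function(p) {
  cbind(small_dfs[[p[1]]], small_dfs[[p[2]]])
}))

dim(small_combo) # 6 rows (3 pairs x 2 rows), 4 columns (2 + 2)
```

Each block of rows corresponds to one pair, so every original data.frame appears in two of the three blocks.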
I am currently implementing this as follows:
#generate unique df combinations
df_names <- names(dfs)
pairs <- combn(df_names, 2, simplify=FALSE)
#bind dfs by columns
combo_dfs <- lapply(pairs, function(x) cbind(dfs[[x[1]]], dfs[[x[2]]]))
#no copies created yet
object_size(dfs, combo_dfs) # 48MB
#bind dfs by rows
combo_df <- do.call(rbind, combo_dfs)
#now data gets copied
object_size(combo_df) # 96 MB
object_size(dfs, combo_df) # 144 MB
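The reported sizes are consistent with a full copy. As a rough check, assuming 8 bytes per double and ignoring the small per-object overhead, the arithmetic can be sketched like this (the variable names are hypothetical):

```r
# Back-of-the-envelope size check (assumes 8-byte doubles, ignores overhead)
n_rows           <- 1e6
bytes_per_double <- 8

# three original data.frames with two numeric columns each
original_bytes <- 3 * 2 * n_rows * bytes_per_double

# choose(3, 2) = 3 pairs, each with 2 + 2 = 4 columns after cbind
n_pairs        <- choose(3, 2)
combined_bytes <- n_pairs * 4 * n_rows * bytes_per_double

original_bytes / 1e6                    # 48
combined_bytes / 1e6                    # 96
(original_bytes + combined_bytes) / 1e6 # 144
```

Since the joint size equals the sum of the two individual sizes, `combo_df` shares none of its memory with `dfs`.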
How can I avoid copying my data but still achieve the same end result?