Say I have 5 dataframes with identical columns but different numbers of rows. I want to make 1 dataframe that takes a specific column from each of the 5 dataframes and fills in NAs (or whatever) where the lengths don't match. I've seen questions on here that show how to do this with one-off vectors, but I'm looking for a way to do it with bigger sets of data.
Ex: 2 dataframes of equal length:
long <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350,5000))
long2 <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350,5000))
I can create a list that combines them, then create an empty dataframe and populate it with a common variable from the dataframes in the list:
list1 <- list(long, long2)
df1 <- as.data.frame(matrix(0, ncol = 5, nrow = 350))
df1[,1:2] <- sapply(list1, '[[', 'accepted')
And it works.
But when I have more dataframes of unequal length, this approach fails:
long <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350,5000))
long2 <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350,5000))
medlong <- data.frame(accepted = rnorm(300, 2000), cost = rnorm(300,5000))
medshort <- data.frame(accepted = rnorm(150, 2000), cost = rnorm(150,5000))
short <- data.frame(accepted = rnorm(50, 2000), cost = rnorm(50,5000))
Now making the list and combined dataframe:
list2 <- list(long, long2, medlong, medshort, short)
df2 <- as.data.frame(matrix(0, ncol = 5, nrow = 350))
df2[,1:5] <- sapply(list2, '[[', 'accepted')
I get the error about size mismatch:
Error in `[<-.data.frame`(`*tmp*`, , 1:5, value = c(1998.77096640377, : replacement has 700 items, need 1750
The only solution I've found to populating this dataframe with columns of unequal length from other dataframes is something along the lines of:
combined.df <- as.data.frame(matrix(0, ncol = 5, nrow = 350))
combined.df[,1] <- long[,2]
combined.df[,2] <- c(medlong[,2], rep(NA, nrow(long) - nrow(medlong)))
But there's got to be a more elegant and faster way to do it. I know I'm missing something huge conceptually here.
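For reference, here is a minimal sketch of the kind of thing I'm after, generalizing the manual padding above to a whole list. It assumes every dataframe has an `accepted` column and that padding with NA at the end is acceptable. The names `dfs`, `cols`, `n_max`, `padded` and `combined` are just placeholders I made up. It relies on `length<-`, which extends a vector with NA when the new length is larger than the old one. I'm not claiming this is the best approach, only one that seems to work:

```r
# Example data with the same sizes as in the question.
set.seed(1)
long     <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350, 5000))
long2    <- data.frame(accepted = rnorm(350, 2000), cost = rnorm(350, 5000))
medlong  <- data.frame(accepted = rnorm(300, 2000), cost = rnorm(300, 5000))
medshort <- data.frame(accepted = rnorm(150, 2000), cost = rnorm(150, 5000))
short    <- data.frame(accepted = rnorm(50, 2000),  cost = rnorm(50, 5000))

# A named list, so the result gets readable column names.
dfs <- list(long = long, long2 = long2, medlong = medlong,
            medshort = medshort, short = short)

# Pull the column of interest out of every dataframe.
# lapply always returns a list, even when the lengths differ.
cols <- lapply(dfs, `[[`, "accepted")

# Length of the longest column; shorter ones are padded up to this.
n_max <- max(lengths(cols))

# `length<-`(x, n_max) extends x with trailing NA values.
padded <- lapply(cols, `length<-`, n_max)

# All elements now have equal length, so they form a rectangular dataframe.
combined <- as.data.frame(padded)
```

With the data above, `combined` has 350 rows and 5 columns, and the shorter columns end in NA. Swapping `"accepted"` for `"cost"` should pull the other column in the same way.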