cbind dataframes containing varying numbers of columns

Question

I have several dataframes that contain largely the same variables, but some variables are missing from certain dataframes. I would like to cbind (one particular column of*) the dataframes while creating NA values in the missing fields. By way of example:

d1 <- data.frame(y1 = c("var1", "var2", "var3"),
                 y2 = c(3, 2, 4),
                 y3 = c("not_needed", "not_needed2", "not_needed3"))
d2 <- data.frame(y1 = c("var1", "var3"),
                 y2 = c(2, 1),
                 y3 = c("not_needed", "not_needed2"))
d3 <- data.frame(y1 = c("var1", "var2", "var4"),
                 y2 = c(3, 2, 5),
                 y3 = c("not_needed", "not_needed2", "not_needed3"))

expected_output <- data.frame(y1 = c("var1", "var2", "var3", "var4"),
                              y2.d1 = c(3, 2, 4, NA),
                              y2.d2 = c(2, NA, 1, NA),
                              y2.d3 = c(3, 2, NA, 5))

*The column y3 is not required in the output dataframe.

I've experimented with rbind.fill() from plyr and a few other ideas, but no success so far.

@joran I don't believe that this is a duplicate of the linked question, as I'm not trying to merge the entire dataframes, just one column from each. I appreciate that the answer is probably in there somewhere, but it's not specifically mentioned.

It should be noted that the title of this question does not match the example. All those dataframes have length==3. — IRTFM, Mar 27 '18 at 01:09
I intuitively thought that "length of a dataframe" == "nrow(dataframe)". I've edited title to "...varying numbers of rows". Thanks for the comment. — heds1, Mar 27 '18 at 01:12
Hah. I'm seeing this for a second time in the reopen queue b/c I was about to post a base-R solution and it got closed before I hit 'post': `Reduce( function(x,y) merge(x,y[-3],by="y1", all=TRUE), list(d1[-3],d2,d3) )` — IRTFM, Mar 27 '18 at 04:13

score 3 · Accepted Answer · answered Mar 27 '18 at 01:05

A solution using tidyverse. We can put all data frame in a list and then use functions from purrr to merge it. Notice that I use stringsAsFactors = FALSE when creating the example data frames to prevent factor columns.

library(tidyverse)

d_list <- list(d1, d2, d3)

d_final <- d_list %>%
  map(select, y1, y2) %>%
  reduce(full_join, by = "y1") %>%
  setNames(c("y1", paste0("y2.d", 1:3)))

d_final
#     y1 y2.d1 y2.d2 y2.d3
# 1 var1     3     2     3
# 2 var2     2    NA     2
# 3 var3     4     1    NA
# 4 var4    NA    NA     5

DATA

d1 <- data.frame(y1 = c("var1", "var2", "var3"),
                 y2 = c(3, 2, 4),
                 y3 = c("not_needed", "not_needed2", "not_needed3"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(y1 = c("var1", "var3"),
                 y2 = c(2, 1),
                 y3 = c("not_needed", "not_needed2"),
                 stringsAsFactors = FALSE)
d3 <- data.frame(y1 = c("var1", "var2", "var4"),
                 y2 = c(3, 2, 5),
                 y3 = c("not_needed", "not_needed2", "not_needed3"),
                 stringsAsFactors = FALSE)

cbind dataframes containing varying numbers of columns

1 Answers1