0

I have several dataframes that contain largely the same variables, but some variables are missing from certain dataframes. I would like to cbind (one particular column of*) the dataframes while creating NA values in the missing fields. By way of example:

d1 <- data.frame(y1 = c("var1", "var2", "var3"),
                 y2 = c(3, 2, 4),
                 y3 = c("not_needed", "not_needed2", "not_needed3"))
d2 <- data.frame(y1 = c("var1", "var3"),
                 y2 = c(2, 1),
                 y3 = c("not_needed", "not_needed2"))
d3 <- data.frame(y1 = c("var1", "var2", "var4"),
                 y2 = c(3, 2, 5),
                 y3 = c("not_needed", "not_needed2", "not_needed3"))

expected_output <- data.frame(y1 = c("var1", "var2", "var3", "var4"),
                              y2.d1 = c(3, 2, 4, NA),
                              y2.d2 = c(2, NA, 1, NA),
                              y2.d3 = c(3, 2, NA, 5))

*The column y3 is not required in the output dataframe.

I've experimented with rbind.fill() from plyr and a few other ideas, but no success so far.

@joran I don't believe that this is a duplicate of the linked question, as I'm not trying to merge the entire dataframes, just one column from each. I appreciate that the answer is probably in there somewhere, but it's not specifically mentioned.

knb
  • 9,138
  • 4
  • 58
  • 85
heds1
  • 3,203
  • 2
  • 17
  • 32
  • 1
    It should be noted that the title of this question does not match the example. All those dataframes have length==3. – IRTFM Mar 27 '18 at 01:09
  • I intuitively thought that "length of a dataframe" == "nrow(dataframe)". I've edited title to "...varying numbers of rows". Thanks for the comment. – heds1 Mar 27 '18 at 01:12
  • That's a common source of erroneous R coding. – IRTFM Mar 27 '18 at 01:14
  • Hah. I'm seeing this for a second time in the reopen queue b/c I was about to post a base-R solution and it got closed before I hit 'post': `Reduce( function(x,y) merge(x,y[-3],by="y1", all=TRUE), list(d1[-3],d2,d3) )` – IRTFM Mar 27 '18 at 04:13

1 Answers1

3

A solution using . We can put all data frame in a list and then use functions from to merge it. Notice that I use stringsAsFactors = FALSE when creating the example data frames to prevent factor columns.

library(tidyverse)

d_list <- list(d1, d2, d3)

d_final <- d_list %>%
  map(select, y1, y2) %>%
  reduce(full_join, by = "y1") %>%
  setNames(c("y1", paste0("y2.d", 1:3)))

d_final
#     y1 y2.d1 y2.d2 y2.d3
# 1 var1     3     2     3
# 2 var2     2    NA     2
# 3 var3     4     1    NA
# 4 var4    NA    NA     5

DATA

d1 <- data.frame(y1 = c("var1", "var2", "var3"),
                 y2 = c(3, 2, 4),
                 y3 = c("not_needed", "not_needed2", "not_needed3"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(y1 = c("var1", "var3"),
                 y2 = c(2, 1),
                 y3 = c("not_needed", "not_needed2"),
                 stringsAsFactors = FALSE)
d3 <- data.frame(y1 = c("var1", "var2", "var4"),
                 y2 = c(3, 2, 5),
                 y3 = c("not_needed", "not_needed2", "not_needed3"),
                 stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84