1

I am stuck at a question and would appreciate any ideas for the same

Consider I have a list of 3 data frames:

D1 <- data.frame(ID = sample(1:10), Y = sample(c("yes", "no"), 10, replace =TRUE))
D2 <- data.frame(ID = sample(5:14), Y = sample(c("yes", "no"), 10, replace = TRUE))
D3 <- data.frame(ID = sample(7:16), Y = sample(c("yes", "no"), 10, replace = TRUE))
L <- list(D1, D2, D3)
names <- c(D1, D2, D3)
names(L) <- names
DF<-as.list(names(L))
L <- mapply(cbind, L, "DF"=DF, SIMPLIFY=F)

I would like to alter the variable ID in such a way that if a particular ID, ID=16 for example, does not occur in D1 and D2 but only in D3, I want to append it as a value to the ID column in D1 and D2 and fill the corresponding Y column with 0.

All the three elements of the list should have the same ID columns eventually which is a union of ID values from the three data frames

I tried to do an rbind.fill or something similar but cannot figure it out and I am a little weak in handling lists in general. Any help will be much appreciated.

Thank you so much and hope you have a good weekend!

Manasi Shah
  • 437
  • 3
  • 18
  • Hmmm, are we talking [this](http://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list)? – David Arenburg May 01 '16 at 13:18

3 Answers3

1

You can create a single column dataframe IDs and then merge with each dataframe in the list with all = T which add all IDs to each dataframe.

library(dplyr)
IDs <- data.frame(ID = unique(c(D1$ID, D2$ID, D3$ID)))
L <- lapply(L, function(df) merge(df, IDs, by = "ID", all = T) %>% 
                            mutate(Y = ifelse(is.na(Y), 0, Y)))
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Thank you for your answer. This solution works well for me as it returns the original list of data frames. Any suggestions on making the NAs in variable Y as 0? I did to substitute NAs using a function like this : set_zero_as_value <- function(x, value=0){ x[x == "NA"] <- value return(x) } ... but without much success. Thanks again! – Manasi Shah May 01 '16 at 07:23
  • You can loop through the list and replace `NA` in the 'y' column with zero. `L <- lapply(L, function(df) df$Y[is.na(df$Y)] <- 0)`. – Psidom May 01 '16 at 13:06
0

Another option.

This will require some post-processing clean-up, but:

set.seed(42)
D1 <- data.frame(ID = sample(1:10), Y = sample(c("yes", "no"), 10, replace =TRUE))
D2 <- data.frame(ID = sample(5:14), Y = sample(c("yes", "no"), 10, replace = TRUE))
D3 <- data.frame(ID = sample(7:16), Y = sample(c("yes", "no"), 10, replace = TRUE))
L <- list(D1, D2, D3)
Reduce(function(a,b) merge(a, b, by="ID", all.x=TRUE, all.y=TRUE), L)
#    ID  Y.x  Y.y    Y
# 1   1  yes <NA> <NA>
# 2   2  yes <NA> <NA>
# 3   3   no <NA> <NA>
# 4   4  yes <NA> <NA>
# 5   5   no  yes <NA>
# 6   6  yes   no <NA>
# 7   7   no   no  yes
# 8   8   no   no  yes
# 9   9   no  yes  yes
# 10 10  yes   no  yes
# 11 11 <NA>   no   no
# 12 12 <NA>  yes  yes
# 13 13 <NA>  yes   no
# 14 14 <NA>   no   no
# 15 15 <NA> <NA>   no
# 16 16 <NA> <NA>  yes
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

Here is another option using full_join from dplyr

library(dplyr)
Reduce(function(...) full_join(..., by="ID"), L)
#   ID  Y.x  Y.y    Y
#1  10  yes   no  yes
#2   9   no  yes  yes
#3   3   no <NA> <NA>
#4   6  yes   no <NA>
#5   4  yes <NA> <NA>
#6   8   no   no  yes
#7   5   no  yes <NA>
#8   1  yes <NA> <NA>
#9   2  yes <NA> <NA>
#10  7   no   no  yes
#11 14 <NA>   no   no
#12 12 <NA>  yes  yes
#13 11 <NA>   no   no
#14 13 <NA>  yes   no
#15 16 <NA> <NA>  yes
#16 15 <NA> <NA>   no
akrun
  • 874,273
  • 37
  • 540
  • 662