In a list of dataframes, add a variable value that occurs in a particular element of a list to that same variable in all elements of the list

Question

I am stuck at a question and would appreciate any ideas for the same

Consider I have a list of 3 data frames:

D1 <- data.frame(ID = sample(1:10), Y = sample(c("yes", "no"), 10, replace =TRUE))
D2 <- data.frame(ID = sample(5:14), Y = sample(c("yes", "no"), 10, replace = TRUE))
D3 <- data.frame(ID = sample(7:16), Y = sample(c("yes", "no"), 10, replace = TRUE))
L <- list(D1, D2, D3)
names <- c(D1, D2, D3)
names(L) <- names
DF<-as.list(names(L))
L <- mapply(cbind, L, "DF"=DF, SIMPLIFY=F)

I would like to alter the variable ID in such a way that if a particular ID, ID=16 for example, does not occur in D1 and D2 but only in D3, I want to append it as a value to the ID column in D1 and D2 and fill the corresponding Y column with 0.

All the three elements of the list should have the same ID columns eventually which is a union of ID values from the three data frames

I tried to do an rbind.fill or something similar but cannot figure it out and I am a little weak in handling lists in general. Any help will be much appreciated.

Thank you so much and hope you have a good weekend!

Hmmm, are we talking [this](http://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list)? — David Arenburg, May 01 '16 at 13:18

Psidom · Accepted Answer · 2016-05-01T13:12:44.587

1

You can create a single column dataframe IDs and then merge with each dataframe in the list with all = T which add all IDs to each dataframe.

library(dplyr)
IDs <- data.frame(ID = unique(c(D1$ID, D2$ID, D3$ID)))
L <- lapply(L, function(df) merge(df, IDs, by = "ID", all = T) %>% 
                            mutate(Y = ifelse(is.na(Y), 0, Y)))

edited May 01 '16 at 13:12

answered May 01 '16 at 02:51

Psidom

209,562
33
339
356

Thank you for your answer. This solution works well for me as it returns the original list of data frames. Any suggestions on making the NAs in variable Y as 0? I did to substitute NAs using a function like this : set_zero_as_value <- function(x, value=0){ x[x == "NA"] <- value return(x) } ... but without much success. Thanks again! – Manasi Shah May 01 '16 at 07:23
You can loop through the list and replace `NA` in the 'y' column with zero. `L <- lapply(L, function(df) df$Y[is.na(df$Y)] <- 0)`. – Psidom May 01 '16 at 13:06

score 0 · Answer 2 · answered May 01 '16 at 04:37

Another option.

This will require some post-processing clean-up, but:

set.seed(42)
D1 <- data.frame(ID = sample(1:10), Y = sample(c("yes", "no"), 10, replace =TRUE))
D2 <- data.frame(ID = sample(5:14), Y = sample(c("yes", "no"), 10, replace = TRUE))
D3 <- data.frame(ID = sample(7:16), Y = sample(c("yes", "no"), 10, replace = TRUE))
L <- list(D1, D2, D3)
Reduce(function(a,b) merge(a, b, by="ID", all.x=TRUE, all.y=TRUE), L)
#    ID  Y.x  Y.y    Y
# 1   1  yes <NA> <NA>
# 2   2  yes <NA> <NA>
# 3   3   no <NA> <NA>
# 4   4  yes <NA> <NA>
# 5   5   no  yes <NA>
# 6   6  yes   no <NA>
# 7   7   no   no  yes
# 8   8   no   no  yes
# 9   9   no  yes  yes
# 10 10  yes   no  yes
# 11 11 <NA>   no   no
# 12 12 <NA>  yes  yes
# 13 13 <NA>  yes   no
# 14 14 <NA>   no   no
# 15 15 <NA> <NA>   no
# 16 16 <NA> <NA>  yes

score 0 · Answer 3 · answered May 01 '16 at 04:41

Here is another option using full_join from dplyr

library(dplyr)
Reduce(function(...) full_join(..., by="ID"), L)
#   ID  Y.x  Y.y    Y
#1  10  yes   no  yes
#2   9   no  yes  yes
#3   3   no <NA> <NA>
#4   6  yes   no <NA>
#5   4  yes <NA> <NA>
#6   8   no   no  yes
#7   5   no  yes <NA>
#8   1  yes <NA> <NA>
#9   2  yes <NA> <NA>
#10  7   no   no  yes
#11 14 <NA>   no   no
#12 12 <NA>  yes  yes
#13 11 <NA>   no   no
#14 13 <NA>  yes   no
#15 16 <NA> <NA>  yes
#16 15 <NA> <NA>   no

In a list of dataframes, add a variable value that occurs in a particular element of a list to that same variable in all elements of the list

3 Answers3