
I would like to concatenate multiple vectors into a data frame, using the names of each vector to guide the concatenation.

For instance, if I have vectors x1, x2, and x3:

x1 <- sample(1:50, 20)
x2 <- sample(1:50, 20)
x3 <- sample(1:50, 20)

and each vector has names such as:

nam <- paste("A", 1:50, sep="")
names(x1) <- sample(nam, 20)
names(x2) <- sample(nam, 20)
names(x3) <- sample(nam, 20)

I would like to generate a data frame in which the first column contains every name used in at least one vector, and the remaining columns contain the values from each vector, with NA where a vector has no value for a particular name. Something like this:

A1 3 NA NA
A2 NA 4 5
A3 NA 3 NA
A4 NA 22 NA
....

That would mean that the name A1 is associated with a value (3) only in x1, not in x2 or x3. A2 is associated with a value in x2 and x3 but not in x1, and so on.

Any idea of how to do this?

Thank you very much,

2 Answers


I came up with something like this:

sort(unique(names(c(x1,x2,x3))))->nam2

cbind(nam2,x1[match(nam2,names(x1))],x2[match(nam2,names(x2))],x3[match(nam2,names(x3))])

I would like to do this for more than 500 vectors stored in a list. Any idea how to put this into an lapply or something similar?

Thanks again
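
For vectors held in a list, the same match() idea can be wrapped in sapply. This is a minimal sketch, assuming the vectors sit in a named list (vec_list and all_names are hypothetical names, and the sample data is made up for illustration) and that names are unique within each vector:

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
nam <- paste0("A", 1:50)

# Hypothetical list of named vectors standing in for the 500+ real ones
vec_list <- lapply(1:3, function(i) setNames(sample(1:50, 20), sample(nam, 20)))
names(vec_list) <- paste0("x", seq_along(vec_list))

# Every name that occurs in at least one vector
all_names <- sort(unique(unlist(lapply(vec_list, names))))

# One column per vector: match() gives NA for absent names,
# and indexing a vector with NA yields NA
values <- sapply(vec_list, function(v) unname(v[match(all_names, names(v))]))

result <- data.frame(name = all_names, values)
head(result)
```

Because sapply returns a numeric matrix here, the values stay numeric, whereas cbind-ing them with the character vector of names would coerce everything to character.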


Consider a chain merge after creating a list of data frames:

set.seed(61718)  # PLACED AT VERY TOP FOR REPRODUCIBILITY
...

# USES OBJECTS NAMED "x" FOLLOWED BY DIGITS (HERE BEING c("x1", "x2", "x3"))
df_list <- lapply(ls(pattern="^x[0-9]+$"), function(d)
  # CONVERTS VECTOR INTO DATAFRAME AND RENAMES COLUMNS
  setNames(transform(data.frame(get(d)), letter=names(get(d))), c(d, "letter"))
)

# CHAIN MERGE
master_df <- Reduce(function(x,y) merge(x, y, by="letter", all=TRUE), df_list)

head(master_df, 10)    
#    letter x1 x2 x3
# 1     A11 50 12  5
# 2     A12 34  8  1
# 3     A13  3 31 NA
# 4     A14 42  7 NA
# 5     A17 27 44 41
# 6      A2 14 NA 46
# 7     A24  2 NA NA
# 8     A26 29  1 34
# 9     A30 23  4 38
# 10    A31  1 25 12
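
If the vectors already live in a list rather than as separate objects, the ls()/get() step is not needed. A sketch of the same chain merge under that assumption (vec_list is a hypothetical named list, built here from made-up sample data):

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
nam <- paste0("A", 1:50)

# Hypothetical named list of named vectors
vec_list <- setNames(
  lapply(1:3, function(i) setNames(sample(1:50, 20), sample(nam, 20))),
  c("x1", "x2", "x3")
)

# Each vector becomes a two-column data frame: the names in "letter",
# the values in a column named after the list element
df_list <- Map(
  function(v, nm) setNames(data.frame(names(v), unname(v)), c("letter", nm)),
  vec_list, names(vec_list)
)

# Full outer join across the whole list, one merge at a time
master_df <- Reduce(function(x, y) merge(x, y, by = "letter", all = TRUE), df_list)
head(master_df)
```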

Alternatively, if Reduce (being iterative) runs too slowly, consider building the same list of data frames but merging each one with an all_name_df, then cbind all results together:

all_name_df <- data.frame(letter=nam)

df_list <- lapply(c("x1", "x2", "x3"), function(d) {
  df <- setNames(transform(data.frame(get(d)), letter=names(get(d))), c(d, "letter"))
  # merge() SORTS ROWS BY letter (A1, A10, A11, ...), SO KEEP letter TO LABEL ROWS CORRECTLY
  merge(all_name_df, df, all.x=TRUE)
})

# ALL ELEMENTS SHARE THE SAME ROW ORDER: TAKE letter FROM THE FIRST, VALUE COLUMN FROM EACH
master_df <- cbind(df_list[[1]]["letter"], do.call(cbind, lapply(df_list, `[`, 2)))

head(master_df, 10)
#    letter x1 x2 x3
# 1      A1 NA NA NA
# 2     A10 NA 32 19
# 3     A11 50 12  5
# 4     A12 34  8  1
# 5     A13  3 31 NA
# 6     A14 42  7 NA
# 7     A15 NA NA NA
# 8     A16 NA 40 NA
# 9     A17 27 44 41
# 10    A18 NA NA NA
Parfait