0

I found many similar questions but noone of them fits to my problem. I have two large dataframes with few common columns. I'm trying to assign by reference the values from the first df to the second.

I tried more combinations but noone works coorrectly, for example:

library(data.table)

#create dfs
set.seed(32)
DB <- data.frame(A=sample(c("A","B","C","D","E"),30,replace = T))
DB2 <- data.frame(A=sample(c("A","B","C","D","E","F","G"),60,replace = T),
                              B=rep(rnorm(60,mean=5)),
                              C=rep(rnorm(60,mean=10)))
#loop 
for (i in c("B","C")){
setDT(DB)[DB2,  i := i, on = .(A == A)]}

Thus, I'd like to loop the following code:

setDT(DB)[DB2, B := B, on = .(A == A)]
setDT(DB)[DB2, C := C, on = .(A == A)]
> DB
    A        B         C
 1: C 5.593191 10.697466
 2: C 5.593191 10.697466
 3: E 4.482933  8.726371
 4: D 5.454512 11.054162
 5: A 4.306571 11.427917
 6: E 4.482933  8.726371
 7: D 5.454512 11.054162
 8: E 4.482933  8.726371
 9: D 5.454512 11.054162
10: B 4.741633 10.846106
11: D 5.454512 11.054162
12: B 4.741633 10.846106
13: D 5.454512 11.054162
14: D 5.454512 11.054162
15: B 4.741633 10.846106
16: D 5.454512 11.054162
17: D 5.454512 11.054162
18: C 5.593191 10.697466
19: D 5.454512 11.054162
20: E 4.482933  8.726371
21: D 5.454512 11.054162
22: E 4.482933  8.726371
23: C 5.593191 10.697466
24: A 4.306571 11.427917
25: C 5.593191 10.697466
26: E 4.482933  8.726371
27: C 5.593191 10.697466
28: C 5.593191 10.697466
29: C 5.593191 10.697466
30: D 5.454512 11.054162
    A        B         C

any help would be very grateful

Matt_4
  • 147
  • 1
  • 12

1 Answers1

2

Try out:

library(data.table)
#create dfs
set.seed(32)
DB <- data.frame(A=sample(c("A","B","C","D","E"),30,replace = T))
DB2 <- data.frame(A=sample(c("A","B","C","D","E","F","G"),60,replace = T),
                  B=rep(rnorm(60,mean=5)),
                  C=rep(rnorm(60,mean=10)))
#try
setDT(DB)[DB2, c("B", "C") := list(B, C), on = .(A == A)]
DB #output
    A        B         C
 1: C 5.593191 10.697466
 2: C 5.593191 10.697466
 3: E 4.482933  8.726371
 4: D 5.454512 11.054162
 5: A 4.306571 11.427917
 6: E 4.482933  8.726371
 7: D 5.454512 11.054162
 8: E 4.482933  8.726371
 9: D 5.454512 11.054162
10: B 4.741633 10.846106
11: D 5.454512 11.054162
12: B 4.741633 10.846106
13: D 5.454512 11.054162
14: D 5.454512 11.054162
15: B 4.741633 10.846106
16: D 5.454512 11.054162
17: D 5.454512 11.054162
18: C 5.593191 10.697466
19: D 5.454512 11.054162
20: E 4.482933  8.726371
21: D 5.454512 11.054162
22: E 4.482933  8.726371
23: C 5.593191 10.697466
24: A 4.306571 11.427917
25: C 5.593191 10.697466
26: E 4.482933  8.726371
27: C 5.593191 10.697466
28: C 5.593191 10.697466
29: C 5.593191 10.697466
30: D 5.454512 11.054162
    A        B         C

Update

Franck's suggestion should also work fine and is more efficient for a large number or columns (note that mget returns a named list)

cols <- colnames(DB2)[!(colnames(DB2) %in% colnames(DB))]
setDT(DB)[DB2, (cols) := mget(paste0("i.", cols)), on = .(A = A)]
nghauran
  • 6,648
  • 2
  • 20
  • 29
  • 1
    thank you, however using it I should write 160 column names inside the list. Is there a more practical way to do it? – Matt_4 Oct 16 '18 at 16:11
  • 1
    In that case Franck's suggestion would be more relevant. I edited my answer – nghauran Oct 16 '18 at 16:43