How to retrieve multiple matching elements?

Question

I have to two data frames df_a, df_b.

set.seed(143)
df_a <- data.frame(colA = sample(1:10, 10, replace=T), colB = sample(LETTERS[1:20],10))
df_a
#   colA colB
#1    10    I
#2     1    D
#3     8    R
#4     5    F
#5     1    N
#6    10    P
#7     7    E
#8     6    S
#9     6    T
#10    4    C

df_b <- data.frame(colA = sample(1:10, 10, replace=T))
df_b
#   colA
#1     9
#2     3
#3     9
#4     9
#5     3
#6    10
#7    10
#8     7
#9     4
#10    7

I have to update the colB with values from colB in data frame df_a based on the matching of the colA in both data frames.

df_a[match(df_b$colA, df_a$colA),'colB']
# [1] <NA> <NA> <NA> <NA> <NA> I    I    E    C    E   
#Levels: C D E F I N P R S T

The output doesn't give more than one matched element(not a surprise!). For eg., 10 has two values I,P but only gives I.

Expected Output(something like this, maybe?):

df_a[match(df_b$colA, df_a$colA),'colB']
# [1] <NA> <NA> <NA> <NA> <NA> I,P    I,P   E    C    E   
#Levels: C D E F I N P R S T

As the match() function returns only the first matched value, is there any other alternative like which() or %in% for achieving the task ?

I would use `merge(df_b, df_a, by = "colA", all.x = TRUE)`, but what to do with `colA == 10`? — m-dz, Sep 15 '16 at 12:24
@m-dz yes and no. I was looking for solution which can be embedded as a single value with a delimiter. — Prradep, Sep 15 '16 at 12:34

score 2 · Accepted Answer · answered Sep 15 '16 at 12:34

You need to aggregate to paste same matches together, and then merge, i.e.

merge(df_b, aggregate(colB ~ colA, df_a, paste, collapse = ','), by = 'colA', all.x = TRUE)
#   colA colB
#1     3 <NA>
#2     3 <NA>
#3     4    C
#4     7    E
#5     7    E
#6     9 <NA>
#7     9 <NA>
#8     9 <NA>
#9    10  I,P
#10   10  I,P

m-dz · Answer 2 · 2016-09-15T12:51:12.407

If you want to have df_b in the original order I would use data.table as follows:

library(data.table)
setDT(df_b)
setDT(df_a)

df_a[, sapply(.SD, paste, collapse = ","), by = "colA"][df_b, , on = "colA"]

First, df_a[, sapply(.SD, paste, collapse = ",") , by = "colA"] is doing exactly same aggregation on df_a as @Sotos proposed, and then [df_b, , on = "colA"] is merging the result of it with df_b.

The result is:

    colA  V1
 1:    9  NA
 2:    3  NA
 3:    9  NA
 4:    9  NA
 5:    3  NA
 6:   10 I,P
 7:   10 I,P
 8:    7   E
 9:    4   C
10:    7   E

How to retrieve multiple matching elements?

2 Answers2