Merge by column with multiple rows

Question

I have to separate data frame with a common column named col_id.

My problem is the simple merge is not the ideal for my case.

Here is an example structure of df1 col_id

     col_id   stock  ch2
1    id_100 stock 2  yes
2 id_100002 stock 2   no
3 id_100003 stock 2   no

And for the second df

     col_id   num cat1
1    id_100 num 2    0
2    id_100 num 2    1
3    id_100 num 2    0
4 id_100002 num 2    1
5 id_100002 num 2    1
6 id_100002 num 2    1
7 id_100003 num 2    1
8 id_100003 num 2    1

The output i would like to have is to fill all cells of the second df with the same value of df one. Example of the output

     col_id   num cat1   stock  ch2
1    id_100 num 2    0 stock 2  yes
2    id_100 num 2    1 stock 2  yes
3    id_100 num 2    0 stock 2  yes
4 id_100002 num 2    1 stock 2   no
5 id_100002 num 2    1 stock 2   no
6 id_100002 num 2    1 stock 2   no
7 id_100003 num 2    1 stock 2   no
8 id_100003 num 2    1 stock 2   no

Can you provide your table as dataframe, rather than pasting the table ? — user5249203, Aug 03 '16 at 14:12
@user5249203 yes of source but as I can understand you need the df structure to load it directly? — Jake, Aug 03 '16 at 14:20

score 1 · Answer 1 · answered Aug 03 '16 at 14:18

It seems like you want to use the all.x/all.y arguments of the merge function. E.g.,

df1 <- data.frame(
  col_id = c("id_100", "id_10002", "id_10003"),
  stock = c("stock 2"),
  ch2 = c("yes", "no", "no")
)

df2 <- data.frame(
  col_id = c(rep("id_100", 3),
             rep("id_10002", 3),
             rep("id_10003", 2)),
  num = c("num 2"),
  cat1 = c(0, 1, 0, 1, 1, 1, 1, 1)
)

mergedData <- merge(df1, df2, all.y = TRUE)

produces the desired output based on your pasted snippet. You can use any combination of all.(x|y) = (TRUE|FALSE) you'd like to achieve the appropriate join (inner, outer, left, right, whatever). W3 Schools has a good description of different types of joins (they're talking in the context of SQL, but R's merge function is analogous).

score 1 · Accepted Answer · answered Aug 03 '16 at 14:19

Try:

install.packages('dplyr')
library(dplyr)

mytext1 = "col_id,stock, ch2
id_100,stock 2, yes
id_100002,stock 2, no
id_100003,stock 2, no"
mydf1 <- read.table(text=mytext1, header=T, sep=",")

mytext2 = "col_id,num, cat1
id_100,num 2, 0
id_100,num 2, 1
id_100,num 2, 0
id_100002,num 2, 1
id_100002,num 2, 1
id_100002,num 2, 1
id_100003,num 2, 1
id_100003,num 2, 1"

mydf2 <- read.table(text=mytext2, header=T, sep=",")
output_df <- left_join(mydf2,mydf1, by="col_id")

  col_id    num    cat1  stock   ch2
 id_100    num 2    0   stock 2  yes
 id_100    num 2    1   stock 2  yes
 id_100    num 2    0   stock 2  yes
 id_100002 num 2    1   stock 2   no
 id_100002 num 2    1   stock 2   no
 id_100002 num 2    1   stock 2   no
 id_100003 num 2    1   stock 2   no
 id_100003 num 2    1   stock 2   no

score 1 · Answer 3 · answered Aug 03 '16 at 14:37

1

You just need to add two lines of codes as below

    df$stock=rep('stock2',8)
    df$ch2[df$col_id %in% c('id_100,num','id_100002','id_100003']=c('yes','no','no')

This can address your concern.

answered Aug 03 '16 at 14:37

Xiaoguang Ouyang

23
2

Merge by column with multiple rows

3 Answers3