Merge files by matching pattern

Question

Suppose I have a full pattern file such like:

pattern<-data.frame(x1=c(0,0,0,1,0,1,1,1),
                 x2=c(0,0,1,0,1,1,0,1),
                 x3=c(0,1,0,0,1,0,1,1),
                 y=c(11,14, 12, 14, 16, 18, 19, 20))
pattern
  x1 x2 x3  y
1  0  0  0 11
2  0  0  1 14
3  0  1  0 12
4  1  0  0 14
5  0  1  1 16
6  1  1  0 18
7  1  0  1 19
8  1  1  1 20

And a data file:

set.seed(123)
df<-data.frame(a=rbinom(100, 1, 0.5), 
               b=rbinom(100, 1, 0.2), 
               c=rbinom(100, 1, 0.6))
head(df)
  a b c
1 0 0 1
2 1 0 0
3 0 0 0
4 1 1 1
5 1 0 1
6 0 1 0

What I want is to search each row of df from the pattern and fill in the y's value such like:

I'm wondering Whether there's a easy want to do that in R.

Your title says "Merge". Did you try the `merge` function? Untested, but something like: `merge(df, pattern, by.x = names(df)[1:3], by.y = names(pattern)[1:3])` — A5C1D2H2I1M1N2O1R2T1, Nov 06 '14 at 15:31
possible duplicate of [How to join data frames in R (inner, outer, left, right)?](http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right) — Henrik, Nov 06 '14 at 15:58

akrun · Answer 1 · 2014-11-07T04:33:59.697

2

One way is to use match after pasting together the columns to match with interaction and then get the y values based on the numeric index from match.

 indx1 <- as.character(interaction(df, sep=''))
 indx2 <- as.character(interaction(pattern[,-4], sep=''))

 df$y <- pattern$y[match(indx1, indx2)]


  head(df)
  #  a b c  y
  #1 0 0 1 14
  #2 1 0 0 14
  #3 0 0 0 11
  #4 1 1 1 20
  #5 1 0 1 19
  #6 0 1 0 12

Or you could use left_join from dplyr

 library(dplyr)
 res <- left_join(df, pattern, by=c('a'='x1', 'b'='x2', 'c'='x3'))
 #the `by` part is contributed by @Henrik

 head(res)
 #  a b c  y
 #1 0 0 1 14
 #2 1 0 0 14
 #3 0 0 0 11
 #4 1 1 1 20
 #5 1 0 1 19
 #6 0 1 0 12

or for a faster way, use data.table

 library(data.table)
 res1 <- setkey(setDT(pattern))[df] #suggested by @Arun
 head(res1)
 #   x1 x2 x3  y
 #1:  0  0  1 14
 #2:  1  0  0 14
 #3:  0  0  0 11
 #4:  1  1  1 20
 #5:  1  0  1 19
 #6:  0  1  0 12

edited Nov 07 '14 at 04:33

answered Nov 06 '14 at 15:29

akrun

874,273
37
540
662

1

No need to rename columns. Please see `?left_join`: "To join by different variables on x and y use a named vector. For example, `by = c("a" = "b")` will match x.a to y.b. – Henrik Nov 06 '14 at 16:00
@Henrik Thanks for the tip. Never used it that way. – akrun Nov 06 '14 at 16:01
1

Alternatively: `setkey(setDT(pattern))[df]` – Arun Nov 07 '14 at 03:17
@Arun Thanks, very useful. I was thinking `df` had to be converted to `data.table`. – akrun Nov 07 '14 at 04:36
1

No worries. `debugonce()` is a very powerful function to learn the source code without making assumptions. – Arun Nov 07 '14 at 09:23

G. Grothendieck · Accepted Answer · 2014-11-06T16:26:25.143

1) merge Use merge in base R:

merge(df, pattern, by = 1:3, all.x = TRUE, all.y = FALSE)

all.x=TRUE says to keep all of the rows of df even if they have no match and all.y=FALSE says to not keep rows of pattern which match no row of df. This combination of all.x and all.y is a left join.

2) sqldf Using SQL:

library(sqldf)

sqldf("select df.*, pattern.y from df 
  left join pattern on df.a = pattern.x1 and df.b = pattern.x2 and df.c = pattern.x3")

Left join keeps all rows of the left hand data frame (df) but not of the right hand (pattern).

Merge files by matching pattern

2 Answers2