Suppose I have two data.frames:

A<-data.frame(a=c("b","a", "a", "e", "e","a"),Za=c(11,22,33,44,55,66))
B<-data.frame(b=c("a","a", "b", "e", "f","f"),Zb=c(11,22,33,44,55,66))

Now I want to match them on the columns a and b, but keep every possible combination of matching rows. In the end I want to have:

Anew<-data.frame(a=c("a","a","a","a","a","a","b","e","e","f","f"),Za=c(11,11,11,22,22,22,33,44,44,55,66))

Bnew<-data.frame(b=c("a","a","a","a","a","a","b","e","e",NA,NA),Zb=c(22,33,66,22,33,66,11,44,55,NA,NA))


Anew
   a Za
1  a 11
2  a 11
3  a 11
4  a 22
5  a 22
6  a 22
7  b 33
8  e 44
9  e 44
10 f 55
11 f 66

Bnew
      b Zb
1     a 22
2     a 33
3     a 66
4     a 22
5     a 33
6     a 66
7     b 11
8     e 44
9     e 55
10 <NA> NA
11 <NA> NA

I don't want to use ncomb if possible, as my vectors are really huge and this would exhaust my memory. A fast solution would be perfect!

Many thanks for any help!

JmO
  • I'm not sure that it's an exact duplicate because the OP also wants to add information about a and b. See the comments of the answer. – Orhan Yazar Mar 16 '18 at 15:06

1 Answer

If you are working with a large dataset, use data.table instead of data.frame. Here is a solution:

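library(data.table)

A <- data.table(a = c("b", "a", "a", "e", "e", "a"), Za = c(11, 22, 33, 44, 55, 66))
B <- data.table(b = c("a", "a", "b", "e", "f", "f"), Zb = c(11, 22, 33, 44, 55, 66))

# Full outer join on A$a == B$b; repeated keys yield every combination of rows
df <- merge(A, B, by.x = "a", by.y = "b", all = TRUE)

# Match is 1 when the key exists in both tables, 0 otherwise
df[, Match := as.integer(!is.na(Za) & !is.na(Zb))]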

    a Za Zb Match
 1: a 22 11     1
 2: a 22 22     1
 3: a 33 11     1
 4: a 33 22     1
 5: a 66 11     1
 6: a 66 22     1
 7: b 11 33     1
 8: e 44 44     1
 9: e 55 44     1
10: f NA 55     0
11: f NA 66     0
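
One caveat for much larger inputs: when a key is repeated many times on both sides, the result grows multiplicatively, and data.table's merge may stop with an error saying the join produces more rows than the two inputs combined. A minimal sketch, assuming the many-to-many expansion really is what is wanted; the tables X and Y and their columns are made up for illustration:

```r
library(data.table)

# Hypothetical tables: the key "a" appears three times on each side,
# so the merge has to produce 3 * 3 = 9 rows.
X <- data.table(k = c("a", "a", "a"), Zx = 1:3)
Y <- data.table(k = c("a", "a", "a"), Zy = 4:6)

# allow.cartesian = TRUE tells data.table that the row expansion is intended.
res <- merge(X, Y, by = "k", all = TRUE, allow.cartesian = TRUE)
nrow(res)
```

Because the result can be far larger than either input, it is worth estimating its size before running this on the full data.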
Orhan Yazar
  • Thanks for the fast answer. Is it possible to have two separate data.tables (data.frames) as the result, like Anew and Bnew in my question, or even better, to add a column of indices giving the position at which Anew has its match in Bnew? – JmO Mar 16 '18 at 14:52
  • I added a Match column. When it's 1, there is a match; when it's 0, there is no match. – Orhan Yazar Mar 16 '18 at 16:03
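
To get closer to the Anew/Bnew layout and the index column asked about in the comments, one option is to record each row's original position before merging and split the merged table afterwards. This is only a sketch under that assumption; the names idxA, idxB and res are introduced here and are not part of the original answer:

```r
library(data.table)

A <- data.table(a = c("b", "a", "a", "e", "e", "a"), Za = c(11, 22, 33, 44, 55, 66))
B <- data.table(b = c("a", "a", "b", "e", "f", "f"), Zb = c(11, 22, 33, 44, 55, 66))

# Store each row's original position (idxA and idxB are hypothetical names).
A[, idxA := .I]
B[, idxB := .I]

# Full outer join; the key column of the result is named "a".
res <- merge(A, B, by.x = "a", by.y = "b", all = TRUE)

# Split into two row-aligned tables. A side with no partner gets NA in
# its key, value and index columns.
Anew <- res[, .(a = ifelse(is.na(idxA), NA_character_, a), Za, idxA)]
Bnew <- res[, .(b = ifelse(is.na(idxB), NA_character_, a), Zb, idxB)]
```

Row i of Anew and row i of Bnew then describe the same pairing, and idxA and idxB point back to the rows of the original A and B.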