
df1 and df2 both have columns a and b. I want to subset df1 so that only the rows whose (a, b) pair also appears as an (a, b) pair in df2 are kept.

df1
a   b  c
1   m  df1
2   f  df1
3   f  df1
4   m  df1
5   f  df1
6   m  df1

df2
a   b  c
1   m  df2
3   f  df2
4   f  df2
5   m  df2
6   f  df2
7   m  df2

desired output

df
a   b  c
1   m  df1
3   f  df1

I am using:

df <- subset(df1, (df1$a %in% df2$a & df1$b %in% df2$b))

but this gives results similar to

df <- subset(df1, df1$a %in% df2$a)
vk087
  • Probably `df1[(!df1$a %in% df2$a) & (!df1$b %in% df2$b), ]` – David Arenburg Feb 05 '15 at 13:04
  • I have changed the question. Please read it again, and this method is also giving the same result as one condition. – vk087 Feb 05 '15 at 13:10
  • So maybe `df1[(df1$a %in% df2$a) & (df1$b %in% df2$b), ]` then? – David Arenburg Feb 05 '15 at 13:13
  • Please add a reproducible example containing the output you get and the output you expect. Please see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for how to make a good reproducible example. – Rainer Feb 05 '15 at 13:13
  • No David, it yields a result similar to `df <- subset(df1, df1$a %in% df2$a)`. However, I have changed my question once again, as I was also confused about it. It now gives a clearer picture of the question. – vk087 Feb 05 '15 at 13:19
  • You can't just edit the question each time you are getting a working solution. – David Arenburg Feb 05 '15 at 13:50
  • @DavidArenburg I am sorry, I am new here. I am still learning how to phrase a question. Anyway, lesson learnt. I will try to avoid these silly mistakes. – vk087 Feb 05 '15 at 13:51

2 Answers

4

You can use package dplyr:

library(dplyr)
intersect(df1,df2)
#  a b
#1 1 m
#2 3 f

Edit for the new data.frames with c column: you can use function semi_join (also from dplyr):

semi_join(df1,df2,by=c("a","b"))
#  a b   c
#1 1 m df1
#2 3 f df1

Other option, in base R:
you can paste your a and b variables to subset your data.frame:

df1[paste(df1$a,df1$b) %in% paste(df2$a,df2$b), ]
#  a b
#1 1 m
#3 3 f

and with the new data.frames, the same expression returns:

#   a b   c
# 1 1 m df1
# 3 3 f df1
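
For readers who want to avoid `paste`, here is a minimal base R sketch that is not part of the original answer. It first shows why testing each column separately with `%in%` keeps too many rows: every value of `df1$b` occurs somewhere in `df2$b`, so only the test on `a` filters anything. It then matches the (a, b) pairs with `merge`. The data frames are rebuilt from the question, the names `naive` and `res_merge` are made up for illustration, and no claim is made here about how its speed compares with `paste`.

```r
# Data frames rebuilt from the question
# (stringsAsFactors = FALSE keeps b and c as character on older R versions)
df1 <- data.frame(a = 1:6,
                  b = c("m", "f", "f", "m", "f", "m"),
                  c = "df1", stringsAsFactors = FALSE)
df2 <- data.frame(a = c(1L, 3:7),
                  b = c("m", "f", "f", "m", "f", "m"),
                  c = "df2", stringsAsFactors = FALSE)

# Column-wise test: a and b are checked independently, so a row passes
# as soon as its a occurs anywhere in df2$a and its b anywhere in df2$b
naive <- df1[df1$a %in% df2$a & df1$b %in% df2$b, ]
naive$a
# 1 3 4 5 6

# Pair-wise test: merge on both key columns, using only the distinct
# (a, b) pairs of df2 so that df2's own c column is not carried over
res_merge <- merge(df1, unique(df2[c("a", "b")]), by = c("a", "b"))
res_merge
#   a b   c
# 1 1 m df1
# 2 3 f df1
```

Note that `merge` sorts the result by the key columns and resets the row names, so the original row names of df1 (1 and 3) are not preserved.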
Cath
  • I would rather not use paste, as it increases the run time. Any other method? – vk087 Feb 05 '15 at 13:38
  • @VaibhavKaushal yes, David's one ;-) or with package `dplyr`, see my edit – Cath Feb 05 '15 at 13:39
  • I was circling around base R's intersect, but the dplyr overload is nice :) – Colonel Beauvel Feb 05 '15 at 13:44
  • @ColonelBeauvel, yes I find `dplyr` `setdiff` and `intersect` functions much better (intuitive...) than `base` R ones for data.frames – Cath Feb 05 '15 at 13:45
  • Oops, looks like I am not good at framing a question. One final edit to the question, @CathG can you help? – vk087 Feb 05 '15 at 13:47
  • @VaibhavKaushal, that's why it is better to post a reproducible example from the start!... so my `base` R sol is back in the game ;-) – Cath Feb 05 '15 at 13:48
  • @CathG Yep, I can use it. Thanks for the input, but I am looking for anything other than paste. It noticeably increases the run time on my original data :( – vk087 Feb 05 '15 at 13:50
  • @DavidArenburg, thanks, it is indeed not that appropriate ;-) – Cath Feb 05 '15 at 14:01
  • @VaibhavKaushal, see my edit, you can go with `semi_join` function from `dplyr` – Cath Feb 05 '15 at 14:09
3

Or you could do

Res <- rbind(df1, df2) 
Res[duplicated(Res), ]
#   a b
# 7 1 m
# 8 3 f

Edit1: Per the edit, here's a similar data.table solution

library(data.table)
Res <- rbind(df1, df2)
setDT(Res)[duplicated(Res, by = c("a", "b"), fromLast = TRUE)]
#    a b   c
# 1: 1 m df1
# 2: 3 f df1

Edit2: I see that @CathG opened a join battlefront, so here's how we do it with data.table

setkey(setDT(df1), a, b) ; setkey(setDT(df2), a, b)
df1[df2, nomatch = 0]
#    a b   c i.c
# 1: 1 m df1 df2
# 2: 3 f df1 df2
David Arenburg