I have two data sets, and both contain an "ID" variable. How can I keep, in each data set, only the observations whose ID exists in both data sets? I am using R.

For example:

df1 <- structure(list(CustomerId = c(1, 2, 3, 4, 5, 8, 9), Product = structure(c(4L, 
4L, 4L, 3L, 3L, 1L, 2L), .Label = c("abc", "def", "Radio", "Toaster"
), class = "factor")), .Names = c("CustomerId", "Product"), row.names = c(NA, 
-7L), class = "data.frame")
df2 <-
structure(list(CustomerId = c(2, 4, 6, 7), State = structure(c(2L, 
2L, 3L, 1L), .Label = c("aaa", "Alabama", "Ohio"), class = "factor")), .Names = c("CustomerId", 
"State"), row.names = c(NA, -4L), class = "data.frame")

In each data set, I want to keep only the observations whose ID appears in both. (Those would be IDs 2 and 4 in both data sets.)

Brian Tompsett - 汤莱恩
Boram Lim
  • What's the output you desire? Two new data.frames with the rows for which the ID exists in the other? It would help to have a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output. – MrFlick Apr 19 '15 at 02:02
  • I edited the question. Does that make it clearer? – Boram Lim Apr 19 '15 at 02:09

2 Answers

You can use basic subsetting:

subset(df1, CustomerId %in% df2$CustomerId)
subset(df2, CustomerId %in% df1$CustomerId)

Or, if you use dplyr, this is called a semi_join:

library(dplyr)
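# Name the key explicitly; without `by`, dplyr joins on all shared columns and prints a message
semi_join(df1, df2, by = "CustomerId")
semi_join(df2, df1, by = "CustomerId")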
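
For readers without dplyr, the same filtering can be sketched in base R by computing the shared IDs once with intersect() and reusing them for both data frames. This is only an illustration: the data frames below are simplified copies of the question's data (plain character columns instead of factors), and the names common_ids, df1_common, and df2_common are made up for the example.

```r
# Simplified copies of the question's data (character columns instead of factors)
df1 <- data.frame(CustomerId = c(1, 2, 3, 4, 5, 8, 9),
                  Product = c("Toaster", "Toaster", "Toaster",
                              "Radio", "Radio", "abc", "def"))
df2 <- data.frame(CustomerId = c(2, 4, 6, 7),
                  State = c("Alabama", "Alabama", "Ohio", "aaa"))

# IDs present in both data frames (illustrative variable name)
common_ids <- intersect(df1$CustomerId, df2$CustomerId)

# Keep only the rows whose ID is shared; the columns are left untouched
df1_common <- df1[df1$CustomerId %in% common_ids, ]
df2_common <- df2[df2$CustomerId %in% common_ids, ]

print(df1_common)
print(df2_common)
```

Both results keep only customers 2 and 4, and each data frame retains its own columns.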
MrFlick
merge() is one of the easiest solutions. By default it performs an inner join; see the all, all.x, and all.y arguments if you need an outer join.

merge(df1, df2, by="CustomerId")[,1:2]
  CustomerId Product
1          2 Toaster
2          4   Radio

merge(df2, df1, by="CustomerId")[,1:2]
  CustomerId   State
1          2 Alabama
2          4 Alabama
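
One caveat when using merge() as a filter: it returns one row per matching pair, so if an ID is repeated in the other data frame, rows get duplicated, whereas %in% keeps each original row once. A minimal sketch of the difference, where df2_dup is a hypothetical variant of the question's df2 in which customer 2 appears twice:

```r
df1 <- data.frame(CustomerId = c(1, 2, 4),
                  Product = c("Toaster", "Toaster", "Radio"))

# Hypothetical data: customer 2 is listed twice
df2_dup <- data.frame(CustomerId = c(2, 2, 4),
                      State = c("Alabama", "Ohio", "Alabama"))

# Inner join, then keep only df1's columns: customer 2 appears twice
merged <- merge(df1, df2_dup, by = "CustomerId")[, 1:2]

# Membership test: each df1 row is kept at most once
filtered <- df1[df1$CustomerId %in% df2_dup$CustomerId, ]

nrow(merged)    # 3
nrow(filtered)  # 2
```

If the IDs are unique in both data frames, as in the question's data, the two approaches return the same rows.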
Jaehyeon Kim