3

Suppose I have two data frames (DF1 and DF2), and both contain (x, y) coordinates. I would like to extract the rows of DF1 whose (x, y) pair is in DF1 but not in DF2. Example:

DF1<-data.frame(x=1:3,y=4:6,t=10:12)
DF2<-data.frame(x=3:5,y=6:8,s=1:3)

I want to get

DF_new<-data.frame(x=1:2,y=4:5,t=10:11)

What should I do for much larger data sets? Thanks!

Cathy
  • To be clear, you want to find the (x, y) pair that is in BOTH dataframes, but have non-matching t values? – alexwhan Jun 05 '13 at 06:47
  • OH! I made mistakes about the expected output!! I'm really sorry about that. I have updated the question. I want to find the (x,y,t) in which (x,y) is in DF1 but NOT DF2 – Cathy Jun 06 '13 at 07:58
  • OH, I guess @agstudy answers my question. Thanks for your help!! – Cathy Jun 06 '13 at 08:03

3 Answers

4

Seems like using merge is a good candidate here:

merge(DF1,DF2)
  x y  t s
1 3 6 12 1
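
Note that merge(DF1, DF2) as written keeps the rows whose (x, y) pair appears in both data frames. If the goal is the rows of DF1 whose pair is absent from DF2, one possible base R sketch is a left merge against a marker column, keeping the rows the marker never reached. The names pairs2 and in_DF2 are made up for illustration, and merge sorts the result by the key columns, so row order may differ from DF1:

```r
DF1 <- data.frame(x = 1:3, y = 4:6, t = 10:12)
DF2 <- data.frame(x = 3:5, y = 6:8, s = 1:3)

# Distinct (x, y) pairs of DF2 plus a marker column (in_DF2 is a hypothetical name)
pairs2 <- unique(DF2[c("x", "y")])
pairs2$in_DF2 <- TRUE

# Left merge: keep every row of DF1; rows without a match get NA in the marker
m <- merge(DF1, pairs2, by = c("x", "y"), all.x = TRUE)

# Keep the unmatched rows and only DF1's original columns
DF_new <- m[is.na(m$in_DF2), names(DF1)]
DF_new
```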
agstudy
3

For very large data sets you may be interested in data.table:

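library(data.table)
DF1 <- data.frame(x = 1:3, y = 4:6, t = 10:12)
DF2 <- data.frame(x = 3:5, y = 6:8, s = 1:3)
# Key both tables on the (x, y) pair so that joins match on both columns
DF1 <- data.table(DF1, key = c("x", "y"))
DF2 <- data.table(DF2, key = c("x", "y"))
DF1[DF2, nomatch = 0L] # inner join: (x, y) present in both; maybe you want this?
DF2[DF1]               # every row of DF1, with s from DF2 (NA where there is no match)
DF1[!DF2]              # anti-join: rows of DF1 whose (x, y) is not in DF2; or maybe you want this?
DF2[!DF1]              # anti-join the other way: rows of DF2 whose (x, y) is not in DF1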
Jack Ryan
  • Thanks! it works! Does it mean that I have to set the "key" if I wanna compare more than one column value (coz %in% will work if I compare only a single variable instead of a pair of variables)? – Cathy Jun 06 '13 at 08:07
  • Short answer: yes. Long answer: 1) it's much faster (if time is a concern) 2) unless you can think of any more "elegant" way to perform this with %in% 3) this enables you to perform all kinds of wonderful operations on your data set as outlined in the [introduction to data.table file](http://datatable.r-forge.r-project.org/datatable-intro.pdf) and accompanying examples. – Jack Ryan Jun 06 '13 at 12:04
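
On the %in% point raised in the comments: %in% compares single vectors, so one workaround that needs no key is to collapse each (x, y) pair into one string first. This is only a rough sketch, assuming the coordinates are integers or otherwise print exactly (pasting floating-point values can be unreliable); key1 and key2 are names invented here:

```r
DF1 <- data.frame(x = 1:3, y = 4:6, t = 10:12)
DF2 <- data.frame(x = 3:5, y = 6:8, s = 1:3)

# One string per row, e.g. "1_4"; the separator keeps (1, 23) and (12, 3) distinct
key1 <- paste(DF1$x, DF1$y, sep = "_")
key2 <- paste(DF2$x, DF2$y, sep = "_")

# Rows of DF1 whose (x, y) pair does not occur in DF2
DF_new <- DF1[!(key1 %in% key2), ]
DF_new
```

For large tables the keyed data.table anti-join is likely to be faster, since this approach builds a character vector for every row.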
1
library(tidyverse)
DF1<-data.frame(x=1:3,y=4:6,t=10:12)
DF2<-data.frame(x=3:5,y=6:8,s=1:3)

anti_join(DF1, DF2)
#> Joining, by = c("x", "y")
#>   x y  t
#> 1 1 4 10
#> 2 2 5 11
Nettle