I have two data frames, d1 and d2. Both of them have a column called value.

I want to split d2 into two data frames, d3 and d4. d3 should contain the rows of d2 whose value does not appear in d1, and d4 should contain the rows of d2 whose value does appear in d1.

I tried:

d3 = d2[!is.element(d2$value, d1$value),]

But the result does not seem correct: d3 does not contain the rows I expected.

Thanks in advance for your help.

Update:

It seems that anti_join is not the solution.

The sample data:

d1 = as.data.frame(c(1,2,3,4,5,6,7,8))
colnames(d1) = "value"

d2 = as.data.frame(c(7,8,9,10,11,12))
colnames(d2) = "value"

So d3 should contain 9, 10, 11, 12 (because they do not appear in d1), and d4 should contain 7, 8 (because they appear in d1).

Jaap
mamatv

2 Answers

We can use anti_join to get the rows of one dataset that have no match in the other:

library(dplyr)
anti_join(d2, d1, by = 'value') %>%
  arrange(value)
#    value
#1     9
#2    10
#3    11
#4    12

To get the common elements, either merge or inner_join from dplyr can be used:

inner_join(d1, d2, by = 'value')
#   value
#1      7
#2      8

Another approach is to use setdiff and intersect from dplyr:

setdiff(d2, d1)
intersect(d1, d2)

NOTE: This assumes that there is only a single column.
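
If d2 had more than one column, a filtering join might be a better fit, since anti_join and semi_join compare only the key named in by and return only the columns of the first data frame. The sketch below assumes a made-up extra column, label, which is not part of the original data:

```r
library(dplyr)

d1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 7, 8))

# 'label' is a hypothetical extra column, added only for illustration
d2 <- data.frame(value = c(7, 8, 9, 10, 11, 12),
                 label = c("a", "b", "c", "d", "e", "f"))

# Rows of d2 whose value has no match in d1
d3 <- anti_join(d2, d1, by = "value")

# Rows of d2 whose value has a match in d1; only d2's columns are kept
d4 <- semi_join(d2, d1, by = "value")
```

Unlike inner_join, semi_join does not add columns from d1 and does not duplicate rows of d2 when a value occurs several times in d1.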

akrun

Your sample data can be expressed this way:

d1 <- data.frame("value"=c(1,2,3,4,5,6,7,8))
d2 <- data.frame("value"=c(7,8,9,10,11,12))

But really those don't need to be data.frames; with a single column each, they are effectively just vectors.

d3 <- d2[! d2$value %in% d1$value,]
d4 <- d2[d2$value %in% d1$value,]

This results in d3 and d4 being vectors, since that is basically what the input was anyway. If the data.frames had more than one column, you would get data.frames as the result.
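
If you would rather keep d3 and d4 as data.frames even with a single column, one option is to pass drop = FALSE when subsetting. A minimal sketch using the sample data above (the in_d1 helper variable is just a name chosen for this example):

```r
# Sample data from the question
d1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 7, 8))
d2 <- data.frame(value = c(7, 8, 9, 10, 11, 12))

# Logical mask: TRUE where a value in d2 also occurs in d1
in_d1 <- d2$value %in% d1$value

# drop = FALSE stops R from simplifying a one-column result to a vector
d3 <- d2[!in_d1, , drop = FALSE]  # rows of d2 not found in d1
d4 <- d2[in_d1, , drop = FALSE]   # rows of d2 found in d1
```

Both results keep the value column name, so later code can still refer to d3$value and d4$value.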

JeremyS