Filtering the rows of a data frame whose values do not appear in another data frame

Question

I have two data frames, d1 and d2. Both of them have a column called value.

I want to split d2 into two data frames, d3 and d4. d3 should contain the rows of d2 whose value does not appear in d1, and d4 should contain the rows of d2 whose value does appear in d1.

I tried:

d3 = d2[!is.element(d2$value, d1$value),]

But this does not seem to be correct: the items in d3 are not as expected.

Thanks in advance for your help.

Update:

It seems that anti_join is not the solution

The sample data:

d1 = as.data.frame(c(1,2,3,4,5,6,7,8))
colnames(d1) = "value"

d2 = as.data.frame(c(7,8,9,10,11,12))
colnames(d2) = "value"

So, d3 should contain 9,10,11,12 (because 9,10,11,12 do not appear in d1) and d4 should contain 7,8 (because 7,8 appear in d1)

Hello, I think my question is different. – mamatv Feb 04 '16 at 06:26

akrun · Answer 1 · 2016-02-04T06:33:53.367

1

We can use anti_join to get the rows of one dataset that are not in the other.

library(dplyr)
anti_join(d2, d1, by = 'value') %>%
  arrange(value)
#    value
#1     9
#2    10
#3    11
#4    12

For getting the common elements, either merge or inner_join from dplyr can be used.

inner_join(d1, d2, by = 'value')
#   value
#1      7
#2      8

Another approach is to use setdiff and intersect from dplyr.

setdiff(d2, d1)
intersect(d1, d2)

NOTE: This assumes that there is only a single column, since setdiff and intersect compare whole rows.
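When d2 carries more than one column, the two filtering joins in dplyr may be a closer fit, because they match on the key only and return just the columns of d2. The following is a minimal sketch, not the answerer's code; the label column is a hypothetical extra column added purely for illustration.

```r
library(dplyr)

d1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 7, 8))
# 'label' is a hypothetical second column, not part of the original question
d2 <- data.frame(value = c(7, 8, 9, 10, 11, 12), label = letters[1:6])

# Rows of d2 whose value is absent from d1
d3 <- anti_join(d2, d1, by = "value")

# Rows of d2 whose value is present in d1; only d2's columns are returned
d4 <- semi_join(d2, d1, by = "value")
```

Unlike inner_join, semi_join never duplicates rows of d2 when d1 contains repeated values.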

edited Feb 04 '16 at 06:33

answered Feb 04 '16 at 06:16

akrun

At least make them provide sample data and an expected output! – Brandon Bertelsen Feb 04 '16 at 06:17
@BrandonBertelsen Thanks for the comment. I thought it was straightforward from the description. – akrun Feb 04 '16 at 06:18
Hello @akrun, I want to join by rows, not by columns, but anw thank you very much for your answer and sorry for my question if it is not clear – mamatv Feb 04 '16 at 06:27
@mamatv Please check the answer. It is the same as your expected. – akrun Feb 04 '16 at 06:30
@BrandonBertelsen The OP had updated with a sample and expected output – akrun Feb 04 '16 at 06:50

JeremyS · Accepted Answer · 2016-02-04T06:48:57.787

Your sample data can be expressed this way:

d1 <- data.frame("value"=c(1,2,3,4,5,6,7,8))
d2 <- data.frame("value"=c(7,8,9,10,11,12))

But really those don't need to be data.frames; they are just vectors.

d3 <- d2[! d2$value %in% d1$value,]
d4 <- d2[d2$value %in% d1$value,]

This results in d3 and d4 being vectors, because subsetting the rows of a single-column data.frame simplifies the result to a vector, which is basically what the input was anyway. If the data.frames had more than one column, you would get data.frames as the result.
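If data.frames are wanted as the result even with a single column, base R's drop = FALSE argument to [ prevents the simplification to a vector. This is a small sketch along the same lines as the code above; the in_d1 helper variable is introduced here for readability only.

```r
d1 <- data.frame(value = c(1, 2, 3, 4, 5, 6, 7, 8))
d2 <- data.frame(value = c(7, 8, 9, 10, 11, 12))

# Logical vector: TRUE where a d2 value also occurs in d1
in_d1 <- d2$value %in% d1$value

# drop = FALSE keeps the one-column result as a data.frame
d3 <- d2[!in_d1, , drop = FALSE]
d4 <- d2[in_d1, , drop = FALSE]
```

Here d3 holds the values 9 to 12 and d4 holds 7 and 8, each still in a column named value.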
