Finding elements that do not overlap between two vectors

Question

I'm trying to identify the elements of one vector that are not included in another. For instance, I have these two vectors:

list.a <- c("James", "Mary", "Jack", "Sonia", "Michelle", "Vincent")

list.b <- c("James", "Sonia", "Vincent")

Is there a way to find which people do not overlap? In the example, I would like to get a vector containing Mary, Jack, and Michelle.

Any suggestions will help!

score 44 · Accepted Answer · answered Feb 05 '14 at 10:13

Yes, there is a way:

setdiff(list.a, list.b)
# [1] "Mary"     "Jack"     "Michelle"
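
For comparison, here is a small sketch of the same lookup written with logical indexing and `%in%` from base R. The vector `with.dups` is a made-up example, not from the question. Unlike `setdiff`, the indexing form keeps repeated values, which may or may not be what you want.

```r
list.a <- c("James", "Mary", "Jack", "Sonia", "Michelle", "Vincent")
list.b <- c("James", "Sonia", "Vincent")

# Keep the elements of list.a that do not appear in list.b
only.in.a <- list.a[!(list.a %in% list.b)]

# Hypothetical input with a repeated name, to show the difference
with.dups <- c("Mary", "Mary", "James")
kept.dups <- with.dups[!(with.dups %in% list.b)]  # both "Mary" entries survive
no.dups   <- setdiff(with.dups, list.b)           # setdiff returns "Mary" once
```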

Julius Vainora

Just be aware that all the functions in this group (setdiff, intersect, union, etc) will ignore duplicates. If you have lists with duplicate values, you'll have to play around a bit. In fact there was a SO question, well answered, for just this problem a couple days ago. -- which of course now I can't find :-( – Carl Witthoft Feb 05 '14 at 12:40

@CarlWitthoft if you look at the source for `setdiff` you'll see that it's easy to modify to not ignore duplicates – hadley Feb 05 '14 at 14:06
@hadley I posted a 'flexible' version of setdiff, just FYI :-) – Carl Witthoft Feb 05 '14 at 14:22

score 37 · Answer 2 · answered Jul 13 '18 at 14:44

I think it should be mentioned that the accepted answer is only partially correct. setdiff is asymmetric: setdiff(list.a, list.b) returns the elements of its first argument that are missing from the second, so it finds the non-overlapping elements only if they are contained in the first argument.

If you are not aware of this behaviour and ran setdiff(list.b, list.a) instead, the result would be character(0) in this case, which could lead you to conclude that there are no non-overlapping elements.

Using a slightly extended example for illustration, an obvious quick fix is:

list.a <- c("James", "Mary", "Jack", "Sonia", "Michelle", "Vincent")
list.b <- c("James", "Sonia", "Vincent", "Iris")

c(setdiff(list.b, list.a), setdiff(list.a, list.b))
# [1] "Iris"     "Mary"     "Jack"     "Michelle"
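
If you need this often, the fix above can be wrapped in a small helper. This is only a sketch; the name `sym_diff` is made up here and is not a base R function. Note that the order of the result still depends on the argument order, even though its contents do not.

```r
# Hypothetical helper: elements found in exactly one of the two vectors
sym_diff <- function(x, y) {
    c(setdiff(x, y), setdiff(y, x))
}

list.a <- c("James", "Mary", "Jack", "Sonia", "Michelle", "Vincent")
list.b <- c("James", "Sonia", "Vincent", "Iris")

res.ab <- sym_diff(list.a, list.b)
res.ba <- sym_diff(list.b, list.a)
```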

Exactly what was happening to me. Thanks! – Rafs Nov 20 '20 at 17:58

Carl Witthoft · Answer 3 · 2014-02-05T16:15:00.643

An extended answer based on the comments from Hadley and myself: here's how to allow for duplicates.

Final Edit: I do not recommend anyone use this, because the result may not be what you expect. If there is a repeated value in x which is not in y, you will see that value repeated in the output. But: if, say, there are four 9s in x and one 9 in y, all the 9s will be removed. One might expect to retain three of them; that takes messier code.

mysetdiff <- function(x, y, multiple = FALSE) {
    x <- as.vector(x)
    y <- as.vector(y)
    if (length(x) || length(y)) {
        # elements of x that have no match anywhere in y
        kept <- x[match(x, y, 0L) == 0L]
        # keep repeats only when multiple = TRUE
        if (multiple) kept else unique(kept)
    } else x
}

Rgames> x
[1]  8  9  6 10  9
Rgames> y
[1] 5 3 8 8 1
Rgames> setdiff(x,y)
[1]  9  6 10
Rgames> mysetdiff(x,y)
[1]  9  6 10
Rgames> mysetdiff(x,y,mult=T)
[1]  9  6 10  9
Rgames> mysetdiff(y,x,mult=T)
[1] 5 3 1
Rgames> setdiff(y,x)
[1] 5 3 1
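
For the case described in the Final Edit (four 9s in x, one 9 in y, and you want three 9s back), here is a rough sketch of a count-aware difference. The name `multiset_diff` is made up for illustration, and the function assumes plain atomic vectors; NA values in x are never matched and are simply kept.

```r
# Hypothetical helper: remove from x one occurrence per occurrence in y
multiset_diff <- function(x, y) {
    keep <- rep(TRUE, length(x))
    for (v in unique(y)) {
        hits <- which(x == v)                    # positions of v in x
        n_drop <- min(length(hits), sum(y == v, na.rm = TRUE))
        keep[hits[seq_len(n_drop)]] <- FALSE     # drop the first n_drop hits
    }
    x[keep]
}

res.nines <- multiset_diff(c(9, 9, 9, 9, 6), 9)
res.xy    <- multiset_diff(c(8, 9, 6, 10, 9), c(5, 3, 8, 8, 1))
```

The loop removes the earliest occurrences first; if you would rather drop the latest ones, reverse `hits` before indexing.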

I would store a temporary to the result of the `if` statement, and call `unique` on that in the case `!multiple`. It would be easier to read. — Matthew Lundberg, Feb 05 '14 at 14:25

score 2 · Answer 4 · answered Mar 23 '18 at 09:46

A nice one-liner that applies to duplicates:

library(dplyr)
anti_join(data.frame(x = c(1, 1, 2, 2)), data.frame(x = c(1, 1)), by = "x")

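This returns a data frame containing 2, 2. However, it does not handle the case of removing 1, 2 from 1, 1, 2, 2: every row that has a match is dropped, however many times it occurs, so the result is empty rather than 1, 2.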

Noale
