
Let's say I have three vectors, and I want to compare them to see which elements of each are NOT in the others, starting by comparing to "c."

a<-c(1,2,7,8)
b<-c(1,2,3,4)
c<-c(3,4,5,6)

So this works like I expect it to (1 and 2 are in "b" but not "c.")

b[-which(b%in%c)]

returns:

[1] 1 2

But this doesn't tell me which of "a" is not in "c" (all of it, i.e. 1,2,7,8), rather it gives me a numeric vector with nothing in it.

a[-which(a%in%c)]

returns:

numeric(0)

It looks like this answer would do what I want in the end, but what am I misunderstanding about how my use of which and %in% works? Better yet, how do I get the answer

[1] 1 2 7 8

from the question of which of "a" is not in "c" when none of "a" is in "c?"

Machavity
CrunchyTopping

3 Answers


Using logical operations is more reliable:

b[!b %in% c]
# [1] 1 2
a[!a %in% c]
# [1] 1 2 7 8

Note that !a %in% c is the same as !(a %in% c): we ask which elements of a are in c, get a logical vector, and negate it. Using which works differently: in a[-which(a %in% c)] we again first compute the logical vector a %in% c, then which returns the indices of the elements of a that belong to c, and the minus sign drops those elements. In your case we have

which(a %in% c)
# integer(0)

Then you may argue that a[-numeric(0)] should also return

# [1] 1 2 7 8

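but that's not how it works in R: negating an empty index vector still gives an empty vector, so a[-integer(0)] is the same as a[integer(0)] and selects nothing.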
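As a minimal sketch of the point above, the following reproduces the empty result step by step using the vectors from the question. The names idx, neg_idx, empty_result and safe_result are introduced here only for illustration, and the length guard is one possible workaround, not the only one.

```r
a <- c(1, 2, 7, 8)
c <- c(3, 4, 5, 6)

idx <- which(a %in% c)       # no element of a is in c, so idx is integer(0)
neg_idx <- -idx              # negating an empty vector leaves it empty
empty_result <- a[neg_idx]   # an empty index selects nothing

# If which() must be used, guard against the empty case explicitly.
safe_result <- if (length(idx) > 0) a[-idx] else a
print(safe_result)
```

The logical form a[!a %in% c] avoids the guard entirely, because the negated logical vector always has the same length as a.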

Julius Vainora

If the elements are unique, setdiff is an alternative (it treats its arguments as sets, so duplicates are dropped from the result):

setdiff(a, c)
#[1] 1 2 7 8

setdiff(b, c)
#[1] 1 2
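To illustrate why the uniqueness caveat matters, here is a small sketch with a vector that contains a repeated value. The vectors x and y are made up for this example and do not come from the question.

```r
x <- c(1, 1, 2, 7)   # hypothetical input with a duplicated 1
y <- c(3, 4)

via_setdiff <- setdiff(x, y)   # set semantics: duplicates are removed
via_logical <- x[!x %in% y]    # plain subsetting: duplicates are kept

print(via_setdiff)
print(via_logical)
```

If repeated values need to survive, the logical subsetting form is the safer choice.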
akrun

Here is another option. You can use match, which returns NA for every element of the first vector that has no match in the second, and then keep only those NA positions:

b[is.na(match(b, c))]
#[1] 1 2

a[is.na(match(a, c))]
#[1] 1 2 7 8
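As a rough sketch of what happens under the hood, the intermediate result of match can be inspected directly. The variable names pos_b and pos_a are introduced here for illustration only.

```r
a <- c(1, 2, 7, 8)
b <- c(1, 2, 3, 4)
c <- c(3, 4, 5, 6)

pos_b <- match(b, c)   # position in c of each element of b, NA where absent
pos_a <- match(a, c)   # every element of a is absent from c, so all NA

print(pos_b)
print(pos_a)

# Keeping the NA positions gives the elements that are not in c.
print(b[is.na(pos_b)])
print(a[is.na(pos_a)])
```

Because is.na returns a logical vector the same length as its input, this approach handles the "nothing matches" case just like a[!a %in% c] does.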
nghauran