How can I get the complement of vector y in vector x

Question

That's x \ y using mathematical notation. Suppose

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3) 
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)

How can I get a vector with ALL the values in x that are not in y? That is, the result should be:

2,1,1,3

There is a similar question here. However, none of the answers returns the result that I want.

You need to define your problem more precisely. Why is 1 in your output even though 1 occurs in y? And then why only two 1s and not three? It would be helpful if you could give some kind of pseudocode specifying what you want to compute. – Jyotirmoy Bhattacharya, Mar 21 '10 at 18:19
y is a proper subset of x, and I am looking for its complement in x. I don't care about positions. Had y been c(0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3), I should get the remaining 4 zeros. – gd047, Mar 21 '10 at 18:31
Y cannot be a proper subset of X, because neither of them is a set! – hadley, Mar 21 '10 at 19:06

score 13 · Accepted Answer · answered Mar 22 '10 at 10:18

Here is a solution using pmatch (this gives the "complement" as you require):

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
res <- x[is.na(pmatch(x,y))]

From the pmatch documentation:

"If duplicates.ok is FALSE, values of table once matched are excluded from the search for subsequent matches."
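One caveat worth keeping in mind: pmatch coerces its arguments to character and, after looking for exact matches, also accepts unique partial (prefix) matches, so a 1 in x can be consumed by a 10 in y. Below is a minimal sketch of an alternative that relies on exact matching only. The helper name multiset_diff is hypothetical (it is not part of base R), and the loop is written for clarity rather than speed.

```r
# Hypothetical helper (not part of base R): remove one occurrence from x
# for every element of y, using exact matching via match().
multiset_diff <- function(x, y) {
  for (v in y) {
    i <- match(v, x)            # position of the first remaining exact match
    if (!is.na(i)) x <- x[-i]   # drop that single occurrence
  }
  x
}

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
res_exact <- multiset_diff(x, y)   # 2 1 1 3, as with pmatch on this data

# Partial matching: "1" is a prefix of "10", so pmatch reports a match.
partial <- c(1)[is.na(pmatch(1, 10))]   # the 1 is dropped
exact   <- multiset_diff(1, 10)         # the 1 is kept
```

For the vectors in the question both approaches agree; the difference only shows up when one value's printed form is a prefix of another's.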

score 7 · Answer 2 · answered Mar 21 '10 at 18:16

How about this:

R> x[x!=y]
[1] 2 1 1 1 3
Warning message:
In x != y : longer object length is not a multiple of shorter object length
R>

This is a difficult problem, I think, as you are mixing values and positions. The easier solution relies on one of the 'set' functions in R:

R> setdiff(x,y)
[1] 2 3

but that uses only values and not position.

The problem with the answer I gave you is the implicit use of recycling and the warning it triggered: as your x is longer than your y, the first few values of y get reused. Recycling is considered "clean" only when the length of the longer vector is an integer multiple of the length of the shorter vector. That is not the case here, and hence I am not sure we can solve your problem all that cleanly.
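To make the recycling explicit, here is a small sketch that builds the recycled copy of y by hand with rep_len and checks that it yields the same comparison as x != y. The variable names y_recycled, same_mask and res are only illustrative.

```r
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)

# What x != y compares when the lengths differ: y is recycled to length(x),
# so its first four values (all 0) are reused for positions 21 to 24.
y_recycled <- rep_len(y, length(x))

# suppressWarnings() only hides the length-mismatch warning.
same_mask <- identical(suppressWarnings(x != y), x != y_recycled)

# Positions 21 to 23 hold 1s compared against recycled 0s, hence three 1s.
res <- x[x != y_recycled]   # 2 1 1 1 3
```

This is why the result contains one 1 too many: the comparison is purely positional, and the tail of x is compared against zeros taken from the start of y.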

And setdiff(x, y) is, indeed, the standard definition for x \ y. Since it is a set operation, it first finds the unique values and then compares the two vectors. – William Doane, Mar 21 '10 at 18:19
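As a rough illustration of that point, the sketch below reproduces the setdiff result by de-duplicating both vectors first and then filtering with %in%. It is meant to show the behaviour, not how setdiff is implemented internally; the names by_hand and res are illustrative.

```r
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)

# Reduce both vectors to their distinct values, then keep the values of x
# that do not occur in y. Counts of duplicates play no role.
by_hand <- unique(x)[!(unique(x) %in% unique(y))]

res <- setdiff(x, y)   # 2 3
```

Because the three extra 1s in x collapse to a single value that also occurs in y, they cannot survive a set difference.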

score 3 · Answer 3 · answered Mar 21 '10 at 18:53

If I understand the problem, you can use table to compute the difference in the number of elements in each set and then create a vector based on the difference of those counts (note that this won't necessarily give you the order you gave in your question).

> diffs <- table(x) - table(factor(y, levels=levels(factor(x))))
> rep(as.numeric(names(diffs)), ifelse(diffs < 0, 0, diffs))
[1] 1 1 2 3
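The same idea can be wrapped in a function, which is one way to get a "one-liner" at the call site. This is only a sketch under the assumption that x and y are numeric; the function name multiset_diff_table is hypothetical, and pmax is swapped in for the ifelse call to clamp negative counts at zero.

```r
# Hypothetical wrapper (not part of base R or the sets package).
# Counts each distinct value of x in both vectors and repeats the value
# by the surplus count in x. The result is sorted by value.
multiset_diff_table <- function(x, y) {
  lev   <- sort(unique(x))
  diffs <- table(factor(x, levels = lev)) - table(factor(y, levels = lev))
  rep(lev, times = pmax(as.vector(diffs), 0))   # negative surplus counts as 0
}

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
res <- multiset_diff_table(x, y)     # 1 1 2 3

# The second example from the question's comments: y holds 13 zeros and
# every non-zero value of x, so four zeros remain.
y2 <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
res2 <- multiset_diff_table(x, y2)   # 0 0 0 0
```

Values that occur in y but not in x fall outside the factor levels and are simply ignored by table.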

answered by Jonathan Chang

Thanks, this one does indeed work! Order does not matter. I am just curious to see how this can be achieved using library `sets` (using a function like `set_complement` perhaps) or another "one liner". I can hardly believe there's no way to get this directly. – gd047 Mar 21 '10 at 19:30
There would be if you were working with true sets. Sets don't have duplicates and order doesn't matter. All the set functions in R follow from that definition. What you actually have is the set X whose elements are {0, 1, 2, 3} and the set Y whose elements are {0, 1}; thus X \ Y is {2, 3}. While the output you're looking for is well defined, it's NOT a set operation, so you're going to need to do a little work to get it. You can always wrap Jonathan's code in a function, if you must have a one-line solution. – William Doane Mar 21 '10 at 22:54
If sets don't accept duplicates, there are also multisets that do. – gd047 Mar 22 '10 at 13:06
