
That's x \ y using mathematical notation. Suppose

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3) 
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)

How can I get a vector with ALL the values in x that are not in y? That is, the result should be:

2,1,1,3

There is a similar question here. However, none of the answers returns the result that I want.

gd047
  • You need to define your problem more precisely. Why is 1 in your output even though 1 occurs in y? And then why only two 1s and not three? It would be helpful if you could give some kind of pseudocode specifying what you want to compute. – Jyotirmoy Bhattacharya Mar 21 '10 at 18:19
  • y is a proper subset of x, and I am looking for its complement in x. I don't care about positions. Had y been c(0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3), I should get the remaining 4 zeros. – gd047 Mar 21 '10 at 18:31
  • Y can not be a proper subset of X, because neither of them are sets! – hadley Mar 21 '10 at 19:06

3 Answers


Here is a solution using pmatch (this gives the "complement" you require):

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
res <- x[is.na(pmatch(x,y))]

From pmatch documentation:

"If duplicates.ok is FALSE, values of table once matched are excluded from the search for subsequent matches."

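To make the role of that sentence concrete, here is a minimal sketch contrasting match() with pmatch() on the vectors from the question. The names via_match and via_pmatch are introduced only for illustration. The sketch assumes data like the example: pmatch() compares values as character strings and allows partial matches, so it may behave differently on other inputs.

```r
x <- c(rep(0, 17), 1, 1, 2, 1, 1, 1, 3)
y <- c(rep(0, 17), 1, 1, 1)

# match() may pair many elements of x with the same element of y,
# so every 0 and every 1 in x finds a partner and only 2 and 3 are left.
via_match <- x[is.na(match(x, y))]

# pmatch() with the default duplicates.ok = FALSE uses each element of y
# at most once, so the two surplus 1s in x stay unmatched as well.
via_pmatch <- x[is.na(pmatch(x, y))]

print(via_match)
print(via_pmatch)
```

Only the pmatch() version keeps the surplus duplicates, which is the behaviour the question asks for.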
teucer

How about this:

R> x[x!=y]
[1] 2 1 1 1 3
Warning message:
In x != y : longer object length is not a multiple of shorter object length
R>

This is a difficult problem, I think, as you are mixing values and positions. The easier solution relies on one of the 'set' functions in R:

R> setdiff(x,y)
[1] 2 3

but that uses only values and not position.
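As a rough illustration of "values only", the following sketch compares setdiff() with a hand-written version that works on the distinct values of each vector. The names by_setdiff and by_unique are hypothetical, chosen just for this example.

```r
x <- c(rep(0, 17), 1, 1, 2, 1, 1, 1, 3)
y <- c(rep(0, 17), 1, 1, 1)

# Set difference as provided by base R
by_setdiff <- setdiff(x, y)

# The same idea spelled out: reduce both vectors to their distinct values,
# then keep the distinct values of x that never occur in y.
by_unique <- unique(x)[!(unique(x) %in% unique(y))]

print(by_setdiff)
print(by_unique)
```

How many times a value occurs in either vector plays no part in the result, which is why the surplus 1s cannot show up here.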

The problem with the answer I gave you is the implicit use of recycling and the warning it triggered: as your x is longer than your y, the first few values of y get reused. But recycling is considered "clean" only when the length of the longer vector is an integer multiple of the length of the shorter vector. That is not the case here, and hence I am not sure we can solve your problem all that cleanly.
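The recycling described above can be made explicit. This is only a sketch: y_recycled and mask are names made up for the illustration, and rep() with length.out is used here to imitate what the comparison does implicitly.

```r
x <- c(rep(0, 17), 1, 1, 2, 1, 1, 1, 3)
y <- c(rep(0, 17), 1, 1, 1)

# Extend y to the length of x by reusing its leading values,
# which is what x != y does behind the scenes (with a warning).
y_recycled <- rep(y, length.out = length(x))

# The last four elements of x are compared against the first four of y.
print(tail(y_recycled, 4))

# Element-wise, position-by-position comparison
mask <- x != y_recycled
print(x[mask])
```

Because the comparison is positional, the third 1 of the trailing run in x is kept simply because it happens to line up with a recycled 0.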

Dirk Eddelbuettel
  • and setdiff(x, y) is, indeed, the standard definition for x \ y ... since it's a set operation, it first finds unique values and then compares the two vectors. – William Doane Mar 21 '10 at 18:19

If I understand the problem, you can use table to compute the difference in the number of elements in each set and then create a vector based on the difference of those counts (note that this won't necessarily give you the order you gave in your question).

> diffs <- table(x) - table(factor(y, levels=levels(factor(x))))
> rep(as.numeric(names(diffs)), ifelse(diffs < 0, 0, diffs))
[1] 1 1 2 3
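The counting approach above could be wrapped in a small function along these lines. This is a sketch rather than a definitive implementation: multiset_diff is a hypothetical name, it assumes x is non-empty, and values that occur only in y are simply ignored.

```r
# Hypothetical helper: for each distinct value of x, keep as many copies
# as x has in excess of y. The result is sorted by value, not by position.
multiset_diff <- function(x, y) {
  lev <- sort(unique(x))
  # Count occurrences of each level in both vectors; values of y that
  # are not in x become NA in the factor and are dropped by table().
  surplus <- table(factor(x, levels = lev)) - table(factor(y, levels = lev))
  # Negative differences mean y has more copies than x: keep none.
  rep(lev, times = pmax(as.integer(surplus), 0L))
}

x <- c(rep(0, 17), 1, 1, 2, 1, 1, 1, 3)
y <- c(rep(0, 17), 1, 1, 1)
print(multiset_diff(x, y))
```

With the alternative y from the question's comments (thirteen 0s followed by 1, 1, 2, 1, 1, 1, 3), the same call should leave the four surplus zeros.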
Jonathan Chang
  • Thanks, this one does indeed work! Order does not matter. I am just curious to see how this can be achieved using the library `sets` (using a function like `set_complement` perhaps) or another "one liner". I can hardly believe there's no way to get this directly. – gd047 Mar 21 '10 at 19:30
  • There would be if you were working with true sets. Sets don't have duplicates and order doesn't matter. All the set functions in R are going to follow from that definition... what you actually have is the set X whose elements are {0, 1, 2, 3} and the set Y whose elements are {0, 1}. Thus X \ Y is {2, 3}. While the output you're looking for is well defined, it's NOT a set operation, so you're going to need to do a little work to get it. You can always wrap Jonathan's code in a function, if you must have a one-line solution. – William Doane Mar 21 '10 at 22:54
  • If sets don't accept duplicates, there are also multisets that do. – gd047 Mar 22 '10 at 13:06