How to remove items in a vector from another vector

Question

I have two vectors:

a = c(1,1,2,2,3,3,4,4) 
b = c(1)

I want to remove the first match of b from a. Thus, here only the first 1 is removed from a:

c = c(1,2,2,3,3,4,4)

The order of items in a is not important.

I tried this code:

a[a != b]
a[! a %in% b]

Both results are:

[1] 2 2 3 3 4 4

Every 1 is removed. However, I only want to remove one occurrence from a for each item in b.

If b = c(1, 1, 2), then I want the result to be

[1] 2 3 3 4 4

a[-(1:3)]

The above code gives the result [1] 2 3 3 4 4. However, I would like something more flexible, for example when the order of the items is unknown or random:

a = c(3,4,3,1,2,2,1,4)

How can I do it using R?

Thanks, jaySf. a[-(1:3)] may work when the order of items in a is fixed. However, what if a = c(3,4,3,1,2,2,1,4), for example? — luyan, Mar 26 '18 at 10:25
If you don't have duplicates in `b`, you could have used `match` as described here: [Remove first occurrence of elements in a vector from another vector](https://stackoverflow.com/questions/30129684/use-a-lookup-vector-to-remove-first-occurrence-of-its-elements-in-another-vector) — Henrik, Mar 26 '18 at 11:59
Related: [*Find a sequence of numbers in a vector*](https://stackoverflow.com/questions/48660606/find-a-sequence-of-numbers-in-a-vector) — Jaap, Mar 26 '18 at 13:41
also related: https://stackoverflow.com/q/16388405/4137985 and https://stackoverflow.com/q/46657373/4137985 — Cath, Mar 26 '18 at 14:49
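
The `match` idea from Henrik's comment can be sketched as follows, assuming `b` has no duplicates. The helper name `remove_first_match` is made up for illustration. The NA filter and the empty-index guard are additions, because `a[-integer(0)]` would return an empty vector rather than `a`.

```r
# Sketch only: removes the first occurrence in `a` of each value in `b`.
# Assumes `b` contains no duplicates (match() always finds the same
# first position for a repeated value).
remove_first_match <- function(a, b) {
  idx <- match(b, a)          # first position of each b value in a, NA if absent
  idx <- idx[!is.na(idx)]     # ignore values of b that do not occur in a
  if (length(idx) == 0) return(a)
  a[-idx]
}

remove_first_match(c(3, 4, 3, 1, 2, 2, 1, 4), c(1, 2))
# [1] 3 4 3 2 1 4
```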

Cath · Answer 1 · 2018-03-26T18:22:38.940

Taking inspiration from this answer to one of the questions I linked in comment, you can use fsetdiff from the package data.table.
It takes all as argument, which avoids having only the unique values returned, as happens with setdiff:

library(data.table)

# with your first example (b = c(1)):
unlist(fsetdiff(data.table(v1=a), data.table(v1=b), all = TRUE))
# v11 v12 v13 v14 v15 v16 v17 
#  1   2   2   3   3   4   4

# with second example (b = c(1, 1, 2)):
unlist(fsetdiff(data.table(v1=a), data.table(v1=b), all = TRUE))
# v11 v12 v13 v14 v15 
#  2   3   3   4   4
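
For contrast, here is what base R's `setdiff` does with the first example. It treats its arguments as sets, so the duplicates in `a` are collapsed, which is the behaviour that `all = TRUE` avoids above.

```r
a <- c(1, 1, 2, 2, 3, 3, 4, 4)
b <- c(1)

# setdiff() drops every 1 and also de-duplicates the remaining values
base_result <- setdiff(a, b)
base_result
# [1] 2 3 4
```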

score 4 · Answer 2 · answered Mar 26 '18 at 15:07

The vecsets package can perform standard set operations while retaining duplicates:

vecsets::vsetdiff( c(1,1,2,2,3,3,4,4), c(1) )
## [1] 1 2 2 3 3 4 4

vecsets::vsetdiff( c(1,1,2,2,3,3,4,4), c(1,1,2) )
## [1] 2 3 3 4 4

Note that it will preserve the order of the first argument. Using your last example:

vecsets::vsetdiff( c(3,4,3,1,2,2,1,4), c(1,1,2) )
## [1] 3 4 3 2 4
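
If installing a package is not an option, a similar order-preserving result can be sketched in base R by numbering the occurrences of each value. This is only an illustration and not how vecsets is implemented; the names `multiset_diff` and `occurrence` are made up here. It builds text keys with `paste`, so values that print identically are treated as equal.

```r
# Sketch: tag every element with "value occurrence-number", then drop the
# elements of `a` whose tag also appears among the tags of `b`.
multiset_diff <- function(a, b) {
  # occurrence(x)[i] is k if x[i] is the k-th appearance of its value
  occurrence <- function(x) ave(seq_along(x), x, FUN = seq_along)
  key_a <- paste(a, occurrence(a))
  key_b <- paste(b, occurrence(b))
  a[!key_a %in% key_b]
}

multiset_diff(c(3, 4, 3, 1, 2, 2, 1, 4), c(1, 1, 2))
# [1] 3 4 3 2 4
```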

jay.sf · Answer 3 · 2018-03-26T14:17:14.393

You can use which()

a = c(3, 4, 3, 1, 2, 2, 1, 4)
a
## [1] 3 4 3 1 2 2 1 4

b = 1

a[- which(a %in% b)[1]]
## [1] 3 4 3 2 2 1 4

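If b has two elements:

b2 = c(1, 2)

# iterate over the values of b2; <<- updates the global a at each step
sapply(b2, function(x) a <<- a[- which(a == x)[1]])[[2]]
## [1] 3 4 3 2 1 4

Or three...

a = c(3, 4, 3, 1, 2, 2, 1, 4)  # reset a, because <<- modified it above
b3 <- c(1, 2, 3)

sapply(b3, function(x) a <<- a[- which(a == x)[1]])[[3]]
## [1] 4 3 2 1 4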

edited Mar 26 '18 at 14:17

answered Mar 26 '18 at 10:30

jay.sf


Thanks again for your help! It works fine when the list of b contains one value. I am now trying a more complex list. For example, b = c(1,1,2). – luyan Mar 26 '18 at 10:37
@luyan There might be something better, but the simple `for(x in b) a <- a[-which(a == x)[1]]` works. Do this on a copy of `a` if you don't want to mutate it. – John Coleman Mar 26 '18 at 10:43
@luyan glad it works, but in R I tend to regard any use of a for loop as a failure of imagination. On the other hand, if a quick and dirty loop does what you want and the problem sizes are small enough that efficiency isn't a concern, there isn't much reason to spend a lot of time coming up with a vectorized approach. – John Coleman Mar 26 '18 at 11:02
@JohnColeman I've added a vectorized approach, thanks for your inspiration. – jay.sf Mar 26 '18 at 13:35
In response to a comment on a now deleted post referring to this one: repeated iterations of a function like this feel like a case where a for loop in R is actually the right thing to do. The second half of John Coleman's comment above is particularly relevant. This isn't the kind of operation that R is natively good at, so if more efficiency is needed, moving to a C/C++ function is likely to be more fruitful than trying to rework it into the *apply paradigm. See perhaps the Rcpp package, which makes this relatively easy. – Aaron left Stack Overflow Mar 26 '18 at 14:00
@Aaron Thanks for your comment, I solved this primarily to meet the challenge. – jay.sf Mar 26 '18 at 14:02
@Aaron Good points. Knowing when a loop is appropriate in R is one of the things that I still haven't mastered. I use them often enough, but when I do so I have trouble shaking the feeling that I am missing a more elegant solution. I have had too many occasions where I have written complicated loop-based solutions only to later find a 1-liner which does the same thing much faster. – John Coleman Mar 26 '18 at 16:24
@Aaron I just looked at my copy of "Advanced R" and in its chapter on Rcpp it gives as a typical use-case "Loops that can’t be easily vectorised because subsequent iterations depend on previous ones", which seems to be a good description of this current problem. – John Coleman Mar 26 '18 at 16:30
@JohnColeman: I've had that happen to me on many occasions too. :) Though that's becoming less and less frequent, and now I'm actually more likely to know that something could be done with *apply, but to choose a for loop instead for quicker writing and more readable code. I'm learning to optimize my time over the computer's time. :) – Aaron left Stack Overflow Mar 26 '18 at 17:26
Using the global assignment inside a `sapply` loop is a good case for using `Reduce` instead. – AdamO Mar 26 '18 at 19:47
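
The loop from John Coleman's comment can be wrapped in a function so that the caller's `a` is not modified. This is a minimal sketch; the name `remove_each_once` is hypothetical, and the `is.na` check is an addition so that values of `b` missing from `a` are skipped instead of producing an `NA` index.

```r
# Sketch: for each value in b (duplicates included), drop its first
# remaining occurrence from a. Works on a local copy of a.
remove_each_once <- function(a, b) {
  for (x in b) {
    i <- match(x, a)            # position of the first remaining match, or NA
    if (!is.na(i)) a <- a[-i]
  }
  a
}

remove_each_once(c(3, 4, 3, 1, 2, 2, 1, 4), c(1, 1, 2))
# [1] 3 4 3 2 4
```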

score 1 · Answer 4 · answered Mar 26 '18 at 17:33

I don't think that the following is the best solution (the vecsets approach strikes me as the best), but @Aaron's comment about possibly using Rcpp struck me as interesting. This is the first time I used that package. If nothing else, the fact that I was able to get working code in less than 20 minutes underscores his point that Rcpp makes it relatively easy:

library(Rcpp)
cppFunction('
  NumericVector difference(NumericVector xs, NumericVector ys){
    int m = xs.size();
    int n = ys.size();
    // sentinel larger than every element of xs; double matches NumericVector precision
    double flag = 1 + std::fabs(max(xs)) + std::fabs(max(ys));
    NumericVector zs = clone(xs);
    for(int i = 0; i < n; i++){
      double y = ys[i];
      int j = 0;
      while(j < m && zs[j]!= y) j++;
      if(j < m) zs[j] = flag;
    }
    int count = 0;
    for(int k = 0; k < m; k++){
      if(zs[k] < flag) count++;
    }
    NumericVector ws(count);
    int k = 0;
    for(int j = 0; j < m; j++){
      if(zs[j] < flag){
        ws[k] = zs[j];
        k++;
      }
    }
    return ws;
  }
')

After you source this:

> a = c(1,1,2,2,3,3,4,4)
> b = c(1,2,1)
> difference(a,b)
[1] 2 3 3 4 4

Since this was my first attempt at such code, I'm sure that it could be improved in multiple ways.

score 0 · Answer 5 · answered Mar 26 '18 at 17:43

A little frustrating, the syntactical order, but Reduce and which do it with just Base R.

Reduce(function(a, b) a[-which(a == b)[1]], b, a)
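
To make the argument order easier to follow: `Reduce(f, x, init)` calls `f(accumulator, element)`, starting from `init` and walking over `x`. The sketch below names the arguments explicitly and uses `accumulate = TRUE` to show each intermediate vector; `drop_first` and `steps` are illustrative names, and every value of `b` is assumed to occur in `a`.

```r
a <- c(3, 4, 3, 1, 2, 2, 1, 4)
b <- c(1, 1, 2)

# acc is the vector being shortened; x is the current value taken from b
drop_first <- function(acc, x) acc[-which(acc == x)[1]]

# a list holding a itself followed by the result after each removal
steps <- Reduce(drop_first, b, init = a, accumulate = TRUE)

steps[[length(steps)]]
# [1] 3 4 3 2 4
```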

answered Mar 26 '18 at 17:43

AdamO

