
I have two vectors:

a = c(1,1,2,2,3,3,4,4) 
b = c(1)

I want to remove the first match of b from a. In this example, only the first 1 should be removed from a:

c = c(1,2,2,3,3,4,4)

The order of items in a is not important.

I tried this code:

a[a != b]
a[! a %in% b] 

Both results are:

[1] 2 2 3 3 4 4

All occurrences of 1 are removed. However, I only want to remove one copy of a value for each time it appears in b.

If b = c(1, 1, 2), then I want the result

[1] 2 3 3 4 4

a[-(1:3)]

The above code gives [1] 2 3 3 4 4, but only because the elements to remove happen to sit in the first three positions. I would like a more flexible solution, for example when the order of the items is unknown or random:

a = c(3,4,3,1,2,2,1,4)

How can I do it using R?

Jaap
luyan
  • Something like `a[-(1:3)]`? – jay.sf Mar 26 '18 at 10:23
  • Thanks, jaySf. a[-(1:3)] may work when the order of items in a is fixed. However, what if a = c(3,4,3,1,2,2,1,4), for example? – luyan Mar 26 '18 at 10:25
  • If you don't have duplicates in `b`, you could have used `match` as described here: [Remove first occurrence of elements in a vector from another vector](https://stackoverflow.com/questions/30129684/use-a-lookup-vector-to-remove-first-occurrence-of-its-elements-in-another-vector) – Henrik Mar 26 '18 at 11:59
  • Related: [*Find a sequence of numbers in a vector*](https://stackoverflow.com/questions/48660606/find-a-sequence-of-numbers-in-a-vector) – Jaap Mar 26 '18 at 13:41
  • also related: https://stackoverflow.com/q/16388405/4137985 and https://stackoverflow.com/q/46657373/4137985 – Cath Mar 26 '18 at 14:49
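Henrik's `match()` suggestion can be sketched as follows, assuming numeric vectors as in the question. `match()` returns the position of the first match of each element of `b` in `a`, so dropping those positions removes one copy per element; the second call shows why duplicates in `b` defeat it, because both 1s map to the same position.

```r
a <- c(1, 1, 2, 2, 3, 3, 4, 4)

# b without duplicates: drop the position of the first 1
a[-match(1, a)]
# [1] 1 2 2 3 3 4 4

# b with duplicates: both 1s match position 1, so only one 1 is removed
match(c(1, 1, 2), a)
# [1] 1 1 3
a[-match(c(1, 1, 2), a)]
# [1] 1 2 3 3 4 4
```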

5 Answers


Taking inspiration from this answer to one of the questions I linked in a comment, you can use fsetdiff from the data.table package.
It has an `all` argument which, when TRUE, keeps duplicates instead of returning only the unique values, as setdiff does:

library(data.table)

a <- c(1,1,2,2,3,3,4,4)

# with your first example (b = c(1)):
b <- c(1)
unlist(fsetdiff(data.table(v1=a), data.table(v1=b), all = TRUE))
# v11 v12 v13 v14 v15 v16 v17 
#  1   2   2   3   3   4   4

# with second example (b = c(1, 1, 2)):
b <- c(1, 1, 2)
unlist(fsetdiff(data.table(v1=a), data.table(v1=b), all = TRUE))
# v11 v12 v13 v14 v15 
#  2   3   3   4   4
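According to the data.table documentation, with `all = TRUE` a value that occurs xn times in x and yn times in y is returned max(xn - yn, 0) times. The same multiset difference can be sketched in base R by numbering the occurrences of each value with `ave()` and then matching value/occurrence pairs with `match()`. This assumes plain numeric vectors, and `remove_first_matches` is a hypothetical helper, not part of data.table or any other package:

```r
# Hypothetical helper: remove one copy of a value from a for each
# occurrence of that value in b (multiset difference).
remove_first_matches <- function(a, b) {
  # Pair each value with its occurrence number: "1 1", "1 2", "2 1", ...
  key_a <- paste(a, ave(a, a, FUN = seq_along))
  key_b <- paste(b, ave(b, b, FUN = seq_along))
  # The k-th copy of a value in b matches only the k-th copy in a
  idx <- match(key_b, key_a)
  idx <- idx[!is.na(idx)]               # values of b absent from a are ignored
  if (length(idx) == 0) a else a[-idx]  # a[-integer(0)] would return nothing
}

remove_first_matches(c(1, 1, 2, 2, 3, 3, 4, 4), c(1, 1, 2))
# [1] 2 3 3 4 4
remove_first_matches(c(3, 4, 3, 1, 2, 2, 1, 4), c(1, 1, 2))
# [1] 3 4 3 2 4
```

Since only the matched positions are dropped, the remaining elements keep their original order.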
Cath

The vecsets package can perform standard set operations while retaining duplicates:

vecsets::vsetdiff( c(1,1,2,2,3,3,4,4), c(1) )
## [1] 1 2 2 3 3 4 4

vecsets::vsetdiff( c(1,1,2,2,3,3,4,4), c(1,1,2) )
## [1] 2 3 3 4 4

Note that it will preserve the order of the first argument. Using your last example:

vecsets::vsetdiff( c(3,4,3,1,2,2,1,4), c(1,1,2) )
## [1] 3 4 3 2 4
Artem Sokolov

You can use which()

a = c(3, 4, 3, 1, 2, 2, 1, 4)
a
## [1] 3 4 3 1 2 2 1 4

b = 1

a[- which(a %in% b)[1]]
## [1] 3 4 3 2 2 1 4
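One caveat with this indexing: when the value is not in `a`, `which()` returns `integer(0)`, `[1]` turns that into `NA`, and `a[-NA]` yields a single `NA` instead of `a`. A small guard avoids that; `drop_first` is a hypothetical helper name used only for this sketch:

```r
# Hypothetical helper: drop the first element of v equal to x, if any
drop_first <- function(v, x) {
  i <- which(v == x)[1]        # NA when x does not occur in v
  if (is.na(i)) v else v[-i]
}

a <- c(3, 4, 3, 1, 2, 2, 1, 4)
a[-which(a %in% 5)[1]]         # unguarded version with a missing value
# [1] NA
drop_first(a, 1)
# [1] 3 4 3 2 2 1 4
drop_first(a, 5)
# [1] 3 4 3 1 2 2 1 4
```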

When b has two elements:

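b2 = c(1, 2)

# loop over the values in b2 (not their indices) and work on a copy,
# so that `a` itself is not overwritten by `<<-`
a2 = a
sapply(b2, function(x) a2 <<- a2[- which(a2 == x)[1]])[[length(b2)]]
## [1] 3 4 3 2 1 4

Or with three elements:

b3 <- c(1, 2, 3)

a3 = a
sapply(b3, function(x) a3 <<- a3[- which(a3 == x)[1]])[[length(b3)]]
# [1] 4 3 2 1 4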
jay.sf
  • Thanks again for your help! It works fine when b contains one value. I am now trying a more complex vector, for example b = c(1,1,2). – luyan Mar 26 '18 at 10:37
  • @luyan There might be something better, but the simple `for(x in b) a <- a[-which(a == x)[1]]` works. Do this on a copy of `a` if you don't want to mutate it. – John Coleman Mar 26 '18 at 10:43
  • @luyan glad it works, but in R I tend to regard any use of a for loop as a failure of imagination. On the other hand, if a quick and dirty loop does what you want and the problem sizes are small enough that efficiency isn't a concern, there isn't much reason to spend a lot of time coming up with a vectorized approach. – John Coleman Mar 26 '18 at 11:02
  • @JohnColeman I've added a vectorized approach, thanks for your inspiration. – jay.sf Mar 26 '18 at 13:35
  • In response to a comment on a now deleted post referring to this one: repeated iterations of a function like this feel like a case where a for loop in R is actually the right thing to do. The second half of John Coleman's comment above is particularly relevant. This isn't the kind of operation that R is natively good at, so if more efficiency is needed, moving to a C/C++ function is likely to be more fruitful than trying to rework it into the *apply paradigm. See perhaps the Rcpp package, which makes this relatively easy. – Aaron left Stack Overflow Mar 26 '18 at 14:00
  • @Aaron Thanks for your comment, I solved this primarily to meet the challenge. – jay.sf Mar 26 '18 at 14:02
  • @Aaron Good points. Knowing when a loop is appropriate in R is one of the things that I still haven't mastered. I use them often enough, but when I do so I have trouble shaking the feeling that I am missing a more elegant solution. I have had too many occasions where I have written complicated loop-based solutions only to later find a 1-liner which does the same thing much faster. – John Coleman Mar 26 '18 at 16:24
  • @Aaron I just looked at my copy of "Advanced R" and in its chapter on Rcpp it gives as a typical use-case "Loops that can’t be easily vectorised because subsequent iterations depend on previous ones", which seems to be a good description of this current problem. – John Coleman Mar 26 '18 at 16:30
  • @JohnColeman: I've had that happen to me on many occasions too. :) That's happening less and less often, though, and now I'm actually more likely to know that something could be done with *apply, but to choose a for loop instead for quicker writing and more readable code. I'm learning to optimize my time over the computer's time. :) – Aaron left Stack Overflow Mar 26 '18 at 17:26
  • Using the global assignment inside a `sapply` loop is a good case for using `Reduce` instead. – AdamO Mar 26 '18 at 19:47
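The loop from John Coleman's comment can be wrapped in a function. Because R functions work on a local copy of an argument once they modify it, the caller's `a` is left untouched, which covers the "work on a copy" advice without `<<-`. A sketch, where `remove_each` is a hypothetical name and the `NA` guard is an addition not in the comment:

```r
# Hypothetical wrapper around the for loop suggested in the comments
remove_each <- function(a, b) {
  for (x in b) {
    i <- which(a == x)[1]      # first remaining copy of x, NA if none
    if (!is.na(i)) a <- a[-i]  # skip values of b that are not in a
  }
  a                            # the caller's vector is not modified
}

a <- c(3, 4, 3, 1, 2, 2, 1, 4)
remove_each(a, c(1, 1, 2))
# [1] 3 4 3 2 4
a
# [1] 3 4 3 1 2 2 1 4
```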

I don't think that the following is the best solution (the vecsets approach strikes me as the best), but @Aaron's comment about possibly using Rcpp caught my interest. This is the first time I have used that package. If nothing else, the fact that I was able to get working code in less than 20 minutes underscores his point that Rcpp makes it relatively easy:

library(Rcpp)
cppFunction('
  NumericVector difference(NumericVector xs, NumericVector ys){
    int m = xs.size();
    int n = ys.size();
    // sentinel larger than every value in xs and ys, so it occurs in neither
    double flag = 1 + std::abs((double) max(xs)) + std::abs((double) max(ys));
    NumericVector zs = clone(xs);
    for(int i = 0; i < n; i++){
      double y = ys[i];
      int j = 0;
      while(j < m && zs[j]!= y) j++;
      if(j < m) zs[j] = flag;
    }
    int count = 0;
    for(int k = 0; k < m; k++){
      if(zs[k] < flag) count++;
    }
    NumericVector ws(count);
    int k = 0;
    for(int j = 0; j < m; j++){
      if(zs[j] < flag){
        ws[k] = zs[j];
        k++;
      }
    }
    return ws;
  }
')

After you source this:

> a = c(1,1,2,2,3,3,4,4)
> b = c(1,2,1)
> difference(a,b)
[1] 2 3 3 4 4

Since this was my first attempt at such code, I'm sure that it could be improved in multiple ways.

John Coleman

The argument order is a little awkward, but Reduce and which do it in base R alone:

Reduce(function(a, b) a[-which(a == b)[1]], b, a)
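`Reduce()` calls the function as f(accumulated value, next element), so its first argument is the shrinking vector and its second is the next element of `b`. Setting `accumulate = TRUE` keeps every intermediate vector, which makes that order visible. A sketch with the question's unordered example (the argument names `acc` and `x` are chosen here for readability):

```r
a <- c(3, 4, 3, 1, 2, 2, 1, 4)
b <- c(1, 1, 2)

# f(acc, x): acc is the vector so far, x the next element of b
steps <- Reduce(function(acc, x) acc[-which(acc == x)[1]], b, a,
                accumulate = TRUE)

length(steps)   # the initial vector plus one result per element of b
# [1] 4
steps[[2]]      # after removing the first 1
# [1] 3 4 3 2 2 1 4
steps[[4]]      # final result
# [1] 3 4 3 2 4
```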

AdamO