r - Sampling list according to another vector

Question

Consider two vectors:

 R> l
 [1] "a" "b" "c" "d" "e" "f"
 R> s
 [1] "b" "d" "f"

I cannot hard-code the indexes to be removed from the sampling. How can I sample elements from l that are not present in s?

Use `?sample` and `?setdiff`: along the lines of `sample(setdiff(l,s), ....)` — user20650, Apr 10 '15 at 12:33
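A minimal sketch of the `setdiff` approach suggested in the comment above, using the vectors from the question. The sample size of 2 and the `set.seed` call are assumptions added for illustration. Note that `setdiff` also drops duplicate values from `l`, which matters if duplicates should stay in the sampling pool.

```r
l <- c("a", "b", "c", "d", "e", "f")
s <- c("b", "d", "f")

# Elements of l that do not occur in s (duplicates in l would be collapsed)
pool <- setdiff(l, s)

set.seed(1)                # assumption: fixed seed only so the run is repeatable
picked <- sample(pool, 2)  # assumption: draw 2 elements without replacement

print(pool)
print(picked)
```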

score 2 · Answer 1 · answered Apr 10 '15 at 12:34


You can try this:

l <- c("a", "b", "c", "d", "e", "f")
s <- c("b", "d", "f")
l2 <- l[!l %in% s]  # elements present in "l" and not in "s"
sample(l2, 10, replace = TRUE)
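One caveat applies when the vectors are numeric: if the filtered vector ends up holding a single number `n`, `sample()` draws from `1:n` instead of from that one value. Below is a hedged sketch of a guard that samples by position. `sample_from` is a hypothetical helper name and the example vectors are made up for illustration.

```r
# Hypothetical helper: sample by position so a length-one numeric vector
# is treated as a one-element pool rather than as 1:x.
sample_from <- function(x, size, replace = FALSE) {
  x[sample.int(length(x), size, replace = replace)]
}

l <- c(10, 20, 30)
s <- c(10, 20)
l2 <- l[!l %in% s]  # leaves the single value 30

res <- sample_from(l2, 5, replace = TRUE)
print(res)          # every draw is 30
```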

Mamoun Benghezal

score 2 · Answer 2 · edited May 23 '17 at 11:56

I just found a solution using this post:

sample(l[-c(match(s, l))])

PS: Sorry for asking before searching thoroughly.
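The negative-index form above behaves as expected only when `s` is non-empty, every element of `s` occurs in `l`, and `l` has no repeated values. Here is a small sketch of two edge cases, compared with a `%in%` filter. The example vectors are made up for illustration.

```r
l <- c("a", "b", "b", "c")

# Edge case 1: empty s. match() returns integer(0), and l[-integer(0)]
# selects nothing instead of keeping everything.
s_empty <- character(0)
via_match_empty <- l[-match(s_empty, l)]
via_in_empty <- l[!l %in% s_empty]

# Edge case 2: duplicates in l. match() reports only the first position,
# so the second "b" survives the negative index.
s_dup <- "b"
via_match_dup <- l[-match(s_dup, l)]
via_in_dup <- l[!l %in% s_dup]

print(via_match_empty)
print(via_in_empty)
print(via_match_dup)
print(via_in_dup)
```

When `s` may be empty or `l` may contain duplicates, the logical filter `l[!l %in% s]` is the safer of the two forms.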

EDIT:

For the vectors:

R> l <- c(1:5000)
R> s <- c(100:1100)

I ran the following microbenchmark:

R> microbenchmark(func(l, s), sample(l[-c(match(s, l))], 10), times=1000L)

Here, func() is defined as follows:

R> func <- function(l, s) {
    l2 <- l[!l %in% s] # elements present in "l" and not in "s"
    return(sample(l2, 10, replace = TRUE))
}

The microbenchmark returned:

Unit: microseconds
                          expr   min    lq  mean median    uq  max neval cld
                    func(l, s) 218.7 221.3 234.1  222.1 229.5 2937  1000   a  
sample(l[-c(match(s, l))], 10) 222.5 226.9 238.8  227.8 235.7 2933  1000   a

Their performance is quite comparable: the medians are 222.1 and 227.8 microseconds.

As you can see, there are many alternatives. It might be fun to do a time test: create a couple of large `l` and `s` vectors, say 1000 elements, and run each of the answers supplied inside `microbenchmark`. — Carl Witthoft, Apr 10 '15 at 12:59
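A sketch of the timing test suggested in the comment above, comparing the three approaches from this thread on larger inputs. The vector sizes, the sample size of 10, the repetition count, and the function names `by_in`, `by_match`, and `by_setdiff` are assumptions for illustration. It needs the `microbenchmark` package and skips the timing if that is not installed.

```r
l <- 1:5000
s <- 100:1100

# Hypothetical wrappers, one per approach discussed in the thread
by_in      <- function(l, s) sample(l[!l %in% s], 10)
by_match   <- function(l, s) sample(l[-match(s, l)], 10)
by_setdiff <- function(l, s) sample(setdiff(l, s), 10)

# Run the timing only when the package is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    by_in(l, s), by_match(l, s), by_setdiff(l, s),
    times = 100L
  ))
}
```

All three wrappers sample without replacement, so the timings compare the filtering step rather than differences in the `sample()` call.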
