How do I both randomly select rows from a data frame and delete each row as it has been selected?

Question

I'm randomly sampling without replacement from a data frame that consists of a single column. This column contains duplicated numeric values.

I'm using dplyr to do this. The data I need to sample from looks like this:

testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

I use the code below to randomly sample 15 rows:

library(dplyr)
MyRandomSample <- testSO %>%
  slice_sample(n = 15, replace = FALSE)

Is there a direct method to remove each of these 15 samples from testSO as they are selected? Effectively, slice_sample is doing this under the hood. I can't locate a method for creating a list of the row indices to be able to remove these from testSO. Then I would simply delete the rows that match the row indices.

The real testSO data has some ordering effects, which is why I am using slice_sample instead of slice_head.

I can reorder testSO randomly and then slice_head. But is there a method for both drawing a sample and simultaneously deleting the sampled rows? I found a base R method using -sample that deletes rows from the data frame, but it doesn't then pass the deleted rows to another object.

There are a few ways you can retrieve and then delete the sampled rows: (1) if each row has a unique identifier (or a combination of columns that *guarantees* unique matches), then you can `anti_join` the original frame with the random sample frame on that id; (2) you cannot use `slice_sample` for this, but you can `mutate(use = row_number() %in% sample(n(), size=15))`, then retrieve with `filter(use)` and remove with `filter(!use)`. – r2evans, Dec 03 '20 at 22:44
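The two options in the comment above can be sketched roughly as follows. This is only an illustration: the seed, the `remaining` object names, and the `row_id` column are assumptions added here, not part of the original data. Note that `filter(use)` returns the sampled rows in their original order rather than in random draw order.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

# Option 2: flag 15 random rows, then split the frame on the flag.
flagged <- testSO %>%
  mutate(use = row_number() %in% sample(n(), size = 15))

MyRandomSample <- flagged %>% filter(use) %>% select(-use)
remaining      <- flagged %>% filter(!use) %>% select(-use)

# Option 1: add an explicit row id (hypothetical column `row_id`) so that
# slice_sample() can still be used and anti_join() has a unique key.
with_id      <- testSO %>% mutate(row_id = row_number())
sampled_id   <- with_id %>% slice_sample(n = 15, replace = FALSE)
remaining_id <- with_id %>% anti_join(sampled_id, by = "row_id")
```

The row id matters because the values in `ToSample` are duplicated; an `anti_join` on `ToSample` itself would remove every row sharing a sampled value, not just the 15 sampled rows.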

kangaroo_cliff · Accepted Answer · 2020-12-03T22:59:21.920

4

You can randomly draw row indices, then use them both to select the random sample and to remove those rows from the original data.

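rand_ind <- sample(nrow(testSO), 15, replace = FALSE)
# drop = FALSE keeps a single-column result as a data frame instead of a vector
MyRandomSample <- testSO[rand_ind, , drop = FALSE]
testSO <- testSO[-rand_ind, , drop = FALSE]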
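If this needs to be done repeatedly, the same index-based idea could be wrapped in a small helper. The function name `split_sample` and the seed below are hypothetical, added only for illustration. The sketch passes `drop = FALSE` because base R otherwise simplifies a one-column data frame to a plain vector when subsetting rows.

```r
# Hypothetical helper: split a data frame into a random sample and the rest.
split_sample <- function(df, n) {
  # Guard against n = 0: df[-integer(0), ] would return zero rows, not all rows.
  stopifnot(n >= 1, n <= nrow(df))
  idx <- sample(nrow(df), n, replace = FALSE)
  list(
    sampled   = df[idx, , drop = FALSE],   # the randomly drawn rows
    remaining = df[-idx, , drop = FALSE]   # everything that was not drawn
  )
}

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

parts <- split_sample(testSO, 15)
MyRandomSample <- parts$sampled
testSO <- parts$remaining
```

Calling `split_sample(testSO, 15)` again on the reduced `testSO` would draw a further 15 rows from the 85 that remain.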

edited Dec 03 '20 at 22:59

answered Dec 03 '20 at 22:41



Thank you. I figured there had to be a way to do it. Base R to the rescue! – Michelle Dec 03 '20 at 23:56
