
I'm randomly sampling without replacement from a data frame that consists of a single column. This column contains duplicated numeric values.

I'm using dplyr to do this. My data from which I need to sample looks like:

testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

I use the code below to randomly sample 15 rows:

library(dplyr)

MyRandomSample <- testSO %>%
  slice_sample(n = 15, replace = FALSE)

Is there a direct method to remove each of these 15 samples from testSO as they are selected? Effectively, slice_sample is doing this under the hood. I can't locate a method for creating a list of the row indices to be able to remove these from testSO. Then I would simply delete the rows that match the row indices.

The real testSO data has some ordering effects, which is why I am using slice_sample instead of slice_head.

I can reorder testSO randomly and then slice_head. But is there a method for both drawing a sample and simultaneously deleting the sampled rows? I found a base R method using -sample that deletes rows from the data frame, but it doesn't then pass the deleted rows to another object.

Michelle
  • There are a few ways you can retrieve and then delete these sampled rows: (1) if each row has a unique identifier (or a combination of columns that *guarantees* unique matches), you can `anti_join` the original frame with the random sample frame on that id; (2) without `slice_sample`, you can `mutate(use = row_number() %in% sample(n(), size=15))`, then retrieve the sample with `filter(use)` and remove it with `filter(!use)`. – r2evans Dec 03 '20 at 22:44
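A minimal sketch of option (2) from the comment above, assuming dplyr is loaded. The column name `use` comes from the comment; `flagged` and `remaining` are illustrative names, and `set.seed()` is only there to make the sketch reproducible.

```r
library(dplyr)

set.seed(42)  # only so this sketch gives the same draw each run
testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

# Flag 15 randomly chosen row positions; sample() draws without replacement by default
flagged <- testSO %>%
  mutate(use = row_number() %in% sample(n(), size = 15))

# Split on the flag, then drop the helper column from both parts
MyRandomSample <- flagged %>% filter(use) %>% select(-use)
remaining <- flagged %>% filter(!use) %>% select(-use)
```

Because the split is driven by row position rather than by value, duplicated values in `ToSample` do not affect which rows are removed.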

1 Answer


You can randomly draw row indices and then use them both to select the random sample and to remove those rows from the original data.

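# Draw 15 distinct row indices
rand_ind <- sample(nrow(testSO), 15, replace = FALSE)
# drop = FALSE keeps the result a data frame; a single-column data frame would otherwise collapse to a vector
MyRandomSample <- testSO[rand_ind, , drop = FALSE]
testSO <- testSO[-rand_ind, , drop = FALSE]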
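If you would rather stay inside a dplyr pipeline and keep `slice_sample`, one possible sketch is to attach a temporary unique row id, sample, and then `anti_join` on that id. The helper column `row_id` and the intermediate names `with_id` and `sampled` are assumptions made for illustration, not part of the original data.

```r
library(dplyr)

set.seed(1)  # only so this sketch gives the same draw each run
testSO <- data.frame(ToSample = round(runif(100, min = 1, max = 3), 0))

# A temporary unique id lets sampled rows be matched back exactly,
# even though ToSample itself contains duplicated values
with_id <- testSO %>% mutate(row_id = row_number())

sampled <- with_id %>% slice_sample(n = 15, replace = FALSE)

# Keep the sample, and remove exactly those rows from the original
MyRandomSample <- sampled %>% select(-row_id)
testSO <- with_id %>%
  anti_join(sampled, by = "row_id") %>%
  select(-row_id)
```

Joining on `ToSample` directly would not work here, since duplicated values would cause every matching row to be dropped rather than only the 15 sampled ones.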
kangaroo_cliff