Objective: Randomly divide a data frame into 3 samples.
- one sample with 60% of the rows
- two other samples, each with 20% of the rows
- no row should appear in more than one sample (i.e. sample without replacement).
Here's a clunky solution:
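allrows <- seq_len(nrow(mtcars))
set.seed(7)
# 60% of the rows go to the training sample
trainrows <- sample(allrows, replace = FALSE, size = 0.6 * length(allrows))
# the remaining rows are split evenly between test and cross-validation
test_cvrows <- allrows[-trainrows]
testrows <- sample(test_cvrows, replace = FALSE, size = 0.5 * length(test_cvrows))
# setdiff() is safe even if testrows is empty, unlike -which(...)
cvrows <- setdiff(test_cvrows, testrows)
train <- mtcars[trainrows, ]
test <- mtcars[testrows, ]
cvr <- mtcars[cvrows, ]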
There must be something easier, perhaps in a package. dplyr has the sample_frac function, but that seems to target a single sample, not a split into multiple samples.
Close, but not quite the answer to this question: Random Sample with multiple probabilities in R
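For comparison, here is a minimal base R sketch of the kind of shorter approach I have in mind: build one group label per row in the 60/20/20 proportions, shuffle the labels, and hand them to split(). The names labels and groups are just illustrative, and the breaks assume the proportions from the objective above. Treat it as one possible way to do it, not necessarily the best.

```r
set.seed(7)
n <- nrow(mtcars)

# One label per row; the breaks encode the assumed 60% / 20% / 20% proportions.
labels <- cut(seq_len(n),
              breaks = n * c(0, 0.6, 0.8, 1),
              labels = c("train", "test", "cv"))

# Shuffling the labels assigns every row to exactly one group,
# so the three samples cannot overlap.
groups <- split(mtcars, sample(labels))

train <- groups$train
test  <- groups$test
cvr   <- groups$cv
```

Because each row gets exactly one label, the three data frames together cover mtcars with no duplicates. When the row count does not divide evenly, the group sizes are only approximately 60/20/20.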