You can randomly split with (roughly) this ratio using sample():
set.seed(144)
spl <- split(iris, sample(c(1, 1, 1, 2, 3), nrow(iris), replace=TRUE))
This splits your initial data frame into a list of data frames, one per group. You can check that you got roughly the split ratio you were looking for by using lapply to call nrow on each element of the list:
unlist(lapply(spl, nrow))
#  1  2  3
# 98 26 26
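The vector c(1, 1, 1, 2, 3) encodes the 60/20/20 ratio by repeating label 1 three times. As a rough sketch of an alternative, the same idea can be written with the prob argument of sample(), which may be easier to read for ratios that don't reduce to small whole numbers. The names groups and spl_prob are only illustrative, and the group sizes are still random, so they will only be close to the target ratio:

```r
set.seed(144)
# Draw a group label for every row with probabilities 0.6 / 0.2 / 0.2
groups <- sample(1:3, nrow(iris), replace = TRUE, prob = c(0.6, 0.2, 0.2))
spl_prob <- split(iris, groups)
# Group sizes vary from seed to seed but always add up to nrow(iris)
sapply(spl_prob, nrow)
```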
If you want a random split that hits your ratio exactly, you can shuffle the row indices and then assign the correct number of shuffled indices to each group. For iris (150 rows), we want 90 rows in group 1, 30 in group 2, and 30 in group 3:
set.seed(144)
nums <- c(90, 30, 30)
assignments <- rep(NA, nrow(iris))
assignments[sample(nrow(iris))] <- rep(c(1, 2, 3), nums)
spl2 <- split(iris, assignments)
unlist(lapply(spl2, nrow))
#  1  2  3
# 90 30 30
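If you need this for data frames of other sizes, the group counts can be derived from the ratios instead of being typed in by hand. The following is a minimal sketch, not a standard function: exact_split is a hypothetical helper name, and it assumes that when nrow(df) times a ratio is not a whole number, the leftover rows should go to the groups with the largest fractional parts.

```r
# Hypothetical helper: split df into groups whose sizes follow `ratios`
# exactly, up to rounding when nrow(df) * ratios is not a whole number.
exact_split <- function(df, ratios) {
  n <- nrow(df)
  ratios <- ratios / sum(ratios)   # normalise so the ratios sum to 1
  nums <- floor(n * ratios)        # provisional group sizes
  # Give any rows lost to rounding down to the largest fractional parts
  leftover <- n - sum(nums)
  if (leftover > 0) {
    extra <- order(n * ratios - nums, decreasing = TRUE)[seq_len(leftover)]
    nums[extra] <- nums[extra] + 1
  }
  # Same shuffle-and-assign step as above
  assignments <- integer(n)
  assignments[sample(n)] <- rep(seq_along(ratios), nums)
  split(df, assignments)
}

set.seed(144)
spl3 <- exact_split(iris, c(0.6, 0.2, 0.2))
sapply(spl3, nrow)
```

For iris this should give the same 90/30/30 group sizes as the hand-written version, though which rows land in which group depends on the seed.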