How to partition data into three parts in R?

Question

I want to split my data into 3 parts with a ratio of 6:2:2. Is there an R command that can do that? Thanks.

I used createDataPartition from the caret package, which can split data into two parts. But how can I do it with 3 splits? Is that possible, or do I need two steps to do that?

Please consider including a *small* [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can better understand and more easily answer your question. — Ben Bolker, Jun 19 '14 at 21:15
`df$part <- rep(rep(1:3, times=c(3,1,1)), len=nrow(df))`? You don't even say how you want to split, let alone why. — mlt, Jun 19 '14 at 21:19
@mlt: I can imagine they might want a *random* split (then just scramble your answer with `sample`). — Ben Bolker, Jun 19 '14 at 21:21
split a vector? a data frame/matrix? a column of a data frame? just generate an indicator vector for splitting elsewhere? — Ben Bolker, Jun 19 '14 at 22:37

josliber · Answer 1 · 2014-06-19T23:45:42.730

You can randomly split with (roughly) this ratio using sample:

set.seed(144)
spl <- split(iris, sample(c(1, 1, 1, 2, 3), nrow(iris), replace=TRUE))

This splits your initial data frame into a list. You can then check that you've gotten roughly the split ratio you were looking for by calling nrow on each element of the list with lapply:

unlist(lapply(spl, nrow))
#  1  2  3 
# 98 26 26
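
As a sketch of the same idea, the weights can also be passed through the `prob` argument of `sample` instead of repeating group labels. The names `groups`, `spl_prob` and `sizes` are introduced here for illustration only. Each row is still assigned independently, so the group sizes are only approximately 6:2:2:

```r
set.seed(144)
# Draw a group label (1, 2 or 3) for every row, weighted 6:2:2
groups <- sample(1:3, nrow(iris), replace = TRUE, prob = c(0.6, 0.2, 0.2))
spl_prob <- split(iris, groups)
# Number of rows that landed in each group
sizes <- sapply(spl_prob, nrow)
sizes
```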

If you want a random split that matches your ratio exactly, you could shuffle the row indices and then assign the correct number of shuffled indices to each group. For iris, which has 150 rows, a 6:2:2 split means 90 rows for group 1, 30 for group 2, and 30 for group 3:

set.seed(144)
nums <- c(90, 30, 30)
assignments <- rep(NA, nrow(iris))
assignments[sample(nrow(iris))] <- rep(c(1, 2, 3), nums)
spl2 <- split(iris, assignments)
unlist(lapply(spl2, nrow))
#  1  2  3 
# 90 30 30
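
The exact-count approach can be wrapped in a small helper so the counts do not have to be worked out by hand. This is only a sketch: `split_by_ratio` is a hypothetical name, not a function from base R or caret, and giving any leftover rows to the first group is an assumption made here for the case where the row count does not divide evenly.

```r
# Hypothetical helper: split a data frame into groups of exact sizes
# derived from a ratio such as 6:2:2.
split_by_ratio <- function(df, ratio = c(6, 2, 2)) {
  n <- nrow(df)
  # Whole number of rows per group, rounded down
  counts <- floor(n * ratio / sum(ratio))
  # Assumption: rows left over after rounding go to the first group
  counts[1] <- counts[1] + (n - sum(counts))
  # Build the group labels, then shuffle them across the rows
  assignments <- sample(rep(seq_along(ratio), times = counts))
  split(df, assignments)
}

set.seed(144)
parts <- split_by_ratio(mtcars)  # mtcars has 32 rows
sapply(parts, nrow)
#  1  2  3 
# 20  6  6
```

The result is a list like `spl2`, so `parts[["1"]]`, `parts[["2"]]` and `parts[["3"]]` can be used as, for example, training, validation and test sets.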

If you want a precise split, you can apply `sample` to the vector from @mlt's comment above. — Ben Bolker, Jun 19 '14 at 23:39
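
A minimal sketch of what that comment suggests, assuming a data frame `df` (here just a copy of iris) and an indicator column named `part` as in @mlt's comment:

```r
set.seed(144)
df <- iris
# Repeat the 3:1:1 pattern (equivalent to 6:2:2) to the number of rows,
# then shuffle it so group membership is random
df$part <- sample(rep(rep(1:3, times = c(3, 1, 1)), length.out = nrow(df)))
table(df$part)
#  1  2  3 
# 90 30 30
```

This keeps the data in one data frame with an indicator column; `split(df, df$part)` turns it into a list if separate pieces are needed.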

