
I want to split my data into 3 parts with a ratio of 6:2:2. Is there an R command that can do that? Thanks.

I used `createDataPartition` from the caret package, which can split data into two parts. But how can I do it with 3 splits? Is that possible, or do I need two steps to do that?

Mark Miller
user697911
  • Please consider including a *small* [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) so we can better understand and more easily answer your question. – Ben Bolker Jun 19 '14 at 21:15
  • Why not the ratio 3/1/1? – agstudy Jun 19 '14 at 21:17
  • `df$part <- rep(rep(1:3, times=c(3,1,1)), len=nrow(df))`? You don't even say how you want to split, let alone why. – mlt Jun 19 '14 at 21:19
  • @mlt: I can imagine they might want a *random* split (then just scramble your answer with `sample`). – Ben Bolker Jun 19 '14 at 21:21
  • Exactly, random sampling of the data, with 3 splits. – user697911 Jun 19 '14 at 21:53
  • Split a vector? A data frame/matrix? A column of a data frame? Or just generate an indicator vector for splitting elsewhere? – Ben Bolker Jun 19 '14 at 22:37

1 Answer


You can randomly split with (roughly) this ratio using `sample`:

set.seed(144)
# Labels 1, 1, 1, 2, 3 are drawn with equal probability, giving 3:1:1 (= 6:2:2) on average
spl <- split(iris, sample(c(1, 1, 1, 2, 3), nrow(iris), replace = TRUE))

This splits your initial data frame into a list of three data frames. You can check how close you got to the target ratio by calling `nrow` on each element of the list with `lapply`:

unlist(lapply(spl, nrow))
#  1  2  3 
# 98 26 26
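
As a variation on the same idea (not part of the original answer), the target proportions can be passed to `sample` through its `prob` argument instead of repeating labels. This is only a sketch: the names `grp` and `spl_prob` are made up for illustration, and the group sizes are still random, so they only approximate 90/30/30.

```r
set.seed(144)
# Draw one group label per row; `prob` carries the target 6:2:2 proportions.
grp <- sample(1:3, nrow(iris), replace = TRUE, prob = c(0.6, 0.2, 0.2))
spl_prob <- split(iris, grp)
# Sizes vary with the seed and only approximate the ratio.
sapply(spl_prob, nrow)
```

This form reads more directly when the ratio is not made of small integers, for example `prob = c(0.65, 0.2, 0.15)`.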

If you want a random assignment that hits your ratio exactly, you can shuffle the row indices and then assign a fixed number of them to each group. For iris (150 rows), a 6:2:2 split means 90 rows for group 1, 30 for group 2, and 30 for group 3:

set.seed(144)
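nums <- c(90, 30, 30)  # exact group sizes; must sum to nrow(iris)
assignments <- rep(NA, nrow(iris))
# Visit the rows in random order and hand out labels 1 (x90), 2 (x30), 3 (x30)
assignments[sample(nrow(iris))] <- rep(c(1, 2, 3), nums)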
spl2 <- split(iris, assignments)
unlist(lapply(spl2, nrow))
#  1  2  3 
# 90 30 30 
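
The exact-size approach can be wrapped in a small helper so the group sizes do not have to be worked out by hand. The function below is a hypothetical sketch, not part of the original answer: the name `split_by_ratio` and its arguments are invented here, and it assumes that rows left over after rounding down should go to the groups with the largest fractional share.

```r
# Hypothetical helper: split a data frame into groups whose sizes follow
# `ratio` as closely as whole row counts allow.
split_by_ratio <- function(df, ratio = c(6, 2, 2)) {
  n <- nrow(df)
  # Ideal (possibly fractional) group sizes, rounded down.
  ideal <- n * ratio / sum(ratio)
  sizes <- floor(ideal)
  # Give any leftover rows to the groups with the largest fractional parts.
  leftover <- n - sum(sizes)
  if (leftover > 0) {
    bump <- order(ideal - sizes, decreasing = TRUE)[seq_len(leftover)]
    sizes[bump] <- sizes[bump] + 1
  }
  # Build the labels with exact counts, shuffle them, and split on them.
  labels <- rep(seq_along(ratio), times = sizes)[sample.int(n)]
  split(df, labels)
}

set.seed(144)
parts <- split_by_ratio(iris, c(6, 2, 2))
sapply(parts, nrow)
#  1  2  3 
# 90 30 30 
```

Only the assignment of rows to groups depends on the seed; the group sizes are fixed by `ratio` and the number of rows.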
josliber