
Hi, I'm trying to create 10 sub-training sets (from a training set containing 75% of the data) by extracting rows randomly from a data frame (DB) in a loop. I'm using:

smp_size<- floor((0.75* nrow(DB))/10) 
train_ind<-sample(seq_len(nrow(DB)), size=(smp_size)) 

training<- matrix(ncol=(ncol(DB)), nrow=(smp_size))
for (i in 1:10){
  training[i]<-DB[train_ind, ]
}

What's wrong?
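The loop above has two problems. `train_ind` is drawn only once, outside the loop, so every iteration would pick the same rows. Also, `training[i] <- DB[train_ind, ]` tries to put a whole data frame into a single element of a matrix, which cannot hold it, so R keeps only part of it and warns. Below is a minimal sketch of what the loop seems to be aiming for. It assumes the 75% training set is drawn first and then split into 10 disjoint random subsets stored in a list. `mtcars` stands in for `DB`, and the names `train`, `groups` and `subsets` are hypothetical.

```r
# Hypothetical sketch: mtcars (32 rows) stands in for DB
DB <- mtcars
set.seed(1)

# Step 1: draw the 75% training set once
train_size <- floor(0.75 * nrow(DB))
train <- DB[sample(nrow(DB), size = train_size), ]

# Step 2: give every training row a random group label 1..10
groups <- sample(rep(1:10, length.out = nrow(train)))

# Store each sub-training set as its own data frame in a list;
# a list can hold one data frame per element, a matrix cannot
subsets <- vector("list", 10)
for (i in 1:10) {
  subsets[[i]] <- train[groups == i, ]
}

sapply(subsets, nrow)
```

With 24 training rows, four subsets get 3 rows and six get 2. No row appears in more than one subset.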

Tyu1990
  • You tell me. Is there an error message? Are you not happy with the output? Please include these things in your question, and provide some sample data as well (you can use built-in datasets like `mtcars`). The point of a good question is that people can take the code and run it on their machines with no hassle. See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – jakub Oct 21 '16 at 11:20
  • OK, thanks. I'm new here, sorry. – Tyu1990 Oct 21 '16 at 15:46

1 Answer


To partition your dataset into 10 equally sized subsets, you can use the following:

# Randomly order the rows in your training set:
DB <- DB[order(runif(nrow(DB))), ]
# Create the labels 1, 2, ..., 10, 1, 2, ..., 10, ... that assign rows to subsets
inds <- rep(1:10, nrow(DB)/10)
# split() will store the subsets (created by inds) in a list
subsets <- split(DB, inds)
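Here is a reproducible run of the code above, in the spirit of the comment asking for built-in data. It assumes `iris` stands in for `DB`, and `set.seed()` is added so the shuffle is repeatable. Because `iris` has 150 rows, which is a multiple of 10, every row gets exactly one label.

```r
# iris (150 rows) stands in for DB; set.seed() makes the shuffle repeatable
DB <- iris
set.seed(42)

# Randomly order the rows
DB <- DB[order(runif(nrow(DB))), ]
# 150 / 10 = 15 repetitions of 1..10, so one label per row
inds <- rep(1:10, nrow(DB) / 10)
# One data frame per label, collected in a list named "1" ... "10"
subsets <- split(DB, inds)

sapply(subsets, nrow)  # every subset has 15 rows
```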

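Note, however, that the subsets are only exactly equal in size when `nrow(DB)` is a multiple of 10. Otherwise `rep()` truncates `nrow(DB)/10` to a whole number, so `inds` is shorter than the number of rows. `split()` then recycles `inds` and warns that the data length is not a multiple of the split variable, so the first `nrow(DB) %% 10` subsets each receive one extra row.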

To give every row a group label explicitly (and avoid relying on recycling), build `inds` with `length.out`, which yields subsets whose sizes differ by at most one row: `inds <- rep(1:10, length.out = nrow(DB))`.
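A minimal sketch of the `length.out` variant on a dataset whose row count is not a multiple of 10. It assumes `mtcars` (32 rows) stands in for `DB`. The sketch also shows that the plain `times` form produces fewer labels than rows.

```r
# mtcars (32 rows, not a multiple of 10) stands in for DB
DB <- mtcars
set.seed(42)
DB <- DB[order(runif(nrow(DB))), ]

# times = 3.2 is truncated to 3, so only 30 labels for 32 rows
length(rep(1:10, nrow(DB) / 10))

# One label per row: 1..10, 1..10, 1..10, 1, 2
inds <- rep(1:10, length.out = nrow(DB))
subsets <- split(DB, inds)

sizes <- sapply(subsets, nrow)
sizes  # subsets "1" and "2" get 4 rows, the other eight get 3
```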

KenHBS
  • OK, it works, thanks! Just one more question, and it might be stupid: why are you using `runif()`? – Tyu1990 Oct 21 '16 at 15:29
  • Is it for a random distribution of rows? – Tyu1990 Oct 21 '16 at 15:39
  • Yes, it gives `nrow(DB)` random values, which are then ordered. The order of those random values is, of course, also random, which gives you a random shuffle of the row indices. – KenHBS Oct 21 '16 at 15:45
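A small check of the explanation above. `order(runif(n))` returns each index `1..n` exactly once, in random order, so it is a random permutation. The sketch also shows base R's `sample(n)`, which is the more common idiom for the same job (`DB[sample(nrow(DB)), ]` shuffles rows directly). The choice `n <- 10` is only illustrative.

```r
set.seed(1)
n <- 10

u <- runif(n)      # n independent uniform draws
perm <- order(u)   # row positions sorted by their random draw
perm

# Every index 1..n appears exactly once, so perm is a permutation of 1:n
stopifnot(identical(sort(perm), seq_len(n)))

# sample(n) returns a random permutation of 1:n directly
perm2 <- sample(n)
stopifnot(identical(sort(perm2), seq_len(n)))
```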