How to find the complement of a random selection of rows of a data frame?

Question

I have a data set called data, which I am splitting into 2 new data sets, which I will call test and train.

I want the splitting to be random, without replacement.

Using the code below, I get train as a new data frame with 35 rows:

rows_in_test <- 35  # number of rows to randomly select
rows_in_train <- nrow(data) - rows_in_test
train <- data[sample(nrow(data), rows_in_test), ]

Is there a nice way in R to assign the complement of train to a new data set called test? I assume there must be a function for this.

score 1 · Answer 1 · answered Feb 07 '14 at 23:09

myData <- data.frame(a = 1:20, b = 101:120)
set.seed(123)  # set the seed so the random split can be reproduced later
trainRows <- runif(nrow(myData)) > 0.25  # each row lands in test with probability 0.25, so the split is only roughly 75/25
train <- myData[trainRows, ]  # 16 rows with this seed
test <- myData[!trainRows, ]  # 4 rows with this seed

# Alternative: select a fixed number of test rows (here 5) without replacement
testRows2 <- sort(sample(nrow(myData), 5, replace = FALSE))

train2 <- myData[-testRows2, ]  # the complement: every row not in testRows2
test2 <- myData[testRows2, ]
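
Applied to the variables in the question, the same idea might look like the sketch below. The key point is to store the sampled row indices instead of only the sampled rows, so that the complement can be taken afterwards. The `data` frame here is a made-up stand-in with 100 rows and hypothetical columns `id` and `x`, and `test_idx` and `train_idx` are names introduced only for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical stand-in for the asker's `data`
data <- data.frame(id = 1:100, x = rnorm(100))

rows_in_test <- 35

# Keep the sampled indices; sample() draws without replacement by default
test_idx <- sample(nrow(data), rows_in_test)

test <- data[test_idx, ]
train <- data[-test_idx, ]  # negative indices drop the sampled rows

# The same complement, expressed with setdiff() on the row indices
train_idx <- setdiff(seq_len(nrow(data)), test_idx)
train_alt <- data[train_idx, ]
```

One design note: negative indexing with an empty index vector returns zero rows instead of all rows, so the `setdiff()` form is the safer of the two if the number of sampled rows could ever be 0.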

Seems to be a problem with consistency... I get 4 obs for test and 16 for train. – tumultous_rooster Feb 07 '14 at 23:29
I added the line that sets the seed after making the post. I too get 16 rows for train. As long as you set the seed before you make the call to `runif`, you should get consistent results. – Feb 07 '14 at 23:39
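
The point about the seed in the comments above can be checked with a short sketch: resetting the seed immediately before each call to `runif` should give the same logical mask both times. The names `first` and `second` are introduced here only for illustration.

```r
myData <- data.frame(a = 1:20, b = 101:120)

set.seed(123)
first <- runif(nrow(myData)) > 0.25

set.seed(123)  # reset the seed before drawing again
second <- runif(nrow(myData)) > 0.25

identical(first, second)  # the two masks should match
sum(first)                # number of rows assigned to train
sum(!first)               # number of rows assigned to test
```

If the seed is set once and `runif` is then called twice, the second call continues the random stream and will generally produce a different split.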
