
I am trying to split a data set with 10 variables and 10,000 observations into a training set and a testing set so I can fit a logistic regression model. First I compute how many rows the training set and the testing set should each have.

data(optiva)
n <- length(optiva$Age)
ntrain <- n*.70
ntest <- n*.30

# Random sample the data set to build the model
train <- optiva[sample(1:n, ntrain, replace=FALSE),]
test <- optiva[-train, ]

Creating the training set works just fine, but when I run the last line to try and create the testing set, I get an error message that says:

Error in xj[i] : invalid subscript type 'list'

I tried changing the code to

test <- optiva[!train, ]

and I get a testing set with over 37 thousand observations, not 3000. I've looked at how to subset data and tried to follow along. Why is it not working for me?

  • You're trying to index with the data itself, which does not work. Something like this should work: `i_train <- sample(1:n, ntrain, replace=FALSE);train <- optiva[train, ];test <- optiva[-train, ]` I cannot check it, however, because your example is not reproducible. What is `optiva`? – Stibu Jun 21 '16 at 20:02
  • @Stibu : Maybe you are trying to say this `train <- optiva[i_train, ];test <- optiva[-i_train, ]` – user2100721 Jun 21 '16 at 20:13
  • @user2100721 Yes, I forgot the i_. Thanks for pointing this out. – Stibu Jun 21 '16 at 20:23
  • Yep that worked! I did `test <- optiva[-i_train, ]` and it worked just fine. Why was it not working before? Optiva is just the data set I was working with, not one built into R. – Robert Clark Jun 21 '16 at 21:11
  • Ok I think I am seeing why that wasn't working before. Thanks for the help! – Robert Clark Jun 21 '16 at 21:14
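
To spell out what the comments arrive at: in the original code `train` is a data frame, not a vector of row numbers, so `optiva[-train, ]` hands a data frame (which R treats as a list) to the row index, and that is what "invalid subscript type 'list'" is complaining about. With `!train`, R most likely builds a logical value for every cell of `train` rather than for every row, which would explain a result far larger than the original data. Keeping the sampled row numbers in their own vector avoids both problems. The following is only a minimal sketch: `optiva` is not available, so `dat` and its columns are a made-up stand-in, and `floor()` is added as an assumption so the training size is always a whole number.

```r
# Hypothetical stand-in for `optiva`; the real data set is not available here.
set.seed(42)
n <- 10000
dat <- data.frame(Age = sample(18:90, n, replace = TRUE),
                  Balance = round(runif(n, 0, 5000)))

# Size of the training set (70% of the rows, rounded down).
ntrain <- floor(0.70 * n)

# Keep the sampled row numbers in their own integer vector.
i_train <- sample(seq_len(n), ntrain)

train <- dat[i_train, ]   # rows whose numbers were sampled
test  <- dat[-i_train, ]  # negative row numbers drop exactly those rows

# Sanity checks: the sizes add up and no row appears in both sets.
stopifnot(nrow(train) + nrow(test) == n)
stopifnot(length(intersect(rownames(train), rownames(test))) == 0)
```

The key point is that `-i_train` negates a vector of integers, which R reads as "drop these rows", whereas `-train` negates the contents of a data frame.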
