I am trying to split my dataset into two subsets using a vector named train. My original dataset (UCI_CC_cleaned) has 2215 observations. With the code below I can create the training dataset (UCI.train) with 575 observations, but the test dataset does not have the correct dimensions: I expect 2215 - 575 = 1640 observations, but it contains 2214.

train <- UCI_CC_cleaned$random > 0.75
UCI.train <- UCI_CC_cleaned[train, ]
UCI.test <- UCI_CC_cleaned[-train, ]
Nader Mehri

1 Answer

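train is a logical vector, so negate it with ! instead of -. Applying - coerces TRUE/FALSE to 1/0, so -train contains only -1s and 0s; indexing with it ignores the zeros and drops just row 1, which is why the test set has 2214 rows.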

train <- UCI_CC_cleaned$random > 0.75
UCI.train <- UCI_CC_cleaned[train, ]
UCI.test <- UCI_CC_cleaned[!train, ]
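
To see the difference on a small scale, here is a minimal sketch using a made-up six-row data frame (df and its values are hypothetical stand-ins for UCI_CC_cleaned, not the real data):

```r
# Toy data frame standing in for UCI_CC_cleaned (hypothetical values)
df <- data.frame(id = 1:6, random = c(0.9, 0.1, 0.8, 0.3, 0.5, 0.95))
train <- df$random > 0.75        # TRUE FALSE TRUE FALSE FALSE TRUE

minus_idx <- -train              # coerced to numbers: -1 0 -1 0 0 -1
n_minus <- nrow(df[-train, ])    # zeros are ignored, -1 drops row 1 only
n_not   <- nrow(df[!train, ])    # keeps the rows where train is FALSE

print(n_minus)                   # 5
print(n_not)                     # 3

# Sanity check: the two subsets together cover every row exactly once
stopifnot(nrow(df[train, ]) + nrow(df[!train, ]) == nrow(df))
```

The same check (training rows plus test rows equals total rows) should hold on the real data once ! is used.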

If you prefer to keep -, first convert the logical vector to row numbers with which:

train <- which(UCI_CC_cleaned$random > 0.75)
UCI.train <- UCI_CC_cleaned[train, ]
UCI.test <- UCI_CC_cleaned[-train, ]
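
One caveat with the which approach, shown as a small sketch on hypothetical data (df and its values are invented for illustration): if no row satisfies the condition, which returns an empty integer vector, and negating an empty vector still selects nothing, so the "test" set ends up empty instead of containing every row. The ! version does not have this problem.

```r
# Hypothetical data in which no value exceeds the 0.75 threshold
df <- data.frame(id = 1:4, random = c(0.2, 0.4, 0.1, 0.6))

idx <- which(df$random > 0.75)       # integer(0): nothing matched
n_which <- nrow(df[-idx, ])          # -integer(0) is still empty, so 0 rows

keep <- df$random > 0.75             # FALSE FALSE FALSE FALSE
n_logical <- nrow(df[!keep, ])       # all 4 rows, as intended

print(n_which)                       # 0
print(n_logical)                     # 4
```

With 575 matching rows in the question's data this edge case does not arise, but it is a reason to default to logical negation.
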
Ronak Shah